Altair is a recent Python package, built on top of Vega-Lite, for creating interactive visualizations. I had so much fun playing with its features that I would like to share my experiments in this series of posts.
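First of all, install it as shown below. The second command in each pair installs the example datasets along with the notebook and vega packages; the vega package and its associated Jupyter extension are needed for the Jupyter notebook renderer.
Anaconda:
conda install altair --channel conda-forge
conda install -c conda-forge vega_datasets notebook vega
Pip:
pip install -U altair
pip install -U vega_datasets notebook vega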
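Before moving on, it can help to confirm that the packages are importable from the environment the notebook runs in. The snippet below is a minimal sketch using only the standard library; the helper name missing_packages is made up for this post, and the module names are assumed to match the package names in the install commands above.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names assumed to match the packages installed above;
# adjust the list if your setup differs.
required = ["altair", "vega_datasets", "notebook", "vega"]

missing = missing_packages(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages found.")
```

If anything is reported as missing, rerun the matching install command in the same environment that the notebook kernel uses.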
As I go along, I compare Altair visuals and code with their Matplotlib equivalents wherever applicable, to highlight the features and ease of use of the Altair package.
Step 1: Import required packages
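import pandas as pd
import altair as alt
import matplotlib.pyplot as plt
import numpy as np
# Show Matplotlib figures inline in the notebook
%matplotlib inline
# Select Altair's renderer for the classic Jupyter notebook
alt.renderers.enable('notebook')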
Step 2: Read the data
Dataset name: Hours to Pay Mortgage (Source: https://data.world/makeovermonday/2018w47)
data = pd.read_excel(r'Hours to Pay Mortgage.xlsx', sheet_name=r'Sheet1')
Here is the description of the dataset, as printed by data.info():
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 9 columns):
City                                97 non-null object
State                               97 non-null object
Median Home Listing Price           97 non-null int64
30-year Fixed Mortgage Rate         97 non-null float64
Monthly Mortgage Payment            97 non-null int64
Median Household Income             97 non-null int64
Hours per Month to Afford a Home    97 non-null float64
Number of Periods                   97 non-null int64
Present Value                       97 non-null int64
dtypes: float64(2), int64(5), object(2)
memory usage: 6.9+ KB
Step 3: Simple Histogram
# Gold bars: bin the x values, then count the rows that fall in each bin
alt.Chart(data).mark_bar(color='gold').encode(
    alt.X('Hours per Month to Afford a Home', bin=True, axis=alt.Axis(title='Hours per Month to Afford a Home (in bins)')),
    alt.Y('count():Q', axis=alt.Axis(title='Number of Cities')),
)
The syntax is very simple. You start by calling Chart on the data frame, pick a mark, and then add an encoding for each visual property. For the histogram, we can set the color for the entire visual up front in mark_bar, since it is chosen arbitrarily and does not depend on the data.
In the X-axis definition, setting bin=True bins the column's values, which turns the bar chart into a histogram. The column to plot and the axis title are specified in the same call.
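To make the effect of binning concrete, here is a rough plain-Python sketch of what a binned count amounts to: each value is assigned to a fixed-width bin, and the rows in each bin are counted. The values and the bin width of 10 are made up for illustration; they are not taken from the dataset, and Altair (through Vega-Lite) chooses its own bin boundaries automatically.

```python
import math
from collections import Counter

# Hypothetical "hours per month" values, for illustration only.
hours = [31.5, 38.2, 42.0, 44.9, 47.3, 52.8, 55.1, 58.6, 63.4, 97.0]

bin_width = 10  # assumed width; the charting library picks its own step


def bin_start(value, width):
    """Return the lower edge of the fixed-width bin that contains `value`."""
    return math.floor(value / width) * width


# Count how many values fall into each bin, keyed by the bin's lower edge.
counts = Counter(bin_start(h, bin_width) for h in hours)

for start in sorted(counts):
    print(f"[{start}, {start + bin_width}): {counts[start]}")
```

Each printed line corresponds to one bar of the histogram: the interval is the bar's position on the X axis and the count is its height.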
The Y-axis definition is more interesting. We just want to plot the number of rows in each bin, so the aggregation is a count over all rows, and the :Q suffix makes it explicit that the result is a quantitative variable. (The other options include N for nominal and O for ordinal.) The ability to specify an aggregation in the encoding is a powerful feature: we can plot data at any aggregated level without having to transform the data frame first.
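The shorthand string packs up to three things into one value: an optional aggregate, an optional column name, and a type code. As a rough guide to reading it, the helper below splits a shorthand such as 'count():Q' into its parts. This is a simplified, hypothetical parser written for this post, not Altair's own implementation, and it handles only the plain aggregate(field):T pattern.

```python
import re

# Type codes and the encoding types they stand for.
TYPE_CODES = {"Q": "quantitative", "N": "nominal", "O": "ordinal", "T": "temporal"}

# Either "aggregate(field)" or a bare field, then an optional ":<type code>".
SHORTHAND = re.compile(
    r"^(?:(?P<aggregate>\w+)\((?P<inner>[^)]*)\)|(?P<field>[^:]+))"
    r"(?::(?P<code>[QNOT]))?$"
)


def read_shorthand(shorthand):
    """Split a shorthand string into aggregate, field, and type (when present)."""
    match = SHORTHAND.match(shorthand)
    if match is None:
        raise ValueError(f"Unrecognised shorthand: {shorthand!r}")
    parts = {}
    if match["aggregate"]:
        parts["aggregate"] = match["aggregate"]
    field = match["inner"] or match["field"]
    if field:
        parts["field"] = field
    if match["code"]:
        parts["type"] = TYPE_CODES[match["code"]]
    return parts


print(read_shorthand("count():Q"))
print(read_shorthand("mean(Median Household Income):Q"))
print(read_shorthand("State:N"))
```

Read this way, the Y encoding in the histogram asks for a count aggregate with a quantitative type and no specific column, which is why no data frame transformation is needed beforehand.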
You can see that horizontal grid lines appear by default and that the bars are separated slightly, which makes the visual more appealing and readable.
Now, compare this with the default histogram that can be created with Matplotlib:
plt.hist(data['Hours per Month to Afford a Home'], color='lightpink')
plt.xlabel('Hours per Month to Afford a Home (in bins)')
plt.ylabel('Count')
The Matplotlib code for a basic histogram is comparably simple, but look at the difference in the rendering. A stark difference indeed.