Python Visualizations - Altair

Altair is a relatively recent Python library, built on top of Vega-Lite, for creating interactive visualizations. I had a lot of fun playing with its features, and in this series of posts I would like to share my experiments with it.

First of all, install it as shown below:

Anaconda:


conda install altair --channel conda-forge

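To use the Jupyter Notebook renderer, you must also install the vega package, which provides the associated Jupyter extension. The command below installs it together with notebook and vega_datasets (sample datasets used in many Altair examples):

conda install -c conda-forge vega_datasets notebook vega

Pip: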


pip install -U altair

pip install -U vega_datasets notebook vega
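
After installing, you can confirm which versions ended up in your environment by querying the installed distributions. The sketch below uses only the standard library (importlib.metadata, Python 3.8+). The helper name installed_versions is hypothetical, and a None value simply means that the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Hypothetical helper: map each distribution name to its version, or None."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # Packages mentioned in the install commands above.
    for pkg, ver in installed_versions(["altair", "vega_datasets", "vega", "notebook"]).items():
        print(f"{pkg}: {ver or 'not installed'}")
```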

As I go along, wherever applicable, I will compare Altair's visuals and code with those of Matplotlib to highlight the features and ease of use of the Altair package.

Step 1: Import required packages


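import pandas as pd
import altair as alt
import matplotlib.pyplot as plt
import numpy as np

# Jupyter magic: display Matplotlib figures inline in the notebook
%matplotlib inline

# Render Altair charts in the classic Jupyter Notebook (requires the vega package).
# This renderer applies to older Altair releases; recent versions usually display
# charts with the default renderer, so this line can often be omitted.
alt.renderers.enable('notebook')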

Step 2: Read the data.
Dataset name: Hours to Pay Mortgage (Source: https://data.world/makeovermonday/2018w47)


# Reading .xlsx files requires the openpyxl package to be installed
data = pd.read_excel('Hours to Pay Mortgage.xlsx', sheet_name='Sheet1')

Here is the description of the dataset, as printed by data.info():

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 9 columns):
City                                97 non-null object
State                               97 non-null object
Median Home Listing Price           97 non-null int64
30-year Fixed Mortgage Rate         97 non-null float64
Monthly Mortgage Payment            97 non-null int64
Median Household Income             97 non-null int64
Hours per Month to Afford a Home    97 non-null float64
Number of Periods                   97 non-null int64
Present Value                       97 non-null int64
dtypes: float64(2), int64(5), object(2)
memory usage: 6.9+ KB
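
The listing above is the report that pandas' DataFrame.info() prints. info() writes this report directly and returns None, so wrapping the call in print() adds a stray "None" line. The sketch below captures the report as a string through the buf parameter instead. It uses a small DataFrame with made-up placeholder values; only the column names come from the real dataset.

```python
import io

import pandas as pd

# Made-up placeholder rows; the column names mirror the real dataset.
toy = pd.DataFrame({
    "City": ["City A", "City B", "City C"],
    "Hours per Month to Afford a Home": [40.5, 55.0, 72.25],
})

# info() prints its report and returns None, so send the report to a buffer instead.
buf = io.StringIO()
toy.info(buf=buf)
report = buf.getvalue()
print(report)

# describe() summarizes the numeric columns (count, mean, std, quartiles).
print(toy.describe())
```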

Step 3: Simple Histogram


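alt.Chart(data).mark_bar(color='gold').encode(
    # bin=True groups the x values into bins, which makes this a histogram
    alt.X('Hours per Month to Afford a Home', bin=True,
          axis=alt.Axis(title='Hours per Month to Afford a Home (in bins)')),
    # count(*) counts the rows in each bin; :Q marks the result as quantitative
    alt.Y('count(*):Q', axis=alt.Axis(title='Number of Cities')),
)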

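The syntax is very simple. You start by passing the data frame to the Chart function, then choose a mark (here mark_bar, for bars), and finally add an encoding for each visual property with encode(). The color of the bars does not depend on the data and is chosen arbitrarily. For this histogram, we can therefore set it once for the whole chart as a property of the mark (color='gold') instead of encoding it.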

For the X axis, setting the bin property to True groups the values into bins, which turns the bar chart into a histogram. The same call specifies both the column for the X axis and the axis title.
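
Conceptually, bin=True combined with a count aggregate does the same job as binning the values yourself and counting the rows in each bin. The NumPy sketch below shows that computation on made-up placeholder values. Vega-Lite chooses "nice" bin boundaries on its own, and by default aims for at most 10 bins on the x and y channels. Its exact edges can therefore differ from the equal-width edges that np.histogram uses here.

```python
import numpy as np

# Made-up placeholder values standing in for "Hours per Month to Afford a Home".
hours = np.array([22.0, 35.5, 41.0, 47.25, 52.0, 58.5, 63.0, 71.75, 88.0, 120.0])

# Split the range into 5 equal-width bins and count how many values fall in each.
# Bins are half-open [left, right), except the last one, which also includes its right edge.
counts, edges = np.histogram(hours, bins=5)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {n} cities")
```

To control the number of bins in Altair itself, you can pass bin=alt.Bin(maxbins=20) instead of bin=True.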

The Y axis definition is more interesting. We just want to plot the number of rows in each bin, so the aggregation is count(*), which counts all rows. To make the data type explicit, we append :Q to indicate that it is a quantitative variable. (Other type codes include N for nominal, O for ordinal and T for temporal.) The ability to specify an aggregation directly in the encoding is a powerful feature: we can plot the data at any aggregation level without having to transform the data frame first.

You can see that horizontal grid lines appear by default and that the bars are separated by a small gap, which makes the chart more appealing and easier to read.

Now, compare this with the default histogram that can be created with Matplotlib:


# Default Matplotlib histogram (10 equal-width bins unless bins= is given)
plt.hist(data['Hours per Month to Afford a Home'], color='lightpink')
plt.xlabel('Hours per Month to Afford a Home (in bins)')
plt.ylabel('Count')

The Matplotlib code for a basic histogram is comparably simple, but the rendering is starkly different: with its default settings, Matplotlib draws no grid lines, and adjacent bars touch each other.
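
The sketch below shows the extra Matplotlib options needed to approximate Altair's defaults: horizontal grid lines drawn behind the bars and a small gap between bars. It uses made-up placeholder values and the non-interactive Agg backend so that it runs outside a notebook. rwidth, grid and set_axisbelow are standard Matplotlib parameters and methods.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up placeholder values standing in for the real column.
hours = np.array([22.0, 35.5, 41.0, 47.25, 52.0, 58.5, 63.0, 71.75, 88.0, 120.0])

fig, ax = plt.subplots()
# rwidth < 1 draws each bar narrower than its bin, leaving a gap between adjacent bars.
counts, edges, patches = ax.hist(hours, bins=10, color="gold", rwidth=0.9)
ax.grid(True, axis="y")   # horizontal grid lines only
ax.set_axisbelow(True)    # draw the grid behind the bars
ax.set_xlabel("Hours per Month to Afford a Home (in bins)")
ax.set_ylabel("Number of Cities")
```

Even this approximation needs several extra calls, whereas Altair applies these choices by default.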

My Data Odyssey
