Python Visualizations - Altair - 5 (Bar Chart)

One of the simplest and most common charts for visualizing categorical data is the bar chart. When the categories are on the X-axis, it is usually called a column chart instead, since the heights of the bars (columns) depict the numeric metric we are interested in. While both horizontal bar charts and column charts are suitable for showing the categorical breakdown of a measure, each is best suited to a slightly different use case.

Column charts are usually preferred when the categories are ordinal, for example High/Medium/Low or Q1/Q2/Q3/Q4, because a pattern is easier to follow from left to right than from top to bottom. Column charts also work for non-ordinal categories, as long as the category names are short. When the category names are long, however, horizontal bar charts are our friend.

With this brief prologue to bar charts, let's jump into creating them with Altair. The dataset we will use is Hours Americans Need to Work to Pay Mortgage (https://data.world/makeovermonday/2018w47).

Step 1: Get and prep data

import pandas as pd
import altair as alt

# Read the worksheet into a DataFrame (reading .xlsx files requires the openpyxl package)
data = pd.read_excel('Hours to Pay Mortgage.xlsx', sheet_name='Sheet1')
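The code above only reads the sheet. A minimal sketch of an actual prep step, assuming only the three column names used later in this post, could check that those columns exist and force the hours column to be numeric so Altair treats it as a quantitative field. The prep function name is hypothetical.

```python
import pandas as pd

# Columns that the charts in this post refer to
REQUIRED_COLUMNS = ["City", "State", "Hours per Month to Afford a Home"]

def prep(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical prep step: validate columns and coerce hours to numbers."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise KeyError(f"Missing columns: {missing}")
    out = df.copy()  # leave the raw DataFrame untouched
    # Entries that cannot be parsed as numbers become NaN instead of raising
    out["Hours per Month to Afford a Home"] = pd.to_numeric(
        out["Hours per Month to Afford a Home"], errors="coerce"
    )
    return out
```

It would be used as data = prep(pd.read_excel('Hours to Pay Mortgage.xlsx', sheet_name='Sheet1')).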

Step 2: Single Bar Chart

Let's plot the average hours per month needed to afford a home for each State. Each row is a City, so we need to aggregate the measure at the State level.
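Altair performs this aggregation inside the chart spec. To double-check the numbers, a pandas sketch of the same computation groups by State, takes the mean hours and counts the cities, then sorts in descending order like the chart's Y-axis. The state_summary helper and its output column names are my own, not part of the dataset.

```python
import pandas as pd

HOURS = "Hours per Month to Afford a Home"

def state_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean hours and number of cities per State, largest mean first."""
    return (
        df.groupby("State")
        .agg(avg_hours=(HOURS, "mean"), n_cities=("City", "count"))
        .sort_values("avg_hours", ascending=False)
        .reset_index()
    )
```

Note that pandas' count skips missing City values, whereas Vega-Lite's count counts every row; the two agree as long as City has no gaps.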

# One bar per State; bar length = mean hours across that State's cities
alt.Chart(data).mark_bar().encode(
    alt.X('average(Hours per Month to Afford a Home):Q', title='Avg Hours per Month to Afford a Home'),
    # Order the States by the same mean, largest first
    alt.Y('State:N', sort=alt.EncodingSortField(field='Hours per Month to Afford a Home', op='mean', order='descending')),
    tooltip=['average(Hours per Month to Afford a Home):Q', 'count(City):Q'],
    # Color each bar by the number of rows (cities) for that State
    color=alt.Color('count():Q', legend=alt.Legend(title='Number of Cities'))
)

In the code above, we sort the States (on the Y-axis) by the measure we are plotting, the average number of hours per month needed to afford a home, and we color each State by the number of cities it has in the dataset. I've also added a tooltip that displays the average hours and the number of cities, though it cannot be seen in the static picture below.


Step 2b: Top 10 Cities and Bottom 10 Cities (Just for fun!)

I would like to plot the top 10 and bottom 10 cities by the number of hours residents need to work to pay their mortgage, and show them side by side. This example also illustrates how layering works in Altair.

Top 10 Cities

# The 10 cities needing the most hours, i.e. the most expensive ones
data_top10Cities = data.sort_values(by='Hours per Month to Afford a Home', ascending=False).head(10)

chart1 = alt.Chart(data_top10Cities).mark_bar().encode(
    alt.Y('City:N', sort=alt.EncodingSortField(field='Hours per Month to Afford a Home', op='sum', order='descending')),
    alt.X('Hours per Month to Afford a Home:Q'),
    # Color the bars by State, with the legend on the left of the plot
    color=alt.Color('State:N', legend=alt.Legend(orient='left'))
)

I colored the cities by State, and you can see that 6 of the 10 most expensive cities (in terms of housing) are in California.
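Rather than reading counts like this off the chart, pandas can confirm them directly. The sketch below assumes the column names used above, and both helper names are hypothetical. nlargest/nsmallest select the same rows as sort_values(...).head(10), although ties may come back in a different order, and value_counts tallies the States.

```python
import pandas as pd

HOURS = "Hours per Month to Afford a Home"

def top_and_bottom(df: pd.DataFrame, n: int = 10) -> tuple:
    """Return (top n, bottom n) cities by hours needed to pay the mortgage."""
    return df.nlargest(n, HOURS), df.nsmallest(n, HOURS)

def cities_per_state(selection: pd.DataFrame) -> pd.Series:
    """Count how many of the selected cities fall in each State."""
    return selection["State"].value_counts()
```

With the real data, top10, bottom10 = top_and_bottom(data) followed by cities_per_state(top10) would give the per-State counts for the top 10 directly.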



Bottom 10 Cities

# The 10 cities needing the fewest hours, i.e. the most affordable ones
data_bottom10Cities = data.sort_values(by='Hours per Month to Afford a Home', ascending=True).head(10)

chart2 = alt.Chart(data_bottom10Cities).mark_bar(color='lightblue').encode(
    alt.Y('City:N', sort=alt.EncodingSortField(field='Hours per Month to Afford a Home', op='sum', order='descending')),
    alt.X('Hours per Month to Afford a Home:Q')
)

The code above produces a simple bar chart with a single color for all the cities, and the states are not marked in any way. In this example, I would like to display the states as text rather than color, so I add a text layer. Note that different chart types can be overlaid using the + operator.

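# Text layer: write each city's State at the end of its bar.
# The Y sort matches chart2 so both layers share the same city order.
text2 = alt.Chart(data_bottom10Cities).mark_text(baseline='middle').encode(
    alt.Y('City:N', sort=alt.EncodingSortField(field='Hours per Month to Afford a Home', op='sum', order='descending')),
    alt.X('Hours per Month to Afford a Home:Q'),
    text=alt.Text('State:N')
)

chart2 + text2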



And if I'd like to see the top 10 and bottom 10 cities side by side (for some reason), I can concatenate the two charts horizontally with the pipe (|) operator. The second chart here is itself a combination of the bar chart and the text layer.

chart1 | (chart2 + text2)




Note: To get the layout shown above, the legend position in chart1 was changed to 'bottom-left'. Also, the text labels in chart2 were aligned left for a better look and feel.
