Python Visualizations - Altair

In this post, I would like to demonstrate how to create a choropleth map (also called a filled map). This visual needs a lot of data preparation.

The map's geodata identifies each geography by a numeric code (for US states, the FIPS state code) rather than by name. So, before we can plot anything, we need to add the state codes to our dataset.

Step 1: Read and prep the dataset

Dataset source: Hours to Pay Mortgage


import numpy as np
import pandas as pd

hours = pd.read_excel(r'Hours to Pay Mortgage.xlsx', sheet_name='Sheet1')

# Get the FIPS State Numeric Code from the Census Bureau reference table
dfs = pd.read_html('https://www.census.gov/geo/reference/ansi_statetables.html', header=0)[0]

Next, a little clean-up of the state codes: remove the rows without a code and convert the codes to strings without a trailing decimal.

dfs1 = dfs[np.isfinite(dfs['FIPS State Numeric Code'])].copy()  # Remove rows with no numeric code

# The codes are read as floats (e.g. 6.0); going through int gives clean strings (e.g. '6')
dfs1['FIPS State Numeric Code'] = dfs1['FIPS State Numeric Code'].astype(int).astype(str)
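To see what this clean-up does, here is a small sketch on a hypothetical toy frame (not the Census table). It mimics how pandas reads a numeric column that contains a blank: the blank becomes NaN, which forces the whole column to float, so codes come back as 6.0, 25.0 and so on.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Census reference table
ref = pd.DataFrame({
    'Name': ['California', 'Massachusetts', 'Notes row'],
    'FIPS State Numeric Code': [6.0, 25.0, np.nan],
})

# Drop rows without a code, then float -> int -> str gives '6', '25'
clean = ref.dropna(subset=['FIPS State Numeric Code']).copy()
clean['FIPS State Numeric Code'] = (
    clean['FIPS State Numeric Code'].astype(int).astype(str)
)

print(clean['FIPS State Numeric Code'].tolist())  # ['6', '25']
```

Going through `int` rather than a regex replace keeps the change confined to the code column: calling `replace(..., regex=True)` on the whole frame would also alter any other text cell that happens to contain ".0".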

Now, let's merge the state codes into our original dataset.


# Trim the state names in both the reference and main datasets, then merge on the
# name to bring the numeric code into the main dataset
dfs1['Name'] = dfs1['Name'].str.strip()
hours['State'] = hours['State'].str.strip()

data11 = pd.merge(hours, dfs1, left_on='State', right_on='Name', how='left')
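A left merge keeps every city even when its state name finds no match in the reference table; those rows simply get a missing code and later drop off the map. One way to spot them is pandas' `indicator=True` option, sketched here on hypothetical toy frames (the misspelled state is deliberate).

```python
import pandas as pd

# Hypothetical toy inputs, not rows from the real dataset
toy_hours = pd.DataFrame({
    'City': ['Boston', 'San Jose', 'Fresno'],
    'State': ['Massachusetts ', 'California', 'Calfornia'],
    'Hours per Month to Afford a Home': [90.0, 120.0, 40.0],
})
toy_ref = pd.DataFrame({
    'Name': ['California', 'Massachusetts'],
    'FIPS State Numeric Code': ['6', '25'],
})

toy_hours['State'] = toy_hours['State'].str.strip()  # removes the trailing space
merged = pd.merge(toy_hours, toy_ref, left_on='State', right_on='Name',
                  how='left', indicator=True)

# '_merge' is 'left_only' for rows whose state found no match
unmatched = merged.loc[merged['_merge'] == 'left_only', 'State'].unique()
print(list(unmatched))  # ['Calfornia']
```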

We build the choropleth map by plotting the states first and then coloring them using the data from our dataset. Because our dataset is at the city level, we first aggregate it to the state level; here we take the mean across each state's cities.

data12 = data11.groupby(['State','FIPS State Numeric Code'])['Hours per Month to Afford a Home'].mean().reset_index(name='Avg Hours per Month to Afford a Home')
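The aggregation can be checked on a few hypothetical city rows. Grouping by the code as well as the state name keeps the code column in the result, which the map lookup needs; `reset_index(name=...)` turns the grouped Series back into a frame and names the new column.

```python
import pandas as pd

# Hypothetical city-level rows (not values from the real dataset)
cities = pd.DataFrame({
    'State': ['California', 'California', 'Massachusetts'],
    'FIPS State Numeric Code': ['6', '6', '25'],
    'Hours per Month to Afford a Home': [120.0, 40.0, 90.0],
})

# One row per state, with the mean over that state's cities
by_state = (
    cities.groupby(['State', 'FIPS State Numeric Code'])
    ['Hours per Month to Afford a Home']
    .mean()
    .reset_index(name='Avg Hours per Month to Afford a Home')
)
print(by_state)
```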

Step 2: Plotting the map

Let's get the state geodata, then look up our state-level dataframe on the FIPS State Numeric Code to fetch the average hours and the state name, which we need for the visualization.


import altair as alt

from vega_datasets import data

The `data` object from vega_datasets points to the `us_10m` TopoJSON file of US geographies, and `alt.topo_feature` extracts its 'states' feature.


states = alt.topo_feature(data.us_10m.url, 'states')

alt.Chart(states).mark_geoshape().encode(
    color='Avg Hours per Month to Afford a Home:Q',
    tooltip=['Avg Hours per Month to Afford a Home:Q', 'State:N']
).transform_lookup(
    # Match each state shape's id to our FIPS code and pull in the listed fields
    lookup='id',
    from_=alt.LookupData(data12, 'FIPS State Numeric Code',
                         ['Avg Hours per Month to Afford a Home', 'State'])
).project(
    type='albersUsa'  # Projection that places Alaska and Hawaii below the mainland
).properties(
    width=500,
    height=400
)

After setting the chart's encodings, we chain further methods: `transform_lookup` joins our data to the map shapes, `project` sets the map projection, and `properties` sets the chart size.

Some states aren't plotted as there is no data for them in the dataset.
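Conceptually, `transform_lookup` works like a left join from the map shapes to our table, keyed on each shape's `id` and our 'FIPS State Numeric Code'. The plain-Python sketch below only illustrates that idea with hypothetical values; it is not how Vega implements the transform. A shape whose id has no matching row gets no value, which is why such states are left off the map.

```python
# Hypothetical shape ids, standing in for the FIPS ids of the 'states' features
shape_ids = [6, 25, 48]

# Hypothetical state-level table keyed by the FIPS code stored as a string
table = {
    '6': {'State': 'California', 'Avg Hours': 80.0},
    '25': {'State': 'Massachusetts', 'Avg Hours': 90.0},
}

def lookup_rows(ids, rows):
    """Attach the matching row to each shape id, or None when there is no match."""
    # In this sketch both sides are compared as strings, so 6 matches '6'
    return {i: rows.get(str(i)) for i in ids}

joined = lookup_rows(shape_ids, table)
missing = [i for i, row in joined.items() if row is None]
print(missing)  # [48]
```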

Step 3: Double Choropleth

Let's plot two maps side by side.

We create two datasets, aggregating Hours per Month to Afford a Home in two different ways, Max and Min, which represent the city with the highest number and the city with the lowest number in each state, respectively.


data13 = data11.groupby(['State','FIPS State Numeric Code'])['Hours per Month to Afford a Home'].max().reset_index(name='Max Hours per Month to Afford a Home')

data14 = data11.groupby(['State','FIPS State Numeric Code'])['Hours per Month to Afford a Home'].min().reset_index(name='Min Hours per Month to Afford a Home')
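As an alternative to two separate `groupby` calls, pandas' named aggregation can produce both columns in one pass. This is a sketch on hypothetical toy rows; the column names contain spaces, so they are passed by unpacking a dict into `agg`.

```python
import pandas as pd

# Hypothetical city-level rows
cities = pd.DataFrame({
    'State': ['California', 'California', 'Massachusetts', 'Massachusetts'],
    'FIPS State Numeric Code': ['6', '6', '25', '25'],
    'Hours per Month to Afford a Home': [120.0, 40.0, 95.0, 85.0],
})

col = 'Hours per Month to Afford a Home'
# Named aggregation: new_column=(source_column, function)
extremes = (
    cities.groupby(['State', 'FIPS State Numeric Code'])
    .agg(**{
        'Max Hours per Month to Afford a Home': (col, 'max'),
        'Min Hours per Month to Afford a Home': (col, 'min'),
    })
    .reset_index()
)
print(extremes)
```

Either column of the single frame could then be passed to `alt.LookupData` in place of data13 or data14.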

The approach is to build each map separately and then combine them later.


import altair as alt

from vega_datasets import data



states = alt.topo_feature(data.us_10m.url,'states')

Chart 1 - Representing the distribution of the largest values per state


chartMax = alt.Chart(states).mark_geoshape().encode(
    # Legend on the left, with a short title
    color=alt.Color('Max Hours per Month to Afford a Home:Q',
                    legend=alt.Legend(orient='left', title='Max Hours')),
    tooltip=['Max Hours per Month to Afford a Home:Q', 'State:N']
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(data13, 'FIPS State Numeric Code',
                         ['Max Hours per Month to Afford a Home', 'State'])
).project(
    type='albersUsa'
).properties(
    width=500,
    height=400
)

Chart 2 - Representing the distribution of the smallest values per state


chartMin = alt.Chart(states).mark_geoshape().encode(
    # Legend on the left, with a short title
    color=alt.Color('Min Hours per Month to Afford a Home:Q',
                    legend=alt.Legend(orient='left', title='Min Hours')),
    tooltip=['Min Hours per Month to Afford a Home:Q', 'State:N']
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(data14, 'FIPS State Numeric Code',
                         ['Min Hours per Month to Afford a Home', 'State'])
).project(
    type='albersUsa'
).properties(
    width=500,
    height=400
)

We then combine the two maps side by side. Most importantly, we want the color scales to be independent; otherwise, both maps would share a single color range and the resulting visual wouldn't be nearly as useful.


alt.hconcat(chartMin, chartMax).resolve_legend(
    color='independent',
    size='independent'
).resolve_scale(color='independent')
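Why the independent color scales matter can be shown with a little arithmetic. With a shared scale, both maps use one domain running from the smallest Min value to the largest Max value, so the Min map's values crowd into the low end of the color ramp. The sketch below uses hypothetical numbers and a simplified linear mapping; Altair's actual scale may differ in details such as rounding the domain.

```python
def position(value, lo, hi):
    """Where a value falls on a linear color ramp from lo (0.0) to hi (1.0)."""
    return (value - lo) / (hi - lo)

# Hypothetical per-state values
min_values = [20.0, 30.0, 40.0]
max_values = [100.0, 150.0, 220.0]

# Shared scale: one domain covering both maps
lo, hi = min(min_values), max(max_values)
shared = [position(v, lo, hi) for v in min_values]

# Independent scale: the Min map gets its own domain
own = [position(v, min(min_values), max(min_values)) for v in min_values]

print(shared)  # [0.0, 0.05, 0.1]
print(own)     # [0.0, 0.5, 1.0]
```

Under the shared domain the three Min values occupy only the first tenth of the ramp, so they would look almost identical; with their own domain they span the full range of colors.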

It is interesting to note that while homeowners in Massachusetts spend a very high number of hours per month paying their mortgage, the lowest number of hours there is also much higher than in all the other states. So, the spread in Massachusetts isn't large. In contrast, while people in California spend the highest number of hours per month paying their mortgage, the lowest number of hours in that state is well below the median. Here, we observe a considerable spread. The reason should become clear in future posts, when we explore this dataset further.
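The spread described above is simply the Max minus the Min for each state. A minimal sketch, assuming frames shaped like data13 and data14 (the values here are toy numbers, not the real results), joins them on the state and code and ranks states by spread.

```python
import pandas as pd

# Hypothetical stand-ins for data13 (max per state) and data14 (min per state)
max_df = pd.DataFrame({
    'State': ['California', 'Massachusetts'],
    'FIPS State Numeric Code': ['6', '25'],
    'Max Hours per Month to Afford a Home': [220.0, 150.0],
})
min_df = pd.DataFrame({
    'State': ['California', 'Massachusetts'],
    'FIPS State Numeric Code': ['6', '25'],
    'Min Hours per Month to Afford a Home': [30.0, 120.0],
})

# Join on both keys, then take the difference per state
spread = max_df.merge(min_df, on=['State', 'FIPS State Numeric Code'])
spread['Spread'] = (spread['Max Hours per Month to Afford a Home']
                    - spread['Min Hours per Month to Afford a Home'])
spread = spread.sort_values('Spread', ascending=False)
print(spread[['State', 'Spread']])
```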

My Data Odyssey

Python Visualizations - Altair - 3 (Choropleth)
