Python Visualizations - Altair - 2 (Scatterplot)

Continuing from my previous post, let's plot a scatterplot using both Altair and Matplotlib. Remember that the dataset, Hours To Pay Mortgage (https://data.world/makeovermonday/2018w47), is at the level of a city; that is, each row represents a city in the United States. For this scatterplot, I want to add a column to the data frame, 'Yearly Mortgage Payment', computed from the Monthly Mortgage Payment column, because I would like to compare it against Median Household Income, which is a yearly figure.

Step 1: Read the data and transform
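import pandas as pd

# Load the city-level dataset (one row per city).
data = pd.read_excel('Hours to Pay Mortgage.xlsx', sheet_name='Sheet1')
data1 = data.copy()
# Annualize the monthly payment so it is comparable with yearly income.
data1['Yearly Mortgage Payment'] = data['Monthly Mortgage Payment'] * 12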
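If you don't have the spreadsheet handy, the same transformation can be tried on a tiny made-up frame. This is only a sketch: the two rows below are invented for illustration, and only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical rows, invented for illustration only (not from the real dataset).
toy = pd.DataFrame({
    'City': ['City A', 'City B'],
    'Monthly Mortgage Payment': [1000.0, 1500.0],
})

# assign() returns a new frame and leaves the original untouched,
# which plays the same role as the copy() step above.
toy1 = toy.assign(**{'Yearly Mortgage Payment': toy['Monthly Mortgage Payment'] * 12})
print(toy1)
```

The `**{...}` form is needed here only because the new column name contains spaces and so cannot be written as a keyword argument.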

Step 2: Simple scatterplot
import altair as alt

alt.Chart(data1).mark_point().encode(
    x='Median Household Income',
    y='Yearly Mortgage Payment',
    size=alt.Size('Hours per Month to Afford a Home:Q'),
    tooltip = ['City:N', 'State:N','Hours per Month to Afford a Home:Q']
)



Note that the size and tooltip encodings are straightforward: you just name the columns that should drive them. Now, let's add some color to this plot. Say we want to color these points by State. That is not a great choice because State has too many distinct values, so instead of State itself I will color by the first letter of the State, bucketing the states by their initial. I know this may not make analytical sense, but I think it will do for the purpose of illustration.
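To get a feel for how much the bucketing shrinks the number of color categories, here is a quick check on a made-up sample. The state codes below are an assumption for illustration; the real State column may hold full names instead, in which case taking the first character works the same way.

```python
import pandas as pd

# Hypothetical sample of state codes, for illustration only.
states = pd.Series(['CA', 'CO', 'CT', 'NY', 'NJ', 'TX', 'TN', 'WA'], name='State')

# First character of each value: this is the bucketing idea.
buckets = states.str[0]

print(states.nunique())    # number of distinct states in the sample
print(buckets.nunique())   # number of distinct first-letter buckets
print(buckets.value_counts().sort_index())
```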

Step 3: Add a State Bucket column

data1['StateBucket'] = data1['State'].str[0]

Step 4: Simply add a color encoding based on the StateBucket column to the chart definition

alt.Chart(data1).mark_point().encode(
    x='Median Household Income',
    y='Yearly Mortgage Payment',
    color='StateBucket',
    size=alt.Size('Hours per Month to Afford a Home:Q'),
    tooltip = ['City:N', 'State:N','Hours per Month to Afford a Home:Q']
)



Let's try to create a similar plot using Matplotlib.

Disclaimer: I'm not very proficient with Matplotlib, and I'm sure there are a lot of things you can build on top of the basic plots. But here I would like to focus on what can be done with the defaults and minimal enhancements.

import matplotlib.pyplot as plt

def scatter(group):
    # Plot one StateBucket group; marker area scales with the hours needed.
    plt.scatter(group['Median Household Income'], group['Yearly Mortgage Payment'],
                marker='o', label=group.name,
                s=2 * group['Hours per Month to Afford a Home'])

plt.figure(figsize=(9, 9))
data1.groupby('StateBucket').apply(scatter)

plt.legend(title='State Group', loc=2)
plt.xlabel('Median Household Income')
plt.ylabel('Yearly Mortgage Payment')


I couldn't figure out an easy way to add tooltips to this plot. Also, the out-of-the-box legend positions all place the legend inside the plot area, so moving it outside takes extra work. One thing to highlight here is that I needed to group the data frame by StateBucket first in order to color the points by bucket. Also, the gridlines and general look are way better with Altair.
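On the legend position: Matplotlib's legend() does accept a bbox_to_anchor argument, which can push the legend outside the axes. Here is a minimal sketch with made-up points; the (1.02, 1) offset is just a value that tends to work, not a recommendation.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(9, 9))

# Hypothetical points, invented for illustration only.
ax.scatter([1, 2], [3, 4], label='A')
ax.scatter([2, 3], [1, 2], label='B')

# loc names the corner of the legend box to pin; bbox_to_anchor says where to
# pin it, in axes coordinates. (1.02, 1) sits just right of the top-right corner.
legend = ax.legend(title='State Group', loc='upper left', bbox_to_anchor=(1.02, 1))
```

When saving such a figure, passing bbox_inches='tight' to savefig may be needed so the legend is not cut off.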

If you are a keen observer, you might have noticed that there are two 'A' groups, which is puzzling. I haven't yet cracked the reason for it. I'll update the post once I figure that out. All in all, the bottom line is that it's much easier to create nice plots with Altair.
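A possible explanation for the duplicate 'A' group, offered as an assumption and not checked against the pandas version used for this post: older pandas releases documented that groupby(...).apply(func) could call func twice on the first group, so a function with a side effect such as plotting would draw that group twice. Iterating over the groups directly avoids relying on apply. A sketch on a made-up frame (the rows are invented for illustration):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows, invented for illustration only (not from the real dataset).
toy = pd.DataFrame({
    'StateBucket': ['A', 'A', 'C', 'N'],
    'Median Household Income': [50000, 52000, 60000, 70000],
    'Yearly Mortgage Payment': [12000, 13000, 24000, 30000],
    'Hours per Month to Afford a Home': [45, 50, 80, 90],
})

fig, ax = plt.subplots(figsize=(9, 9))

# Iterating over a groupby yields each (name, sub-frame) pair exactly once.
for name, group in toy.groupby('StateBucket'):
    ax.scatter(group['Median Household Income'], group['Yearly Mortgage Payment'],
               marker='o', label=name,
               s=2 * group['Hours per Month to Afford a Home'])

ax.legend(title='State Group', loc=2)
ax.set_xlabel('Median Household Income')
ax.set_ylabel('Yearly Mortgage Payment')
```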
