data from 538.com with altair#

download the zip

to run this code on your own laptop, start by downloading the zip file

import pandas as pd
import matplotlib.pyplot as plt

import geopandas as gpd

# activate this if running under jlab
# %matplotlib ipympl

presidential averages#

538.com (actually http://fivethirtyeight.com) is a site hosted by ABC News that publishes data about the US presidential election

we’re gonna use the data in this URL:

URL = "https://projects.fivethirtyeight.com/polls/data/presidential_general_averages.csv"

update 2025

initially we would have written a function that loads this URL; it would

  • cache it on your hard drive so you can reload it faster

  • do any type conversion that you see fit

as of 2025 this URL is no longer online so we’ll just

  • load the data from data/DATA.csv

  • still be careful about the column types
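one possible shape for such a loader is sketched below. The column names (`candidate`, `date`, `state`, `cycle`, `pct_estimate`) are assumptions about the CSV layout, and `load_polls` is a hypothetical helper name; a tiny inline sample stands in for data/DATA.csv so that the snippet runs on its own.

```python
import io

import pandas as pd

# tiny inline sample standing in for data/DATA.csv
# (the column names are assumptions: check them against the real file)
SAMPLE = io.StringIO(
    "candidate,date,state,cycle,pct_estimate\n"
    "Trump,2024-01-01,Arizona,2024,46.1\n"
    "Biden,2024-01-01,Arizona,2024,41.3\n"
    "Harris,2024-01-03,Arizona,2024,44.0\n"
)


def load_polls(source):
    """Read the polls CSV and make the column types explicit."""
    return pd.read_csv(
        source,
        parse_dates=["date"],            # real timestamps, not strings
        dtype={
            "candidate": "category",     # only a few distinct values
            "state": "category",
            "cycle": "int16",
            "pct_estimate": "float64",
        },
    )


polls = load_polls(SAMPLE)
print(polls.dtypes)
```

with the real file, the call would presumably be `load_polls(CACHE)`, once the column names have been checked.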

# your code

CACHE = "data/DATA.csv"

what we want to do now is plot, over time, the average of the polls for each candidate
in other words, you should obtain something like this (we arbitrarily focus on the year 2024 only)

../../_images/polls-over-time-20241.png
# your code
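a minimal sketch of the reshaping step, assuming that "average" means the mean across states for each date, and that the columns are named `date`, `state`, `candidate` and `pct_estimate` (all assumptions, as are the made-up numbers below):

```python
import pandas as pd

# synthetic rows; column names and values are assumptions, not real polls
polls = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-07-01", "2024-07-01", "2024-07-01", "2024-07-01",
         "2024-07-02", "2024-07-02", "2023-12-31"]),
    "state": ["Arizona", "Arizona", "Nevada", "Nevada",
              "Arizona", "Arizona", "Arizona"],
    "candidate": ["Biden", "Trump", "Biden", "Trump",
                  "Biden", "Trump", "Trump"],
    "pct_estimate": [40.0, 46.0, 42.0, 44.0, 41.0, 45.0, 47.0],
})

# keep the year 2024 only, based on the date itself
polls_2024 = polls[polls["date"].dt.year == 2024]

# one row per date, one column per candidate, averaged over the states
wide = polls_2024.pivot_table(
    index="date", columns="candidate", values="pct_estimate", aggfunc="mean"
)
print(wide)
```

calling `wide.plot()` on the result then draws one line per candidate.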

using the interactive view (after all we are using %matplotlib ipympl), zoom into the figure and retrieve

  • the date of the last data point for Joe Biden

  • the date of the first data point for Kamala Harris

also write a line of code to compute this second date

# your code

first_harris_date = ...
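one way to compute such dates is a boolean mask followed by `min()` or `max()`. The sketch below runs on made-up rows (the dates are not the real answer), and it assumes that the candidates are labelled `Biden` and `Harris` in a `candidate` column:

```python
import pandas as pd

# synthetic rows: the dates are made up on purpose
polls = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
    "candidate": ["Biden", "Biden", "Harris", "Harris"],
})

# select the dates of one candidate, then take the extreme value
last_biden_date = polls.loc[polls["candidate"] == "Biden", "date"].max()
first_harris_date = polls.loc[polls["candidate"] == "Harris", "date"].min()
print(last_biden_date.date(), first_harris_date.date())
```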

race end#

from this part we will focus on the period after first_harris_date

# your code

how many candidates are still in the data?
make sure to keep only the two best-known ones

# your code
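both steps of this part (restricting the period, then the candidates) boil down to boolean masks. Here is a sketch on synthetic rows; the labels `Harris`, `Trump` and `Kennedy`, the column names, and the choice of an inclusive `>=` comparison are all assumptions:

```python
import pandas as pd

# synthetic rows; dates, labels and scores are made up
polls = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-03", "2024-01-03",
         "2024-01-04", "2024-01-04"]),
    "candidate": ["Biden", "Harris", "Trump", "Kennedy", "Harris"],
    "pct_estimate": [41.0, 44.0, 45.0, 5.0, 44.5],
})
first_harris_date = pd.Timestamp("2024-01-03")

# keep the period starting at first_harris_date (inclusive here)
race_end = polls[polls["date"] >= first_harris_date]

# how many distinct candidates remain in that period
n_candidates = race_end["candidate"].nunique()
print(n_candidates)

# keep only the two main candidates
MAIN_CANDIDATES = ["Harris", "Trump"]
race_end = race_end[race_end["candidate"].isin(MAIN_CANDIDATES)]
```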

geographic rendering#

in this section we will produce a summary map, which looks like this
the color depicts the ratio between Harris’s average score over time on the one hand, and Trump’s on the other
the tooltips also expose more details on the individual results

../../_images/polls-over-space-20241.png

first we need a definition of the various US states; there is one here

update 2025

here again, there’s been a change since 2024: this URL no longer comes with a valid SSL certificate
so we’ll use the version stored under data/ again

# no longer easily readable by geopandas because of an SSL certificate issue
US_STATES_SHAPEFILE_URL = "https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_20m.zip"

US_STATES_SHAPEFILE_CACHE = "data/us-states.zip"

and we can load it like so:

# so instead of doing this
# gdf = gpd.read_file(US_STATES_SHAPEFILE_URL)

# we'll do this
gdf = gpd.read_file(US_STATES_SHAPEFILE_CACHE)

# and we get this
gdf.head()
   STATEFP   STATENS     AFFGEOID  GEOID  STUSPS          NAME  LSAD         ALAND       AWATER  geometry
0       24  01714934  0400000US24     24      MD      Maryland    00   25151100280   6979966958  MULTIPOLYGON (((-76.04621 38.02553, -76.00734 ...
1       19  01779785  0400000US19     19      IA          Iowa    00  144661267977   1084180812  POLYGON ((-96.62187 42.77926, -96.57794 42.827...
2       10  01779781  0400000US10     10      DE      Delaware    00    5045925646   1399985648  POLYGON ((-75.77379 39.7222, -75.75323 39.7579...
3       39  01085497  0400000US39     39      OH          Ohio    00  105828882568  10268850702  MULTIPOLYGON (((-82.86334 41.69369, -82.82572 ...
4       42  01779798  0400000US42     42      PA  Pennsylvania    00  115884442321   3394589990  POLYGON ((-80.51989 40.90666, -80.51964 40.987...

so as you can see, this is almost like a regular dataframe, except for the geometry column, which holds a geographic shape for each row; hence the term geo-dataframe

digression 2: using altair to produce a geographic visualization#

of course you might have to install altair… how do you go about doing that again?

import altair as alt

# this is for rendering altair charts within the notebook
alt.renderers.enable("html")
# (cell output) RendererRegistry.enable('html')
# to show a geographic map from that geo-df

alt.Chart(gdf).mark_geoshape()
# or we can also use it like this if we prefer

chart = (
    alt.Chart(gdf)
    .mark_geoshape()
)

chart.display()

in terms of presentation this is a little suboptimal, so let’s improve it a bit

(
    alt.Chart(gdf)
    .mark_geoshape()
    .properties(width=800)
    .project('albersUsa')
)

now, the initial geo-dataframe has some numeric columns that we can use to color the map!

for example, there are AWATER and ALAND, which I take to mean area of water and area of land respectively
and we can use one of these to color the different states

for that, just as with simpler altair plots, we call encode() like so

(
    alt.Chart(gdf)
    .mark_geoshape()
    .encode(
        color="ALAND:Q",          # Q stands for quantitative
    )
    .properties(width=800)
    .project('albersUsa')
)
# or if we prefer, same result essentially
# but this way we can be more descriptive

(
    alt.Chart(gdf)
    .mark_geoshape()
    .encode(
        color=alt.Color(field="ALAND", type="quantitative", title="land area")
    )
    .properties(width=800)
    .project('albersUsa')
)
# also useful with altair, we can give a `tooltip` parameter to encode
# and this shows when your mouse hovers on a state

(
    alt.Chart(gdf)
    .mark_geoshape()
    .encode(
        color=alt.Color(field="AWATER", type="quantitative", title="water area"),
        # and we can show there anything from the table
        tooltip=["NAME", "ALAND"],
    )
    .properties(width=800)
    .project('albersUsa')
)

back to our data#

given this knowledge, you should be able to produce our target graph, namely again

../../_images/polls-over-space-20241.png
# your code
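the data-preparation half of that graph could look like the sketch below: an average score per state and per candidate, then the Harris / Trump ratio. It runs on synthetic numbers, and the column names (`state`, `candidate`, `pct_estimate`) as well as the candidate labels are assumptions:

```python
import pandas as pd

# synthetic scores, not real poll data
polls = pd.DataFrame({
    "state": ["Arizona", "Arizona", "Arizona", "Arizona",
              "Nevada", "Nevada"],
    "candidate": ["Harris", "Harris", "Trump", "Trump",
                  "Harris", "Trump"],
    "pct_estimate": [44.0, 46.0, 48.0, 52.0, 50.0, 40.0],
})

# mean score over time, one row per state, one column per candidate
means = (
    polls.groupby(["state", "candidate"])["pct_estimate"]
    .mean()
    .unstack("candidate")
)

# ratio between Harris's average and Trump's average
means["ratio"] = means["Harris"] / means["Trump"]

# back to a flat table with a regular 'state' column
per_state = means.reset_index()
print(per_state)
```

the result can then be attached to the geo-dataframe with `gdf.merge(per_state, left_on="NAME", right_on="state")`, and the merged frame passed to `alt.Chart(...)` with `color="ratio:Q"` and a `tooltip` list, as in the digression above.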

focusing on swing states#

from the graph above, keep only the following states

SWING_STATES = [
    'Nevada',
    'Arizona',
    'Wisconsin',
    'Michigan',
    'Pennsylvania',
    'Georgia',
    'North Carolina',
]
# your code
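keeping a subset of states is a matter of one `isin()` mask. The sketch below uses a hypothetical `per_state` table with made-up ratios; only the list of swing states comes from the text above:

```python
import pandas as pd

SWING_STATES = [
    'Nevada',
    'Arizona',
    'Wisconsin',
    'Michigan',
    'Pennsylvania',
    'Georgia',
    'North Carolina',
]

# hypothetical per-state table; the ratios are made up
per_state = pd.DataFrame({
    "state": ["Arizona", "California", "Georgia", "Texas"],
    "ratio": [0.98, 1.60, 0.99, 0.85],
})

# keep only the rows whose state belongs to the list
swing = per_state[per_state["state"].isin(SWING_STATES)]
print(swing)
```

the same mask works directly on the geo-dataframe, using its `NAME` column: `gdf[gdf["NAME"].isin(SWING_STATES)]`.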