
Demystifying Political Polling

Getting From Polls To Predictions

2020 Presidential election map showing how each state voted.

It's election season! (Is it ever not?) That means it's also election prediction season, where legions of pundits, "personalities", and pollsters opine on who will win in November, with varying levels of methodological integrity. And although it has a certain cachet to say that Joe Biden has the advantage because as a Scorpio Sun in the twelfth house he vibes better with the zeitgeist, most analysts use more, ahem, complex methods. But what are those methods? How do they work? How do polls turn into predictions? We're here to answer your questions. But first, a quick refresher on U.S. elections.

All About U.S. Presidential Elections (In 5 Minutes or Less)

Here's the weird thing about U.S. presidential elections: The people don't elect the president. One would assume that candidates become president if they win the popular vote, that is, if they earn a majority (or plurality) of votes. Instead, when you cast a vote for a candidate, you're really voting for "electors," the people who actually cast votes for a specific candidate. In most states, the candidate who wins the popular vote in the state gets a group of electors who all cast votes for that candidate. Note, however, that electors aren't necessarily required to vote for their state's winner: in 2016, some electors cast votes for John Kasich and Colin Powell (Fortier 2020, 9).

The Constitution empowers state legislatures to decide how to appoint these electors, and currently all states make that decision based on the popular vote, though the mechanics of it vary from state to state (Fortier 2020, 3-6). Suffice it to say: Citizens cast their votes for a presidential candidate, which results in certain electors being appointed, who are supposed to follow the popular will in their state and vote for the presidential candidate who earned the most votes in their state. Then, the candidate who wins a majority of the votes cast by electors wins the election.

Nationwide, there are 538 electors, and each state gets a number of electors equal to its congressional representation: its seats in the House plus its two seats in the Senate. This leads to a much-criticized aspect of the Electoral College: smaller states wield more electoral power. The sizes of the House and Senate are fixed, and every state gets two senators and at least one representative no matter how few people live there. So even though the number of representatives a state gets adjusts with its population, and the number of electors therefore stays roughly proportional to population, one electoral vote represents far more people in populous states.
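To make the arithmetic concrete, here is a minimal Python sketch. The seat counts (435 House seats, 100 Senate seats, 3 electors for Washington, D.C.) are fixed by law, but the two state populations are rounded, illustrative figures that we are assuming for the example, and the function name is our own.

```python
# Electoral College arithmetic. The seat counts are fixed by law; the state
# populations below are rounded, illustrative figures (assumptions).
HOUSE_SEATS = 435
SENATE_SEATS = 100
DC_ELECTORS = 3  # granted by the 23rd Amendment

TOTAL_ELECTORS = HOUSE_SEATS + SENATE_SEATS + DC_ELECTORS
MAJORITY = TOTAL_ELECTORS // 2 + 1  # smallest number that is more than half


def people_per_elector(population: int, house_seats: int) -> float:
    """Residents represented by each electoral vote in one state."""
    electors = house_seats + 2  # every state also has two senators
    return population / electors


# Hypothetical, rounded inputs: (population, House seats)
large_state = people_per_elector(39_500_000, 52)  # roughly California
small_state = people_per_elector(577_000, 1)      # roughly Wyoming

print(TOTAL_ELECTORS, MAJORITY)
print(round(large_state), round(small_state))
print(round(large_state / small_state, 1))
```

With these rounded inputs, one electoral vote in the large state stands for several times as many residents as one electoral vote in the small state, which is the imbalance described above.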

The map of the U.S. is first distorted relative to the number of electoral college votes each state gets (based on the 2020 Census), and then distorted relative to each state's population (2022 ACS 5-Year Estimates).

Confused? Imagine this: You have one pie and have to divide it among groups of people. Each group must get one slice, and the size of the slice should be relative to the size of the group. The larger groups get larger slices and the smaller groups get smaller slices. That's only fair, right? But you don't have infinite pie (if only!). So even though you can keep the size of the slices relative to the size of each group, if the big groups keep getting bigger, each person in a big group is going to get a smaller mouthful of pie.

Now think about this: how would candidates' campaign strategies, and their political agendas, change if they had to win the popular vote and not the Electoral College? And what is the purpose of democracy anyway? Is it supposed to enact the will of the majority? Or protect the rights of the minority? While you ponder those questions, we'll turn to the mechanics of political polling and prediction.

Polls and Predictions

People rely on opinion polling to make election predictions. These polls ask voters how they feel about certain issues and candidates, and sometimes how they intend to vote in particular elections. From those responses, pollsters try to draw conclusions about election outcomes. The key challenge, of course, is that it is impossible to get every voter's opinion. If pollsters want to predict how a group of people might vote (the population of an entire county or state, for example), they must find a way to make that prediction based only on a sample of people polled.

A widely used approach today is a method with the intimidating name "Multilevel Regression With Poststratification" (or, less intimidatingly, "Mister P"). Despite the technical-sounding name, the idea behind MrP can be thought of in a fairly intuitive way: If you want to know how a group will vote, first estimate how a sample of that group would vote, then scale up that estimate based on the total population of that group. If, for example, you asked a sample of Latinos in Long Beach, California, whether they prefer dogs or cats, and 60% prefer cats, you can scale up and predict that in the upcoming (extremely made-up) ballot initiative to make cats the official pet of Long Beach, 120,000 of Long Beach's 200,000 Latinos would back that initiative. (We should emphasize that this is an extremely simplified, extremely fake example.)
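The scale-up step in the cat example is just a multiplication, and a tiny Python sketch makes that explicit. The function name `scale_up` is our own invention, and the numbers are the made-up ones from the example above.

```python
def scale_up(sample_share: float, group_population: int) -> int:
    """Project a share observed in a sample onto the whole group."""
    if not 0.0 <= sample_share <= 1.0:
        raise ValueError("sample_share must be between 0 and 1")
    return round(sample_share * group_population)


# Made-up inputs from the example: 60% of the sample prefers cats,
# and the group is assumed to number 200,000 people.
projected_cat_voters = scale_up(0.60, 200_000)
print(projected_cat_voters)  # 120000
```

Everything interesting about real polling models lives in how the two inputs are estimated, not in this final multiplication.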

Now perhaps a few problems with this approach might occur to you. For example, one assumption is that groups of people tend to vote the same way. And, indeed, deciding how to group people is one of the key decisions when creating prediction models. Another assumption is that you actually know the composition of the total population—if there were only 10,000 Latinos in Long Beach, you would be wildly overestimating support for cats! And, of course, you're assuming that the people you did ask are representative of the people you didn't ask.
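To see how much the population assumption matters, the short Python sketch below reruns the made-up cat example with two different group sizes. The helper name and variable names are hypothetical, and the figures are the fake ones used in the text.

```python
SAMPLE_SHARE = 0.60  # share of the sample that prefers cats (made-up)


def projected_supporters(share: float, population: int) -> int:
    """Scale a sample share up to an assumed group population."""
    return round(share * population)


# What we assumed the group size to be versus what it "really" is
# in the thought experiment above.
assumed = projected_supporters(SAMPLE_SHARE, 200_000)
actual = projected_supporters(SAMPLE_SHARE, 10_000)

overestimate_factor = assumed / actual
print(assumed, actual, overestimate_factor)  # 120000 6000 20.0
```

The sample share never changed; the twentyfold error comes entirely from the wrong population count, which is why pollsters lean on census-style data for this step.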

The magic of MrP is in the way it pools groups of people together. If you took a statewide poll of people's opinion of cats, the sample of Latinos in Long Beach is likely to be quite small, and thus unlikely to be representative of the opinion of Latinos in Long Beach. But you can use the results of similar groups in other areas to adjust your prediction about the opinion of Latinos in Long Beach. Further, you can do these kinds of adjustments at different levels. You can use demographic characteristics, such as race and gender, and higher-level group characteristics such as income to create different pools of people, using larger group estimates to scale and adjust estimates for smaller groups in your data. In this way, you can squeeze as much information out of your data as possible and make more accurate predictions!
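The pooling idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not a real multilevel regression: it shrinks each group's raw share toward the overall share using a fixed `prior_strength` that we chose by hand, whereas an actual MrP model estimates the amount of pooling from the data. The groups, counts, and names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One group in a poll (all values below are hypothetical)."""
    name: str
    respondents: int  # people polled in this group
    supporters: int   # of those, how many back the initiative
    population: int   # assumed true size of the group


def pooled_share(cell: Cell, overall_share: float, prior_strength: float) -> float:
    """Shrink a group's raw share toward the overall share.

    prior_strength acts like a number of "borrowed" respondents: small
    groups move a lot toward the overall share, large groups barely move.
    """
    return (cell.supporters + prior_strength * overall_share) / (
        cell.respondents + prior_strength
    )


def poststratify(cells: list[Cell], prior_strength: float = 20.0) -> float:
    """Population-weighted average of the pooled group shares."""
    overall = sum(c.supporters for c in cells) / sum(c.respondents for c in cells)
    total_population = sum(c.population for c in cells)
    weighted = sum(
        pooled_share(c, overall, prior_strength) * c.population for c in cells
    )
    return weighted / total_population


cells = [
    Cell("small group", respondents=5, supporters=5, population=200_000),
    Cell("medium group", respondents=400, supporters=200, population=300_000),
    Cell("large group", respondents=395, supporters=195, population=500_000),
]

overall_share = sum(c.supporters for c in cells) / sum(c.respondents for c in cells)
small_pooled = pooled_share(cells[0], overall_share, prior_strength=20.0)
estimate = poststratify(cells)

print(f"raw share in small group:    {cells[0].supporters / cells[0].respondents:.2f}")
print(f"pooled share in small group: {small_pooled:.2f}")
print(f"poststratified estimate:     {estimate:.3f}")
```

In this toy data, all five respondents in the small group support the initiative, but five people are too few to trust, so the pooled estimate lands between the group's raw share and the overall share. The final step weights each group's pooled share by its assumed population, which is the "poststratification" half of the name.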

About

PollRBear was created by Bella Karduck, Haley Johnson, Rohit Maramraju, and Philip Menchaca in partial fulfillment of the requirements for the degree of Master of Science in Information, Data Science, at the University of Michigan, Ann Arbor. It is intended to provide an accessible introduction to methods used for political polling and prediction. Graphics were generated with Python and R, and this website was built with the Django and Materialize CSS frameworks.

Sources

  • U.S. Census Bureau. 2020. “American Community Survey 5-Year Data (2009-2022).” https://www.census.gov/data/developers/data-sets/acs-5year.html.
  • Clarke, Harold, and Marianne Stewart. 2020. “COMETrends November 2020 Election Survey Data.” https://cometrends.utdallas.edu/data-and-questionnaires/.
  • Fortier, John C., ed. 2020. After the People Vote: A Guide to the Electoral College. Fourth edition. Washington, DC: American Enterprise Institute Press.
  • Hill, Andrew. n.d. “U.S. States Hexgrid.” CARTO. Accessed April 7, 2024. https://team.carto.com/u/andrew/tables/andrew.us_states_hexgrid/public/map.
  • Monmouth University Polling Institute. 2021. “Monmouth University National Poll, Number 216.” UNC Dataverse. https://doi.org/10.15139/S3/36MVPN.
  • National Archives and Records Administration. 2021. “2020 Electoral College Results.” National Archives. April 16, 2021. https://www.archives.gov/electoral-college/2020.
  • Reuters. 2024. “Reuters/Ipsos Large Sample Survey 1: January 2024.” Ipsos. Ithaca, NY: Roper Center for Public Opinion Research, Cornell University. Dataset. https://doi.org/10.25940/ROPER-31120717.
  • Schaffner, Brian, Stephen Ansolabehere, and Sam Luks. 2022. “Cooperative Election Study Common Content, 2020.” Harvard Dataverse. https://doi.org/10.7910/DVN/E9N6PH.