Testing Polling Error, or: How I Learned to Stop Worrying and Love the Polls

11/8/16 Note: Hey, so when I was rerunning my model on election day, I noticed a serious error. I accidentally had Clinton and Trump’s numbers swapped in West Virginia. There is an extremely low chance of Clinton winning there. It has been corrected now. It didn’t change the prediction much, but did drop her probability from 91% to 86% in the original prediction. I reran the numbers with today’s averages and it’s up to 96%.

11/9/16 Note: Turns out the title of this post is very, very wrong and I should not have loved these polls.

Time to Make a Prediction

Here we are, just two more sleeps until the 2016 Presidential election, maybe the most terrifying election in my lifetime. How am I dealing with this anxiety? I’m becoming obsessed with polling and prediction modeling. It hasn’t really helped, but I have learned a lot about how people like Nate Silver and Nate Cohn, and people not named Nate I’m sure, make educated guesses about who’s going to win the highest office in the land.

I decided a few days ago that I wanted to try to make my own. The problem is that I decided way too late to get a good handle on the complex nature of election modeling. Maybe I’ll try to put together a more complex model for future elections, but I created one for this year that takes a simple approach to modeling possible error on polling averages. I used this model to make a state-by-state prediction on who will win this election.

The Model

Before I go any further, let me reiterate: this is a very simple model for something as complex as election prediction. To state (what should be) the obvious, polling is the best predictor of who will win a presidential election. Between 1968 and 2012, the absolute error between poll averages and actual election results ranged from 0.1% (1992) to 7.2% (1980), with a standard deviation of 1.99%. That’s pretty good, considering all of the factors that push around that single day in November. I decided to start here and model arbitrary polling error ranges.

The model (written in Python, with code below) looks at state-by-state polling averages and uses a pseudo Monte Carlo method. It first applies a random shift within a specified error range to the polling averages in each state, then compares the adjusted averages to determine a winner. Whichever candidate has the higher randomly adjusted polling average wins the state, and that state’s electoral votes are added to their total. A candidate who reaches 270 or more electoral votes wins the election. The model writes each final electoral vote count to a text file, and the percentage of winning outcomes per state to a second file. This allows me to make a prediction.

For my primary test, I used an absolute polling error range of 5% and statewide polling averages from November 5th, 2016 (sorry for the 1-day delay, but I don’t think they’ve changed much). A 5% polling error is pretty high. It’s about 2.5 times the standard deviation of historical polling error. I simulated the election 10,000 times.

Fairly simple. I’m basically just converting polling averages into win probabilities. I’m hoping something this simple will show accurate results on election night.
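Here is a minimal sketch of the kind of simulation loop described above. It is not the original script. The function names (`simulate_once`, `run_simulations`), the `to_win` parameter, and the toy inputs are all hypothetical, and the numbers are placeholders rather than real polling averages. It also assumes that each candidate’s average gets its own independent, uniformly distributed shift inside the error range, which is only one reading of “a random shift within a specified error range.”

```python
import random

# Placeholder inputs, NOT real polling data: state -> (candidate A %, candidate B %).
TOY_POLLS = {
    "Safe A": (60.0, 35.0),
    "Safe B": (35.0, 60.0),
    "Lean A": (47.0, 45.0),
    "Even": (46.0, 46.0),
}
# Placeholder electoral votes, chosen only so that they sum to 538.
TOY_VOTES = {"Safe A": 200, "Safe B": 200, "Lean A": 69, "Even": 69}


def simulate_once(polls, votes, error, rng):
    """Simulate one election; return (A's votes, B's votes, states A won)."""
    total_a = total_b = 0
    a_states = set()
    for state, (avg_a, avg_b) in polls.items():
        # Assumption: each average gets its own uniform shift in [-error, +error].
        adj_a = avg_a + rng.uniform(-error, error)
        adj_b = avg_b + rng.uniform(-error, error)
        if adj_a > adj_b:
            total_a += votes[state]
            a_states.add(state)
        else:
            # An exact tie in a state is vanishingly rare with floats; it goes to B here.
            total_b += votes[state]
    return total_a, total_b, a_states


def run_simulations(polls, votes, error, n_sims, seed=None, to_win=270):
    """Run n_sims elections; return outcome counts and A's win share per state."""
    rng = random.Random(seed)
    outcome = {"a_wins": 0, "b_wins": 0, "ties": 0}
    state_wins = {state: 0 for state in polls}
    for _ in range(n_sims):
        total_a, total_b, a_states = simulate_once(polls, votes, error, rng)
        for state in a_states:
            state_wins[state] += 1
        if total_a >= to_win:
            outcome["a_wins"] += 1
        elif total_b >= to_win:
            outcome["b_wins"] += 1
        else:
            outcome["ties"] += 1  # neither candidate reached to_win
    state_share = {state: wins / n_sims for state, wins in state_wins.items()}
    return outcome, state_share


outcome, state_share = run_simulations(TOY_POLLS, TOY_VOTES, error=5.0, n_sims=10_000, seed=1)
```

With these toy numbers, a 5-point error range can never flip the two “safe” states, so all of the uncertainty comes from the two close ones. That is the sense in which the model just converts polling margins into win probabilities.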

Who Will be the 45th President of the U.S.?

According to my test, the next President of the United States will be:

Hillary Clinton – 86.2% – 293 Electoral Votes

Electoral Map

Missing from this map are the split districts: ME 1st (100% Clinton), ME 2nd (61% Trump), NE 1st (100% Trump), NE 2nd (92.83% Trump), and NE 3rd (100% Trump).

Electoral College Outcomes

Clinton won the election in 86% of the 10,000 simulations, with an average Electoral Vote count of 293. Specifically, Clinton won 8,625 times, Trump won 1,227 times, and there were 149 ties.

This histogram shows how often each final electoral vote count occurred for each candidate across the 10,000 simulations. You can see here that Clinton has far more positive outcomes than Trump.

For my main simulations, the arbitrary polling error was set at 5%. I wondered what happens to her chances of victory as that absolute error grows or shrinks. I ran the script 50 times, simulating errors ranging from 1% to 50% (remember, more than 7-8% is unheard of in modern polling averages). The results can be seen in the chart below:

Chance of Clinton Victory by Simulated Polling Error

At an imagined 50% polling error (which is totally ridiculous), she still has a 61.26% shot at winning the White House. If her position were weaker, we would expect her odds to drop to 50% much faster. It seems to me that, statistically, she’s got this in the bag.
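The sweep over error ranges described above can be sketched roughly like this. Again, this is an illustration and not the original script: `win_probability`, `sweep_errors`, and the toy inputs are hypothetical names and placeholder numbers, and the shifts are assumed to be independent and uniform within the error range.

```python
import random

# Placeholder inputs, NOT real polls: state -> (candidate A %, candidate B %).
TOY_POLLS = {"Lean A 1": (48.0, 45.0), "Lean A 2": (48.0, 45.0), "Safe B": (40.0, 55.0)}
# Placeholder electoral votes, chosen only so that they sum to 538.
TOY_VOTES = {"Lean A 1": 150, "Lean A 2": 150, "Safe B": 238}


def win_probability(polls, votes, error, n_sims, rng, to_win=270):
    """Share of simulations in which candidate A reaches `to_win` electoral votes."""
    wins = 0
    for _ in range(n_sims):
        total_a = 0
        for state, (avg_a, avg_b) in polls.items():
            # Assumption: independent uniform shifts in [-error, +error] per candidate.
            if avg_a + rng.uniform(-error, error) > avg_b + rng.uniform(-error, error):
                total_a += votes[state]
        if total_a >= to_win:
            wins += 1
    return wins / n_sims


def sweep_errors(polls, votes, errors, n_sims=2000, seed=0):
    """Map each error range to candidate A's simulated win probability."""
    rng = random.Random(seed)
    return {error: win_probability(polls, votes, error, n_sims, rng) for error in errors}


# One run per whole-number error range from 1% to 50%.
sweep = sweep_errors(TOY_POLLS, TOY_VOTES, range(1, 51))
```

Plotting `sweep` (error range on the x-axis, win probability on the y-axis) gives the same kind of curve as the chart. In this toy setup, the leader’s odds start at certainty for a tiny error range and fall as the error range swamps the polling margin.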

Final Thoughts

Let me stress one more time that this only simulates an arbitrary range of possible polling errors. Clinton won this because she clearly has the edge in polling. We’re seeing consistent results across the board because we’re assuming a standard polling error range across all states, not one tailored to each state individually. If there is a large polling error in a single state (say, Pennsylvania or New Hampshire) but not in others, we could see a totally different outcome.

Another thing to note is that the model is extremely close in a state like Nevada (currently 49.92% to 50.08%). This is so absurdly close (0.16%) that I imagine it could go either way. If I were to include early voting in my equation, it could easily look like Clinton is winning there and in Florida as well. Alas, my quick model is not designed to handle that. I would not be surprised if Clinton won Nevada, North Carolina, and Florida. I would be surprised if she won other swing states like Ohio and Iowa. I would be shocked if she won Arizona.

My model is also not designed to account for major news stories that drop while I’m writing this up. My model puts her at winning 293 electoral votes to Trump’s 245. I would not be surprised in the least if she outperforms that by a large margin.

My results seem to fall in line with models like PredictWise, the Princeton Election Consortium, Daily Kos, and the Huffington Post. They look somewhat like the New York Times’ model. My predictions do not look like FiveThirtyEight’s, which is far more conservative than anything else out there. Nate Silver’s model is, obviously, far more sophisticated. He still gives Clinton a 64.5% chance of winning as of this writing. This corresponds to a 38% absolute polling error in my model. I guess if (when) Clinton wins, we’ll all be correct. ¯\_(ツ)_/¯

I’ll write an assessment up on Tuesday night or Wednesday. I hope I can make it that long.

Update, Clarification, Reiteration – 11/7/16

Alright, some people have taken issue with the methods here, which often happens when you make any kind of prediction about a thing people care about and open your formula to the public. There are a lot of fair criticisms that can be made if this were a true prediction model. Let me reiterate that this model does not take into account trends or correlate polling error across different states or account for individual state polling errors or individual poll margins of error. I’m also not doing anything to correct for polling missteps, like underpolling certain demographics. It imagines a world where each state’s polling error is completely independent and completely random (it’s not). All this model does is take current (well, now old) polling averages, imagine an arbitrary polling error range, and apply the error range independently to each state. As I stated above, what I am essentially doing is just converting current polling averages to win probabilities by simulation. My final electoral map is just a reflection of the current polls, and I don’t think that’s a bad thing. The states each candidate won in my model are the states where they are up in the polls. It’s super simple for four reasons:

1) I decided to do this 3 days before the election and thus didn’t have time to write and test all of the other variables that might be pertinent
2) it’s super simple by design because I don’t want to be guilty of making biased assumptions that I have no business making
3) I’m really just interested in how polling error might affect the deviation between Clinton and Trump’s polling averages, and how that might affect the election
4) I’m just learning here–I’m not an authority, this is not authoritative, and I’m doing this for fun.

I think it’s very likely that Clinton will win Nevada. It looks pretty likely that she’ll win North Carolina as well. Florida may be a true toss-up. I do not believe she will lose any of the states that my model currently shows her winning. But those are just my feelings, and I don’t currently have the data and analysis to back that up.

Election Day update, for due diligence – 11/8/16

I’ve updated the final poll averages for this Election Day and have a new number. Hillary Clinton’s numbers have gone up over the past couple days. This model now puts her final probability at 96% with 306 Electoral Votes. She won 9601, lost 356, and tied 44 simulations.

In this map, Clinton’s probability of winning some key swing states increased to put her over the top. Florida increased to 56.9%, North Carolina to 57.4%, New Hampshire to 86.1%, and Nevada to 63.1%. Remember, if she maintains the “blue wall” in this election, she only needs to win one of those states to get over 270. Things are looking pretty good for her right now. The first big round of poll closings happens in a few hours.

Sorry that the percents aren’t formatted correctly. I’m doing this very quickly.
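The “she only needs to win one of those states” point can be turned into a quick back-of-the-envelope number. This sketch uses the four state probabilities quoted above and leans on the same assumption the model already makes: that the states are independent of each other (they’re not, really). The helper name `at_least_one` is hypothetical, and the result only applies if the “blue wall” holds.

```python
from math import prod

# Win probabilities quoted in this update.
state_probs = {
    "Florida": 0.569,
    "North Carolina": 0.574,
    "New Hampshire": 0.861,
    "Nevada": 0.631,
}


def at_least_one(probabilities):
    """Chance of winning at least one contest, assuming the contests are independent."""
    # P(at least one win) = 1 - P(losing every contest)
    return 1.0 - prod(1.0 - p for p in probabilities)


p_any = at_least_one(state_probs.values())
print(f"{p_any:.1%}")  # roughly 99% under the independence assumption
```

If polling errors are correlated across states, losing one of these makes losing the others more likely, so the real number would be lower than this.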

Update – 11/9/16

12:29 AM – I don’t even know what to say. I don’t think anyone knows what to say.
8:15 AM – I’m looking into what might have gone wrong with the poll averages. It looks right now like the final polling error is going to fall somewhere around 4%. I suspect that the polling error by state is going to be huge in some places. Clinton managed to over-perform in a lot of places, but none of them were swing states. Trump pretty much beat expectations everywhere. There’s going to be a lot of soul-searching and hand-wringing among pollsters and statisticians, and people like me who ran with the numbers and made a very, very wrong prediction. It looks like Trump is going to win the electoral vote, but arguably not by much of a mandate. Interestingly, Clinton is likely to walk away with the popular vote–only the fifth time in history a candidate has lost the electoral vote but won the popular vote (the last one being the infamous Bush/Gore election of 2000).

The Code
