Pokeit Letter #2 – Nate Silver’s Crystal Ball
At 9:46 p.m., blogging on his site FiveThirtyEight.com, Nate Silver called the presidential election for Barack Obama. The television networks followed suit about an hour and 15 minutes later after most polls in Western states closed.
Of course, Mr. Silver had a head start: he had forecast that Senator Obama would beat Senator John McCain back in March.
From the New York Times – November 10, 2008
Nate Silver, the prodigy behind the PECOTA system for predicting the performance of baseball players, and former economic consultant for KPMG, had developed a new statistical framework for analyzing elections. Silver had already proven its scary accuracy during the Democratic primaries in May. While every other commentator was celebrating Hillary Clinton’s resurgent momentum, Silver was skeptical of the new polls showing she would win by five in Indiana and had closed the gap to 8 in North Carolina. The fresh polls didn’t make sense when compared against the relatively stable demographic data. Blogging under the handle Poblano, he broke down the numbers in a different way – Clinton by just two in Indiana, and a seventeen-point whuppin in the Tar Heel State. On May 6th, the night of the Democratic primaries, Clinton won Indiana by one and lost North Carolina by fifteen.
By the end of election night on November 4th, Nate’s model had predicted the popular vote within one percentage point, correctly predicted the results of 49 of the 50 States, and accurately forecasted all of the resolved Senate races.
My good friend John from Uganda said it best:
“That was so exciting moment for the young man to beat a big man pants down”
Indeed it was. But how exactly did he do it? The most common understanding goes like this: math genius creates a complicated statistical black box able to conjure up crystal ball-like predictions from the polls. World finds out, world goes apeshit, and it’s not long before Silver’s being chased around by a Hasidic Jewish sect, and agents of a nefarious Wall Street firm[1]:
11:15, restate my assumptions
Figure 1: To arrive at a win percentage on election day, Silver combined the reliability-weighted average of all the polls, adjusted for more recent polls from other states, averaged in predictions from demographic and economic indicators, created a projection to account for the historical trends of undecided voters and ran a Monte Carlo simulation to translate polling predictions and polling error into a probabilistic statement of the likely outcome (i.e. Obama has a 49% chance of winning Indiana).[2]
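The last step in that caption, turning a projected margin and its error into a win probability, can be sketched in a few lines of Python. This is only a toy version under loud assumptions: a single state, a normally distributed projection error, and made-up numbers. The function name `win_probability` and its parameters are hypothetical and are not taken from Silver’s model.

```python
import random


def win_probability(margin, error_sd, n_sims=20_000, seed=538):
    """Estimate the chance of winning by simulating many election days.

    margin   -- projected lead in percentage points (negative = trailing)
    error_sd -- ASSUMED standard deviation of the projection error, in points
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    wins = 0
    for _ in range(n_sims):
        # One hypothetical election: the projection plus a random miss.
        simulated_margin = rng.gauss(margin, error_sd)
        if simulated_margin > 0:
            wins += 1
    return wins / n_sims


# Made-up inputs: a candidate projected to trail by a hair lands near a coin flip.
print(win_probability(margin=-0.1, error_sd=4.0))
```

The point of simulating rather than reading the polls directly is that a small projected deficit with a large error is still close to a toss-up, which is exactly the kind of statement "a 49% chance of winning Indiana" makes.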
Sensationalism aside, the actual story of how Nate Silver derived his electoral predictions is as fascinating and impressive as they come.
The starting point for Nate was a growing dissatisfaction with the way commentators were botching the analysis of the race’s primary data source – the polls. Poll aggregators like RealClearPolitics.com gave every poll the same weight. Good polls were mixed with polls having small sample sizes, polls from unreliable pollsters, old polls and polls from pollsters with a known political bias. Silver wanted to create an aggregate of the polls, but with a weight towards the best ones.
In order to determine the best polls, he examined all of the old polls, took the average ‘miss’ for each pollster across each contest they polled, and compared it to the average miss of other pollsters in the same contest. The methodology for calculating the rankings, and the subsequent weights based on effective sample size, is exhaustively documented at FiveThirtyEight.com.
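To make the two ideas concrete, here is a rough Python sketch: one function scores each pollster by its miss relative to the other pollsters in the same contest, and another takes a weighted average of poll margins. The data, the function names, and the weights are all invented for illustration. The real weighting scheme is the one documented on FiveThirtyEight.com and is not reproduced here.

```python
from collections import defaultdict


def relative_miss(results):
    """Average miss of each pollster relative to its peers in the same contest.

    results -- list of (pollster, contest, miss_in_points) tuples.
    A negative score means the pollster missed by less than its peers.
    """
    by_contest = defaultdict(list)
    for pollster, contest, miss in results:
        by_contest[contest].append((pollster, miss))

    diffs = defaultdict(list)
    for entries in by_contest.values():
        if len(entries) < 2:
            continue  # nobody to compare against in this contest
        total = sum(miss for _, miss in entries)
        for pollster, miss in entries:
            others_avg = (total - miss) / (len(entries) - 1)
            diffs[pollster].append(miss - others_avg)
    return {p: sum(d) / len(d) for p, d in diffs.items()}


def weighted_average(polls):
    """polls -- list of (margin, weight) pairs; returns the weighted mean margin."""
    total_weight = sum(weight for _, weight in polls)
    return sum(margin * weight for margin, weight in polls) / total_weight


# Hypothetical track records: pollster "A" beats pollster "B" in both contests.
history = [("A", "IN", 2.0), ("B", "IN", 4.0), ("A", "NC", 1.0), ("B", "NC", 5.0)]
print(relative_miss(history))

# Hypothetical current polls: the better pollster gets three times the weight.
print(weighted_average([(10.0, 3.0), (2.0, 1.0)]))
```

Compared with a flat average of 6 points, the weighted average in the last line is pulled toward the poll from the more reliable pollster.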
Individual outlier polls, and states with little or no recent polling, posed another problem. For example, in February of 2008, the single Kentucky poll showed Obama trailing McCain by 29 points, whereas the only Tennessee poll had Obama trailing by 9 points. Since Tennessee and Kentucky are fairly similar, it was unlikely that there was in fact a 20-point gap between the two states.[3] This realization led to another insight. The fairly stable relationship between the demographics of Tennessee and Kentucky and their expected electoral outcomes is generalizable to elections in every other state. If you were to look back through the data of election outcomes, you would find a statistically significant relationship between a state’s demographics and a candidate’s, or party’s, expected two-way share of the vote (two-way as in excluding 3rd party candidates).
One of the more notable cases where the polls began to diverge from the historical relationship was in North Carolina during the weeks leading up to the Democratic Primary. The pollster Insider Advantage released a poll 6 days before the election showing that Hillary Clinton had pulled ahead by 2 points. Other polls also pointed to a narrowing of the gap. Silver smelled a rat. He suspected that pollsters were significantly underestimating Obama’s margin of victory in Southern states with substantial black populations. In addition, early voting data in North Carolina suggested that pollsters might also be significantly underestimating the proportion of African-Americans in the voting population.[4] While the polls showed tightening, Silver’s model correctly forecasted a double-digit victory for Barack Obama.
The assumption was that voters in North Carolina would behave like demographically-aligned voters in other states. More explicitly, the model looked at a set of independent demographic variables and then tried to estimate their effect on one dependent variable: Obama’s two-way vote share. The statistical technique used to parse out this relationship is known as multiple regression analysis. Let’s jump right in.
Regression
Many empirical questions can be posed as “what is the effect of X on Y?”. In our example, we’re looking at the effect of a set of state demographic statistics, X, on Obama’s share of the vote total, Y. The mathematical relationship of this statement is [5]
$$Y = a + bX + \text{other factors}$$
Where a is the intercept or constant, and b is a measure of how much Y changes (on average) when X changes. In a simple world (think Pre-Algebra), with no other confounding factors to consider, b would simply be the slope, b = Δy/Δx. In the real world, things are far more complicated. Obama’s two-way vote share is determined by a lot of things we aren’t able to include in the model. For example, the Reverend Wright scandal was going on at the time, and its effects were rather unpredictable – certainly not predictable by an econometric model using only recent polling history. Econometricians call these other factors “error” or “unobservables” and they use the letter e to represent them.
Since we can’t just compute the slope as Δy/Δx, how do we guess the value of b? Well, we want our estimate of b to be the best estimate of b possible, i.e. we want to maximize the accuracy of our estimate of b. In this case, maximizing the accuracy of b is the same thing as minimizing the size of the error term, or what we can’t explain. And to be more precise, we don’t want to minimize e in the sense of making it as negative as possible; we want to measure e in a way that captures its absolute size. We could do this by minimizing the absolute value of e – but that’s quite difficult. A far simpler exercise is to take the square of e and then try to minimize the sum of those squares. I’m going to walk through the derivation of b using this method of minimizing the size of the square of our e vector. This technique is called “(ordinary) least-squares regression”. Replacing ‘other factors’ with e, the true relationship between Y and X is
$$Y = a + bX + e$$
We will also introduce an equation for the average value of Y, assuming the errors average out to zero so that no e term survives
$$\bar{Y} = a + b\bar{X}$$
Let’s perform a mathematical parlor trick and subtract the average value of Y from both sides of the true relationship equation
$$Y - \bar{Y} = (a + bX + e) - (a + b\bar{X})$$
The a constants cancel out and this can be rewritten as
$$Y - \bar{Y} = b(X - \bar{X}) + e$$
Rearranging the terms so we isolate e
$$e = (Y - \bar{Y}) - b(X - \bar{X})$$
Recall that our best guess of b is the value that minimizes the sum of the squared errors. Therefore, we want to pick b to minimize
$$\sum e^2 = \sum \left[(Y - \bar{Y}) - b(X - \bar{X})\right]^2$$
Think back to calculus now and recall how to minimize this function: take a derivative with respect to b, and set it equal to zero.
$$\frac{d}{db}\sum e^2 = -2\sum (X - \bar{X})\left[(Y - \bar{Y}) - b(X - \bar{X})\right] = 0$$
A little rearranging
$$\sum (X - \bar{X})(Y - \bar{Y}) = b\sum (X - \bar{X})^2$$
Then we divide through to isolate b, and we have
$$b^* = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$
This is our formula for the best guess of b. For those familiar with statistics, this can be viewed as the covariance between X & Y divided by the variance in X
$$b^* = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)}$$
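For anyone who would rather see the formula run than read it, here is a small Python sketch of the covariance-over-variance estimate. The data points are made up, the function names are my own, and the intercept is recovered as the average of Y minus b times the average of X, which assumes (as is standard) that the errors average out to zero.

```python
def ols_slope_intercept(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: how X and Y move together. Denominator: how much X moves.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the fitted line passes through the point of averages
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """The quantity least squares minimizes: the sum of e squared."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data: X is some demographic percentage, Y is a vote share.
xs = [10.0, 20.0, 30.0, 40.0]
ys = [42.0, 48.0, 57.0, 61.0]

a, b = ols_slope_intercept(xs, ys)
print(a, b)

# Nudging b in either direction can only make the squared error larger.
print(sum_squared_errors(xs, ys, a, b))
print(sum_squared_errors(xs, ys, a, b + 0.05))
print(sum_squared_errors(xs, ys, a, b - 0.05))
```

The last three lines are the derivation in miniature: the estimated b sits at the bottom of the squared-error bowl, so any other slope does worse.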
To be sure, the purpose of this exercise was less about learning a statistical derivation, and more about examining the logic of these techniques. The ordinary least squares estimator is just one of many estimation techniques used to identify the relationship between variables, and with God’s help, causality. While each technique is different in its execution, they all attempt to either minimize the error, or maximize the likelihood that the estimated value of b* is equal to the true value of b.
The Democratic Primary Model
As Silver is surely well aware, models don’t come fully formed from the .raw file. After testing and retesting dozens of hypothetical correlations, Silver found 9 demographic variables that had a statistically significant effect on the Obama-Clinton vote share. Factors included in the model were: [6]
1. Caucus versus Primary
2. African-American population
3. Percentage of 18-29 voters
4. Percentage of adults with college degrees
5. Fundraising
6. Percentage of Southern Baptists
7. John Kerry vote share, 2004
8. Percentage of Democratic voters who self-identify as Liberal
9. Percentage of naturalized citizens, e.g. immigrants
An econometrician might write the model like this:
$$Y = a + b_1X_1 + b_2X_2 + \dots + b_9X_9 + e$$
Where $Y$ is Obama’s vote share, $X_1, \dots, X_9$ are the values of explanatory variables 1-9, $b_k$ is the estimated marginal average effect of $X_k$ on $Y$, and $a$ is the estimated constant. Taken together, these variables explained more than 95% of the voting breakdown in states that had already voted.
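A model of this shape (a constant plus one coefficient per explanatory variable) can be fit with the same least-squares logic as before, just with more columns. The sketch below is plain Python and uses only two invented variables and five invented "states", built so that the true answer is known in advance. None of the numbers, names, or coefficients come from Silver’s model.

```python
def solve(A, y):
    """Solve the linear system A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):  # back-substitution
        known = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - known) / M[r][r]
    return z


def fit(rows, ys):
    """Least-squares fit of y = a + b1*x1 + ... + bk*xk; returns [a, b1, ..., bk]."""
    X = [[1.0] + list(r) for r in rows]  # the leading 1 estimates the constant a
    k = len(X[0])
    # Normal equations: (X'X) coefficients = X'y
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve(XtX, Xty)


# Invented data: two demographic percentages per state, with vote shares
# constructed from a = 30, b1 = 0.5, b2 = -0.2 and no error term.
states = [(10.0, 5.0), (20.0, 15.0), (30.0, 10.0), (40.0, 30.0), (15.0, 25.0)]
shares = [34.0, 37.0, 43.0, 44.0, 32.5]

# The fit should recover those three values up to floating-point rounding.
print(fit(states, shares))
```

With real data there is an error term, so the coefficients are estimates rather than exact recoveries, and a library routine would normally replace the hand-rolled solver.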
The first thing you may notice is that without even knowing the size and direction of the b coefficients for each variable, you probably have an idea about which factors favored Obama and which factors favored Hillary. As you might expect, Obama performed well with African-Americans, the youth vote, the better educated and the more liberal.
Several of the more creative variables require some explanation. Caucuses, which benefited from lots of on-the-ground organizing, gave Obama an advantage while Hillary did better in wide open primaries. Fundraising power measured the dollars raised by the candidate per vote Kerry netted in 2004. This serves to quantify which candidate has the state party apparatus in their back pocket. Percentage of Southern Baptists was used as a proxy for a State’s ‘Southerness’ so as not to resort to messy geographic definitions of the South. And surprisingly, Nate did not find that Obama performed worse in states with large Latino populations once all the other factors were controlled for. He did find, though, that Obama did slightly worse among recent immigrants relative to Hispanics born in the United States. This is the ‘Percentage of naturalized citizens, e.g. immigrants’ variable. Silver hypothesizes that this might be on account of Bill Clinton being the President when they came to this country or became citizens. The final variable, John Kerry vote share, favored Hillary.
Conclusion
Perhaps you’re wondering why we took this long detour through political forecasting as opposed to going headlong into the poker. Without mincing words, the story of FiveThirtyEight.com is one of the most righteous examples of applied statistical modeling that I’ve ever seen. Nate’s intricate process for synthesizing win percentages from raw polling data and demographic regressions is a piece of engineering at least as complex and twice as clever as what’s done at any hedge fund out there. Each challenge had a carefully chosen solution and each solution built on the next to finally arrive at something unique and undeniably useful – a measure of a candidate’s probability of winning an election. We’re going to try to adopt a similar thought process as we begin building our expected value model for no-limit hold’em poker.
Looking back at what went into the FiveThirtyEight.com model really reaffirms my belief that econometrics is more of an art than a science.[7] The math and the stats are necessary and yes they can be complex, tedious, and dull at times. However, while you need to know the math to communicate with the data, it’s how well you understand the questions you’re asking that matters most. The real heavy lifting of model building is done when you are forced to order and codify concepts that you had only previously considered in rules of thumb and rough approximations.
In the next installment, we’ll be turning our attention back to poker as we examine the challenge of determining causality.
-chaz
[1] “[Silver] had been flown to New York at the invitation of a hedge fund to give a talk. They just said, ‘Why don’t you come in, talk about your models’” (New York Magazine Oct 12, 2008)
[2] Sternbergh, Adam. “The Spreadsheet Psychic: How Nate Silver Went from Forecasting Baseball Games to Forecasting Elections.” New York Magazine 12 Oct 2008: 2.
[3] Silver, Nate. “General Election Projections, Beta Version.” Daily Kos 26 Feb 2008. Web. 25 Aug 2009. <http://www.dailykos.com/storyonly/2008/2/26/183555/011/136/464643>.
[4] Silver, Nate. “North Carolina Prediction: Obama by Double Digits.” FiveThirtyEight.com 5 May 2008. Web. 25 Aug 2009. <http://www.fivethirtyeight.com/2008/05/north-carolina-prediction-obama-by.html>.
[5] Adapted from: Lich-Tyler, Stephen. “Supplemental Notes from Econ 570: Econometrics.” (2008)
[6] Silver, Nate. “What’s an Obama State? With February predications.” Daily Kos 9 Feb 2008. Web. 22 Aug 2009. <http://www.dailykos.com/storyonly/2008/2/9/13227/22519/239/453361>.
[7] Of course, I did not originate this belief.
