Pokeit Letter #3 – Cum hoc ergo propter hoc!
There’s this great story one of my economics professors used to tell in order to illustrate that “correlation does not imply causation”. It’s almost certainly false, but it goes like this. The setting is 16th century Russia, during the latter half of Ivan the Terrible’s reign. In general, this was not a great time to be living in Russia. A combination of drought, famine, Polish-Lithuanian raids, Tatar invasions, and the sea-trading blockade carried out by the Swedes, Poles and the Hanseatic League had devastated the country. On top of that, a particularly nasty epidemic of the plague was killing between 600 and 1000 people every day. It was not known at the time how the plague spread, so efforts to fight it were subject to wild theories. Ivan, mentally unstable and physically disabled, suspected treachery. To prove it, he had his advisors gather statistics on the number of doctors and the number of dead throughout his kingdom. Once it was discovered that regions with more doctors also had more deaths, Ivan rounded up all of the doctors and had them executed for treason.
There’s also this great chart I found on Wikipedia showing the relationship between U.S. highway fatalities and fresh lemon imports:
(Just an aside, I work for a firm that does a lot of professional services contracting with the federal government. The most common result of these contracts is a PowerPoint presentation with dozens of pretty charts and tables. There’s a running joke that many of the ridiculous PowerPoints that come out of the federal government are actually produced by some entry-level grunt working for our firm. Basically, what I’m trying to say is that the odds are 50:50 that someone down the hall from me is convinced that we need to increase fresh lemon imports. Better make it from Mexico just to be sure.)
The phrase that comes to mind when people talk about murderous Russian doctors and the unique benefits of fresh lemon imports is “correlation does not imply causation”. This fallacy, which is also known as cum hoc ergo propter hoc (Latin for “with this, therefore because of this”), is actually a combination of several different problems. Chief among them are reverse causality, unobserved heterogeneity, and selection.
Reverse Causality[1]
Interpreting relationships in the social sciences is hard (the math in the last letter, that’s the easy part). The problem boils down to the fact that statisticians and economists commonly lack the laboratory ideal of a randomized trial. That is, we aren’t always able to introduce an ‘X’ variable randomly into a population and measure its average effect on an outcome ‘Y’ (although a group of clever development economists are doing just that). For instance, it would be infeasible and immoral to mandate that a ‘treatment’ group of children must go to school, while preventing a ‘control’ group from enrolling in order to measure the effect of schooling on future wages. Likewise, you can’t set up a ‘Cold War’ experiment where a random selection of nations enacts communist policies while another cohort pursues free-market capitalism just to find out if Ayn Rand was on to something.
In the real world, we do not get to control the X variables (schooling, systems of government, lemon imports). Often they are determined by other things – possibly including Y itself. Regression analysis captures the correlation between X and Y, but nothing else. Let’s go back to imperial Russia and try to find the effect of Doctors on Plague. We run the regression:
$$\text{Plague}_i = a + b\,\text{Doctors}_i + e_i$$
And we find that b is large and positive – let’s say that each additional doctor is associated with 10 additional infections in an arbitrary population. Can we take this as foolproof evidence that doctors were spreading the plague, if not from the fleas nesting in their fur, then by intentional infection? Well, not necessarily. The more sensible explanation is that more doctors were attracted to areas with a higher incidence of plague. While this definitely makes more sense, it is important to remember that it is still an interpretation and not a product of the data itself. All the data can tell you is that there is a correlation.
A more challenging interpretation can arise from trying to estimate the effect of the psychological condition known as ‘tilt’ on win/loss rates in poker. Tilt is a poker term for a state of mental frustration caused by bad beats, challenging interpersonal situations, and/or a losing session. Tilting may cause a player to play overly aggressively or loosely, and it often has a negative effect on profitability. Let’s say we want to measure the effect of being on tilt on average profitability. Setting aside for now the exact specification of these two variables, we set up the equation:
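To make the doctors story concrete, here is a small simulation sketch. Everything in it is made up for illustration: the number of regions, the infection counts, and the rule that roughly one doctor shows up per 20 infections are all assumptions, not historical data. By construction, doctors have no effect on infections at all, yet the regression of plague on doctors still produces a large positive b.

```python
import random


def ols_slope(x, y):
    """Slope b from regressing y on x with an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


def simulate_regions(n_regions=500, seed=1):
    """Toy data in which plague attracts doctors, never the other way around."""
    rng = random.Random(seed)
    # Assumption: infections per region are drawn independently of doctors.
    plague = [rng.uniform(0, 1000) for _ in range(n_regions)]
    # Assumption: a base of 10 doctors, plus about 1 per 20 infections, plus noise.
    doctors = [10 + p / 20 + rng.gauss(0, 3) for p in plague]
    return doctors, plague


doctors, plague = simulate_regions()
b = ols_slope(doctors, plague)
print(f"estimated b: {b:.1f} infections per additional doctor")
```

The estimate comes out positive even though the true causal effect of a doctor on infections in this toy world is exactly zero. The regression has no way of knowing which direction the arrow points.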
$$\text{Profit}_i = a + b\,\text{Tilt}_i + e_i$$
After running a regression we find a strong and negative correlation between being on tilt and profitability. This result is somewhat ambiguous though. Is it that being on tilt causes players to lose money, or does losing money cause players to go on tilt? As any poker player can tell you, it’s very likely that the causation goes both ways.
Unobserved Heterogeneity
A second type of complication is unobserved heterogeneity. This is a problem when people with different values of X are different in other ways that also affect Y. If some unobserved factor is correlated with both X and Y, our estimates of b will be biased.
The most common example of this is the “ability bias” in estimating the returns to schooling. Suppose a person’s wages are a function of their education level and their ‘ability’. Here ability can mean intelligence, job skills, savvy, taste for office politics, whatever.
$$\text{Wages}_i = a + b\,\text{Education}_i + c\,\text{Ability}_i + e_i$$
Even if you devised a set of really neat statistics to measure Ability, they’d almost surely be imperfect in some way, and you often don’t have access to all the necessary data. Suffice it to say, the Ability variable is omitted and absorbed into the error term. The problem now is that Ability affects earnings, but it also affects how much education you get. If we are unable to measure Ability, then we will mistakenly attribute its effect to education.
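The size of this misattribution can be written down with the standard omitted variable bias result. The sketch below assumes a linear wage equation with coefficients a, b and c (my notation, chosen for illustration) and an error term e that is uncorrelated with education. Here b* denotes what the regression of wages on education alone converges to.

```latex
\begin{align*}
\text{True model:}\quad
  & \text{Wages}_i = a + b\,\text{Education}_i + c\,\text{Ability}_i + e_i \\
\text{Model we can estimate:}\quad
  & \text{Wages}_i = a^{*} + b^{*}\,\text{Education}_i + u_i,
    \qquad u_i \text{ absorbs } c\,\text{Ability}_i + e_i \\
\text{What we actually get:}\quad
  & b^{*} = b + c\,\frac{\operatorname{Cov}(\text{Education},\,\text{Ability})}
                        {\operatorname{Var}(\text{Education})}
\end{align*}
```

If ability raises wages (c > 0) and more able people get more schooling (positive covariance), the second term is positive and the estimated return to education is too high. The bias vanishes only if ability has no effect on wages or is uncorrelated with education.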
Player names in online poker provide a more pure example of unobserved heterogeneity. Before you can give your credit card number to an offshore, semi-legal poker website, you have to create a player name. My friend Joseph Crowley has observed that players with “Mike” or “Mikey” in their name tend to be huge donkeys at the table. Let’s test his hypothesis, that is, whether or not having some variant of Mike in your name is associated with lower win rates in dollars per hand:

Rather than just hypothesizing the relationship between these two variables, let’s test it empirically using data collected from the hand histories of real money poker players. I have a database on my work-issued Lenovo Thinkpad with approximately 800,000 games of $0.50 – $1 NL Hold’em providing statistics on 22,420 players. Using this data, I identify if a player has “Mike” in their name with the dummy variable MIKEY and the win rates of every player in number of dollars won per hand with the variable USD_hand. Running the regression of USD_hand on MIKEY in the statistical package Stata produces the following regression table:

Joe’s intuition seems to have been spot on, as the b coefficient for the variable MIKEY is large, negative, and significant at the 95% confidence level. Having “Mike” in your player name is associated with win rates that are $1.25 a hand below average (as shown in the Coef. column). Also, the P>|t| value highlighted in yellow is less than 0.05, which is why we say it is significant at the 95% confidence level. What this means is that if the null hypothesis (that there is no relationship between having ‘Mike’ in your name and your win rate) were true, we would observe a gap at least as large as this $1.25 less than 5% of the time. Now here’s where the whole thing gets Freakonomics on us. Does this result mean that if I changed my username to ‘MikeOrangelloNutz’, my average win rate per hand would drop by $1.25? Likely not.
Unless your player name makes people think you are a pro (like if it was Phil Ivey or something), it should have no direct effect on win rates. However, the data shows a negative correlation between having “Mike” in your name and your win rate. This isn’t a case of reverse causality, either (win rates don’t cause players to change their player name). What’s happening here is that both win rate and player name are caused by the same unobserved factor: player skill. And for whatever reason, Mikey tends to suck at poker.
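Here is a toy simulation of that story. It reuses the variable names MIKEY and USD_hand from the regression, but the data is simulated, not drawn from my hand history database, and the skill distribution, the naming probabilities, and the payoff numbers are all assumptions invented for the sketch. The name never enters the win rate formula, yet the coefficient on the dummy still comes out negative.

```python
import random


def simulate_players(n_players=20000, seed=7):
    """Simulated (mikey, usd_hand) pairs driven by an unobserved skill level."""
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        skill = rng.gauss(0, 1)  # unobserved by the analyst
        # Assumption: weaker players are more likely to pick a "Mike" name.
        p_mike = 0.10 if skill < 0 else 0.02
        mikey = 1 if rng.random() < p_mike else 0
        # Win rate depends on skill and luck only; the name has no effect.
        usd_hand = 0.5 * skill + rng.gauss(0, 1)
        players.append((mikey, usd_hand))
    return players


def dummy_coefficient(players):
    """With a single 0/1 regressor, the OLS slope is the difference in group means."""
    mikes = [w for m, w in players if m == 1]
    others = [w for m, w in players if m == 0]
    return sum(mikes) / len(mikes) - sum(others) / len(others)


players = simulate_players()
b = dummy_coefficient(players)
print(f"coefficient on MIKEY: {b:.2f} dollars per hand")
```

Renaming a simulated player changes nothing about `usd_hand`, because the formula never looks at the name. The negative coefficient is entirely the shadow of skill.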
Selection and self-selection
When we estimate b we want to interpret it as “the average effect of X on Y.” However, people are different, and each person has a different internal relationship between X and Y. For example, the relationship (b) between minutes spent watching Gossip Girl (X) and jollies derived from watching Gossip Girl (Y) is likely higher for me than it is for you. Because of this, I watch Gossip Girl every Monday night, read Daily Intel on Tuesday at work, and afterwards, I email my friends “+50 for Chuck Bass saying ‘Because I’m Chuck Bass’”. My 15-year-old sister calls me about it on Wednesday after watching it online (because she’s not allowed to watch TV on weeknights!) and we giggle about it together like two 15-year-old girls. Now, if I were to look at the data on Gossip Girl viewership to try to estimate the joy the average dude would get from watching it, I’d find the average only for the people who actively choose to watch Gossip Girl. Since these are precisely the people who get more jollies out of it than most, the estimate of the effect of X on Y would be biased upward, causing us to overestimate the effect of Gossip Girl on happiness. This problem is called “self-selection”, or selection in its general form.
Switching gears to poker, suppose there was a training program that could increase a player’s winnings. Here are five people considering taking the class:

Across the population, the average return is $2,800. But let’s assume that the class costs $4,000. Who would then take the class? If our subjects are rational economic agents (you know, like most gamblers), they will only take the class if the benefit in increased winnings outweighs the cost. Performing the cost-benefit analysis, we find that Andrew and Stu would be the only ones to take the class:

If we were to then estimate the effect of the class on winnings using just those who attended, we would estimate the average effect to be $5,500, for a net effect after costs of +$1,500 per player. While the estimate is accurate for this particular subset, it overstates the effect the class would have on the population as a whole. In fact, after fees, the average effect of the class across all five players is -$1,200.
This is actually a perfect example of what multilevel marketers like Amway do in an effort to recruit new members. After-hours seminars are set up in a nondescript office park, the few success stories are paraded around, and afterwards, an “Amway Business Owner” offers to sell you several Robert Kiyosaki books such as the notorious “Rich Dad, Poor Dad” and the lesser known “Cashflow Quadrant” (I wouldn’t be surprised if a study came out in a few years showing that the housing bubble was actually caused by hordes of Amway automatons pumped up by Kiyosaki’s real estate happy talk). Conned into giving up their money and their dignity, most of the people who get involved in these schemes never make back their initial investment.
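The arithmetic is easy to check in a few lines. The individual returns below are hypothetical: I picked them only so that they reproduce the averages quoted in the text ($2,800 for all five players, $5,500 for the two who enroll), and "Player C" through "Player E" are placeholder names for the three who stay home.

```python
CLASS_COST = 4000

# Hypothetical returns, chosen to match the averages in the text.
returns = {
    "Andrew": 6000,
    "Stu": 5000,
    "Player C": 1500,
    "Player D": 1000,
    "Player E": 500,
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Rational players enroll only if their own return beats the cost.
takers = {name: r for name, r in returns.items() if r > CLASS_COST}

population_avg = mean(returns.values())
takers_avg = mean(takers.values())

print("enrolled:", sorted(takers))
print("average return, everyone:", population_avg)
print("average return, enrolled only:", takers_avg)
print("net of cost, enrolled only:", takers_avg - CLASS_COST)
print("net of cost, everyone:", population_avg - CLASS_COST)
```

Any set of returns with the same two averages tells the same story: the people you can observe taking the class are exactly the ones for whom it pays, so their average says little about everyone else.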
Observed hands are also subject to selection. Players fold the majority of hands before they actually reach a showdown, and even then, the loser can muck his hand if he’s beaten. In a sample of 174,305 hands of low-limit online poker, only 6,551 hands (3.8%) were actually observed at showdown. Since only the best hands (and some bluffs) are not folded, we can expect that observed hands are, on average, far stronger than unobserved hands. This certainly complicates our efforts to model situations in online poker, but by how much?
To refine what we mean by better or worse hands, we will use the patented Sklansky hand groups to order all of the starting hands in Texas Hold’em into 9 ranked categories:

Using a personal database which reveals all of the hands dealt to a player, we can identify the size of the selection bias by comparing the frequency of each hand group in the full dataset of 174,305 hands to the frequency of each hand group in the shown-hands dataset of just 6,551 hands. Plotted out, the frequencies for each category give us the hand range distributions for shown hands vs. all hands:

Figure 1: A plot comparing the frequency of observing a hand in each of the 9 Sklansky Hand Groupings for all hands dealt vs. only those shown at a showdown. The probability distribution for all hands is indicated by the black line, while the probability distribution for shown hands is indicated by the blue line. The yellow bars show the net difference between the frequencies of each group.
The observed hand selection bias is, in fact, quite large. Crappy group 9 hands represent 61% of all hands dealt, but only 20% of hands that are observed at showdown. Likewise, you are more than five times as likely to observe a group 1 hand like Aces or Kings in a sample of shown hands as you are in a population with all hands revealed. The shown hands distribution is roughly flat before dipping around groups 6 – 8 and rising up for group 9. Meanwhile, the all hands distribution is heavily left skewed, with the majority of hands coming from the garbage group 9 category.
Have these results just shot a massive hole through our plan of modeling opponent hand range distributions? Do other options exist besides using public datasets of shown hands? While we could rely entirely on personal datasets with all hands revealed (like the one used to generate Figure 1), our sample of player databases would also be subject to selection bias, since only certain types of players actively track their hand histories. And before you get any ideas, datasets revealing every player’s hand do not exist outside the poker site’s offshore facilities and Russ Hamilton’s hard drive.
Defeat appears imminent, but is all truly lost for our hero statistician? Tune in next time to find out if this series ends at #4!
-chaz
[1] Adapted from: Lich-Tyler, Stephen. “Supplemental Notes from Econ 570: Econometrics.” (2008)