Pokeit Letter #4 – What is the Probability of Getting Head on First Try
This question, posed to a class full of undergraduates, wasn’t interpreted quite the way Mustafa had intended. At the time, I was taking STAT 10 to satisfy a prerequisite for my hastily put-together plan to switch majors and apply to UNC’s business school. See, it turned out that physics just wasn’t my calling. Perhaps it was Dr. Yu Wu’s prickly demeanor in Physics 26, or the fact that Dr. Hernandez in Physics 27 was an asshole. Whatever it was, my grades in physics had not been compelling. By the time the second Physics 27 midterm rolled around, I knew I had to stop grinding glass into my eyes. I spoke with my academic advisor, and by the end of the day, I had signed up for STAT 10 and ECON 10 in the Spring. The next semester, Mustafa Tural introduced me to statistics.
You’d be hard pressed to find a more uncaring group of people just going through the motions than the one taking STAT 10 in the Spring of 2005. Let’s be honest, no one is there because they’ve got a passion for Bayes. You’re in STAT 10 because you want to get into the B-School. You want to get into the B-School because you want a well-paying job out of college. You want a well-paying job out of college because, well, you want to make the big bucks, and you don’t care enough about school to get a graduate degree (and no, an MBA doesn’t count).
I think I may have eked out a B in the class – maybe. I know of at least 4 times when I was more than 20 minutes late to that class because I was finishing my homework in the hallway. While I was reviewing a particularly shitty test with Mustafa during office hours, he asked me a pointed question:
“Do you care at all?”
I think my reply was “What?”
So as penance for not paying attention in STAT 10, I’ve decided to do my part and teach it to all of you. So let’s get to it. Consider this post a crash course in STAT 10 for poker players.
Descriptive statistics[1]
First things first, we need to distinguish between two concepts: the population is the entire group that we wish to study, while the sample is the subset of the population for which we have information. Ideally the sample is an unbiased representation of the population drawn at random, but as we have already touched on with regard to observed hands, we can’t assume this to be the case.
When faced with a bunch of numbers, in either a population or a sample, we often look for simple ways to summarize the data. For example, Group A contains chip counts at an arbitrary table on Day 1 of the WSOP Main Event while Group B contains chip counts at that same table on Day 7 (chip counts are in thousands of dollars):
Group A – Day 1: $10.5, $7.8, $17.0, $11.0, $8.9, $9.5, $10.2, $23.4, $25.8
Group B – Day 7: $1,400, $503, $2,500, $5,230, $980, $1,900, $7,201, $3,290, $1,309
We can identify two general differences between these two groups. Group B tends to have much higher chip counts, and Group A tends to be clustered closely together while Group B is spread out. These two statements capture the concepts of “central tendency” and “spread”.
Measurements of “central tendency” express whether the numbers tend to be high or low. The most common of these are:
Mean: the average value
Median: the middle value
Mode: the most common value (in practice the mode is useless)
The mean and the median of a population will be different if the distribution is “skewed”, meaning that there are larger (or smaller) gaps between values at the high end than at the low end. For example, the distribution of income is very skewed: the incomes of the wealthiest people differ by billions of dollars, while the incomes of the poorest people differ by pennies. Because of this, “mean income” might be a slightly misleading indicator, since a few wealthy people can pull the average up, so that most people actually have income below average. The median addresses this issue by reporting the income of the person right in the middle of the distribution. In the March 2005 Current Population Survey, mean household income was $61,905. However, 63% of households earned less than the average. The median income was $46,400.
The second characteristic, “spread”, captures whether observations are clustered closely together or spread apart. The descriptive statistics most often used to describe “spread” are the variance and the standard deviation. The variance is the average squared distance between each observation and the mean, and the standard deviation is the square root of the variance; loosely, it is the typical distance between an observation and the mean.
A third concept, skewness, refers to whether the gaps at the top of the distribution are larger or smaller than those at the bottom. “Skewed”, however, is not synonymous with “biased”.
The maximum and minimum of a sample or population should be self-evident. Finally the Xth percentile refers to the value that X% of the group lies below. For example the median is exactly the same thing as the 50th percentile.
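To make these summary numbers concrete, here is a minimal sketch that runs them on the two chip-count groups from above using Python’s standard `statistics` module. The helper name `summarize` is just an illustrative choice, and treating each table as a whole population (rather than a sample) is an assumption made for this example.

```python
import statistics

# Chip counts in thousands, copied from Group A and Group B above.
group_a = [10.5, 7.8, 17.0, 11.0, 8.9, 9.5, 10.2, 23.4, 25.8]
group_b = [1400, 503, 2500, 5230, 980, 1900, 7201, 3290, 1309]


def summarize(values):
    """Return the descriptive statistics discussed above for one group."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Assumption: each table is treated as a full population,
        # so the population (not sample) formulas are used.
        "std_dev": statistics.pstdev(values),
        "variance": statistics.pvariance(values),
        "min": min(values),
        "max": max(values),
    }


summary_a = summarize(group_a)
summary_b = summarize(group_b)

for name, summary in (("Group A", summary_a), ("Group B", summary_b)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

In both groups the mean comes out above the median (about 13.79 versus 10.5 for Group A), which is what you’d expect when a couple of big stacks pull the average up.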
Probability
In probability, an event is something determined by chance that either does or does not happen. An event can be described as simple, meaning that there is only one way to achieve the outcome, or complex, meaning that there are a number of simple events that would satisfy the condition.
Let’s use a standard 52 card deck to illustrate this principle. There are four suits in a deck, spades (♠), hearts (♥), diamonds (♦), and clubs (♣), and there are 13 unique card ranks in each suit: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A.
The outcomes in this context are the individual cards. If you were to deal one card from the deck, a simple event might be the K♣, since there is only one of them. A complex event would be dealing any club (♣), since there are 13 different cards that would satisfy this condition.
Formally, let S denote the space of all possible outcomes. Any event is a subset of S. We use letters like A and B to denote generic events, while ¬A and ¬B denote the complements of A and B. The complement contains all the things in S that are not part of the event, and it can be thought of as the opposite of the event. If A is the event that the dealt card’s suit is a “club (♣)” then ¬A is the event that the card’s suit is “not a club (♠, ♥, ♦)”. The union of two events, A ∪ B, consists of all outcomes that satisfy one event or the other (or both), while the intersection, A ∩ B, consists of all outcomes satisfying both conditions. For practical purposes you can read it as:
The union: (A ∪ B) as “A or B”. This is the more inclusive set, containing any outcome that is in either A or B
The intersection: (A ∩ B) as “A and B”. This is the more exclusive set, containing only outcomes that are in both A and B
For example, let’s deal one card from a 52 card deck. Here, A is the event that the dealt card is a “club (♣)” and B is the event that the dealt card is a “King (K)”. We can then write the following:
A ∪ B = {any club or any King} (16 cards)
A ∩ B = {K♣} (1 card)
A probability measure is a function P[A] that tells us the fraction of times that an event occurs. The probability measure must satisfy three properties:
1) 0 ≤ P[A] ≤ 1
2) P[S] = 1
3) P[¬A] = 1 – P[A]
The first property states that the probability cannot be negative, and it cannot exceed one. The second property requires that something in the space of all possible outcomes will occur, and the third property states that if the chance that A happens is X, then the chance that A doesn’t happen is 1-X.
The chance of any complex event occurring can be found by adding up the probabilities of all the simple events contained in the complex event. There is a 1/52 chance that a particular card is dealt. The probability that the card is a “club (♣)” thus is P[A] = P[2♣] + P[3♣] + … + P[A♣] = 13/52 = 1/4.
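The set language above maps directly onto code. As a minimal sketch, the snippet below models the deck as a Python set and events as subsets of it; the names `S`, `prob`, `clubs`, and `kings` are illustrative choices for this example, not anything from a poker library.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["♠", "♥", "♦", "♣"]

# S: the space of all 52 equally likely outcomes when one card is dealt.
S = {rank + suit for rank, suit in product(RANKS, SUITS)}


def prob(event):
    """P[event]: add 1/52 for every simple event (card) in the set."""
    return sum(Fraction(1, len(S)) for _ in event)


clubs = {card for card in S if card.endswith("♣")}    # event A
kings = {card for card in S if card.startswith("K")}  # event B
not_clubs = S - clubs                                  # complement of A

print(prob(clubs))            # 1/4
print(prob(not_clubs))        # 3/4
print(sorted(clubs & kings))  # ['K♣']
print(len(clubs | kings))     # 16
```

Python’s `|` and `&` on sets are exactly the union and intersection, and `S - clubs` is the complement.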
If we already know the probability that some complex events occur, and we want to calculate the chance that their union occurs, we cannot simply add the probabilities together. For example, there is a 13/52 chance that the dealt card is a “club (♣)”, and there is a 4/52 chance that the dealt card is a “King (K)”. The chance that the card is “a club (♣) or a King (K)” is not 13/52 + 4/52 = 17/52. If we take a look at our deck of cards in S, 16 of the 52 outcomes are either a club or a king, so 16/52 should be the chance that A ∪ B occurs. By simply adding P[A] to P[B], we have double counted the outcome that is both a club (♣) and a King (K). The correct calculation of the probability is:
P[A ∪ B] = P[A] + P[B] – P[A ∩ B]
When two events do not intersect, we say that the events are disjoint or mutually exclusive. For example being dealt a King and being dealt an Ace are mutually exclusive events, since there is no outcome satisfying both conditions. In this special case, the probability that one or the other occurs is simply their sum:
P[A ∪ B] = P[A] + P[B] if A and B are mutually exclusive events
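Both addition rules are easy to check by counting cards. The following is a rough sketch under the same one-card-from-52 setup; the predicate helpers (`is_club`, `is_king`, `is_ace`) and the `T` shorthand for the ten are assumptions made for this example.

```python
from fractions import Fraction

RANKS = "23456789TJQKA"  # T stands for 10
SUITS = "shdc"           # spades, hearts, diamonds, clubs
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]


def prob(predicate):
    """Fraction of the 52 cards for which predicate(card) is true."""
    hits = sum(1 for card in DECK if predicate(card))
    return Fraction(hits, len(DECK))


def is_club(card):
    return card[1] == "c"


def is_king(card):
    return card[0] == "K"


def is_ace(card):
    return card[0] == "A"


p_club = prob(is_club)                                        # 13/52
p_king = prob(is_king)                                        # 4/52
p_club_and_king = prob(lambda c: is_club(c) and is_king(c))   # 1/52
p_club_or_king = prob(lambda c: is_club(c) or is_king(c))     # 16/52

# General rule: subtract the intersection that was counted twice.
print(p_club_or_king == p_club + p_king - p_club_and_king)    # True

# Mutually exclusive events: no card is both a King and an Ace,
# so the probabilities simply add.
p_king_or_ace = prob(lambda c: is_king(c) or is_ace(c))       # 8/52
print(p_king_or_ace == p_king + prob(is_ace))                 # True
```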
Conditional probability
Suppose you are playing a game of Hold’em. You are in the big blind and everyone folds around to the player in the small blind. The player in the small blind reaches for his chips, but in the process he accidentally flips over one of his cards, exposing an Ace. Now, given that the exposed card is an Ace, you want to know the probability that he’s holding pocket aces. Stated another way, you’re interested in the conditional probability that your opponent is holding two aces given that one particular card of his (the exposed one) is an ace.
If we know that event B has occurred (the exposed card is an ace), we can use this information to revise our expectations about A (opponent holds pocket aces). The probability of “A conditional on B”, or the probability of A given B, is always calculated as:
P[A|B] = P[A ∩ B]/P[B]
Here, P[A ∩ B] is the probability that both the first card and the second card are aces, which is simply the probability of being dealt pocket aces. We consult the internet and find out that this probability is 12/2652. (The total number of possible hands pre-flop is 52*51 = 2652. This is commonly reduced to 1326 since the order in which the two cards are dealt is not important.) P[B] is just the probability of pulling an ace out of a 52 card deck, so P[B] = 4/52. Therefore, given that the first card is an ace, the probability of our opponent holding pocket aces is P[A|B] = (12/2652)/(4/52) = 3/51, or 1/17. This is in fact a nice, intuitive result. Once one ace has been dealt to our opponent, there are 3 left in the deck of 51 remaining cards.
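If you’d rather not trust the internet, the same number can be found by brute force. The sketch below enumerates all 2652 ordered two-card hands and treats the exposed card as the first card dealt, which is an assumption of this example, as are the helper names. It also shows that conditioning on “at least one ace somewhere in the hand” is a different event and gives a different answer.

```python
from fractions import Fraction
from itertools import permutations

RANKS = "23456789TJQKA"  # T stands for 10
SUITS = "shdc"
DECK = [rank + suit for rank in RANKS for suit in SUITS]

# All 52 * 51 = 2652 ordered two-card hands; hand[0] is the exposed card.
HANDS = list(permutations(DECK, 2))


def prob(event, given=lambda hand: True):
    """P[event | given], counted over equally likely ordered hands."""
    conditioned = [hand for hand in HANDS if given(hand)]
    hits = sum(1 for hand in conditioned if event(hand))
    return Fraction(hits, len(conditioned))


def first_is_ace(hand):
    return hand[0][0] == "A"


def pocket_aces(hand):
    return hand[0][0] == "A" and hand[1][0] == "A"


def at_least_one_ace(hand):
    return hand[0][0] == "A" or hand[1][0] == "A"


print(prob(pocket_aces))                          # 1/221 (= 12/2652)
print(prob(first_is_ace))                         # 1/13  (= 4/52)
print(prob(pocket_aces, given=first_is_ace))      # 1/17  (= 3/51)
print(prob(pocket_aces, given=at_least_one_ace))  # 1/33
```

The gap between 1/17 and 1/33 comes from what you actually learned: seeing one specific card is more informative than merely knowing an ace is in there somewhere.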
We say that two events are independent if P[A|B] = P[A]; in other words, knowing B does not help us revise our probabilities that A occurred. In this example, “first card ace” and “second card ace” are not independent, since P[second card ace] = 4/52, while P[second card ace | first card ace] = 3/51.
If we want to calculate the probability that an intersection occurs (that both A and B happen), we rearrange the formula for conditional probability so that:
P[A ∩ B] = P[A|B] × P[B], in general; and
P[A ∩ B] = P[A] × P[B], if the events are independent.
The last rule is incredibly useful. Bill Chen & Jerrod Ankenman put it to work in their Theory of Doubling Up, which is outlined in their epic tome, “The Mathematics of Poker”. The theory, which is used to estimate the probability of winning a tournament, goes something like this:
Consider a winner-take-all tournament. Setting aside skill considerations, a player’s equity in the tournament is proportional to his chip stack. If we make a further assumption that the chance of a player doubling his chip stack is constant, P[C] = 50%, throughout the tournament, then the probability of him winning the tournament is P[C]^N, where N is the number of times the player must double up. In a four person tournament, N = 2 (a player must double his stack twice to have all the chips in play). Thus the probability of winning the tournament is
P[C1 ∩ C2] = P[C1] × P[C2] = (1/2) × (1/2) = 1/4
Likewise, the probability of winning a 128 person tournament, where one must double up 7 times (1-2-4-8-16-32-64-128), is equal to
P[C1 ∩ C2 ∩ C3 ∩ C4 ∩ C5 ∩ C6 ∩ C7] = (1/2)^7 = 1/128
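The doubling-up arithmetic can be sketched in a few lines. This is only an illustration of the P[C]^N calculation above, not Chen and Ankenman’s own formulation; the function names are made up for this example, and it assumes equal starting stacks and a field size that is a power of two.

```python
from fractions import Fraction


def doublings_needed(players):
    """Number of double-ups N needed to go from one stack to all the chips.

    Assumption: equal starting stacks and a power-of-two field (2, 4, 8, ...).
    """
    if players < 1 or (players & (players - 1)) != 0:
        raise ValueError("this sketch only handles power-of-two fields")
    return players.bit_length() - 1


def win_probability(players, p_double=Fraction(1, 2)):
    """P[C]^N: chance of winning N independent double-ups in a row."""
    return p_double ** doublings_needed(players)


print(win_probability(4))    # 1/4
print(win_probability(128))  # 1/128
```

With P[C] = 1/2 the answer is just 1 divided by the number of players, which matches the idea that equity is proportional to chip stack. Passing a different `p_double` is one crude way to play with the skill assumption.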
When we have repeated, independent trials, this rule is convenient for calculating the probability that A and B occur. If we want to know instead the chance that A or B occurs, we have to combine several of our rules. Let’s say we’re holding a spade flush draw on the flop, and we want to know what the probability is that either the turn or the river brings a fifth spade. Thus:
P[(turn = ♠) or (river = ♠)]
= 1 – P[¬{(turn = ♠) or (river = ♠)}]
= 1 – P[(turn ≠ ♠) and (river ≠ ♠)]
= 1 – P[(turn ≠ ♠)] × P[(river ≠ ♠)|(turn ≠ ♠)]
= 1 – (38/47) × (37/46)
= 0.3497
QED…
What this is saying is that the probability that a spade comes on either the turn or the river is the same as 1 minus the probability that a spade comes on neither the turn nor the river. Since the probability that the river is not a spade is dependent on whether or not the turn is a spade, the intersection is calculated using the formula P[A ∩ B] = P[A|B] × P[B]:
P[(turn ≠ ♠) ∩ (river ≠ ♠)] = P[(river ≠ ♠)|(turn ≠ ♠)] × P[(turn ≠ ♠)]
P[(turn ≠ ♠)] is simply the number of non-spades in the deck (47 cards in deck – 9 spades in deck = 38) divided by the number of cards in the deck (47), which equals 38/47. P[(river ≠ ♠)|(turn ≠ ♠)] is the conditional probability that the river is not a spade given that the turn was not a spade. If the turn comes out non-spade, there are still 9 spades left in the deck, so the number of non-spades left in the deck is the number of remaining cards (46) minus the number of spades (9), which is equal to 37. Divide 37 by the number of remaining cards and you get the probability that the river is a non-spade given that the turn was not a spade: 37/46.
P[(turn ≠ ♠) ∩ (river ≠ ♠)] = (38/47) × (37/46) = 0.6503
1 – 0.6503 = 0.3497
QED again…
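For the skeptics, here is a quick sketch that redoes the flush draw two ways: once with the complement rule from above, and once by brute-force counting of every ordered turn and river combination. The constant and variable names are made up for this example, and labelling the unseen cards 0 to 46 with the first nine standing in for spades is purely a counting convenience.

```python
from fractions import Fraction
from itertools import permutations

UNSEEN = 47  # 52 cards minus our 2 hole cards and the 3 flop cards
SPADES = 9   # 13 spades minus the 4 we can already see

# Exact answer via the complement rule used above.
p_miss_turn = Fraction(UNSEEN - SPADES, UNSEEN)                       # 38/47
p_miss_river_given_miss = Fraction(UNSEEN - SPADES - 1, UNSEEN - 1)   # 37/46
p_hit = 1 - p_miss_turn * p_miss_river_given_miss

# Brute-force check: label unseen cards 0..46 and call the first 9 "spades".
runouts = list(permutations(range(UNSEEN), 2))  # ordered (turn, river) pairs
hits = sum(1 for turn, river in runouts if turn < SPADES or river < SPADES)
p_hit_enumerated = Fraction(hits, len(runouts))

print(p_hit, round(float(p_hit), 4))  # 378/1081 0.3497
print(p_hit == p_hit_enumerated)      # True
```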
In closing
Seeing as many of you spent your intro to statistics class playing donkaments on PartyPoker, I thought I’d preach the truth to you in a language you’d understand. Be proud of yourselves. If you’ve made it this far, you’re now equipped with the most elementary, yet essential, knowledge of statistics. But let us not get ahead of ourselves; there is still much to learn. Bayes is looming in the background, and we have yet to resolve the little problem of biased observed hands. Keeping these challenges in mind, we shall end this lesson by remembering the solemn words of the great Barry Greenstein.
-chaz
[1] Adapted from: Lich-Tyler, Stephen. “A Primer in Probability”. (2008)