I spent the last week of summer in the Rocky Mountains with my family. We rented a little Wild-West-themed house overlooking the national park. Among the toy pistols and copper pots, my siblings found a deck of cards.
The three of them decided to play Go Fish. Each person drew seven cards. Then my brother announced a rule: if anyone draws three of the same card (like three 8’s), return your cards and shuffle the deck. His reasoning was that getting dealt a triple is a weird anomaly, so if it happens, the deck is probably not shuffled well and the game won’t be fair.
I was suspicious of that. My hunch was that a triple isn’t actually so rare. So how could we resolve this dispute? What are the actual odds of getting a triple if the deck is shuffled randomly?
And, perhaps more interestingly, how would you go about figuring that out?
Option 1: Just do it ten million times
Okay, not really. But let’s say we could shuffle and deal the cards 10,000,000 times. Then we would have a way to guess the probability for next time.
If I’ve been to the grocery store 100 times, and apples were on sale 20 of those times, I can guess how likely it is that apples will be on sale next time I go to the store. Based on my experience, there is roughly a 20/100 (that is, 20%) chance of an apple sale.
So if I deal 100 hands and 9 of them have a triple, my best guess would be that there is a roughly 9/100 chance (or 9% chance) of me drawing a triple. If I deal 10,000 hands and 681 have a triple, then my new best guess would be a 681/10,000 chance (or 6.81% chance).
On the airplane ride home from our vacation, I wrote this piece of code. Essentially, it simulates drawing a random 7-card hand from a 52-card deck. On the plane, I ran the simulation 10,000,000 times, and had the program tell me what percentage of those hands draws a triple.
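For a sense of what such a simulation involves, here is a minimal Python sketch. It is an illustration, not necessarily the original program. It assumes a “triple” means at least three cards of the same rank (so four of a kind counts too), and the names DECK, has_triple, and estimate_triple_probability are made up for this example.

```python
import random
from collections import Counter

# Suits don't matter for spotting a triple, so each card is just its rank.
# 13 ranks x 4 copies of each = 52 cards.
DECK = [rank for rank in range(13) for _ in range(4)]


def has_triple(hand):
    """Return True if any rank appears three or more times in the hand."""
    return max(Counter(hand).values()) >= 3


def estimate_triple_probability(trials, hand_size=7, rng=None):
    """Deal `trials` random hands and return the fraction that contain a triple."""
    if rng is None:
        rng = random.Random()
    hits = 0
    for _ in range(trials):
        # rng.sample draws without replacement, like dealing from a shuffled deck.
        hand = rng.sample(DECK, hand_size)
        if has_triple(hand):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    estimate = estimate_triple_probability(100_000)
    print(f"Estimated chance of a triple: {estimate:.2%}")
```

More trials give a steadier estimate. Going up to 10,000,000 only means waiting longer.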
It turns out you get a triple about 7.7% of the time, which isn’t likely, but is hardly impossible. For comparison, the odds of rolling snake eyes (two 1’s) on a pair of dice are 1 in 36, or about 2.8%. That’s a lot less. Can you imagine a rule that forced you to re-roll after getting snake eyes, because it was an unlikely roll and so the dice must be rigged?
Okay, so we’ve got an estimate.
Is that the best we can do?
Option 2: Figure it out abstractly
There are lots of mathematical tools to answer questions like this. We want to know how likely it is that a random 7-card hand from a 52-card deck will have three cards of the same rank (three 8’s, three kings, and so on). It helps to know that every hand is equally likely: in a well-shuffled deck, it’s just as likely to draw an ace of spades as it is to draw a ten of diamonds, so no particular set of seven cards is favored over any other.
Let’s first think about a simpler case. Imagine there were only 10 possible hands, all equally likely, and 4 of those hands contained a triple. Now I draw a hand. How likely is it that my hand contains a triple?
Well, it’s going to be one of the 10 possible hands, and 4 out of 10 of those hands contain a triple. This translates into a 4 in 10 chance (that is, a 0.4, or 4/10, or 40% chance) that my hand contains a triple. This kind of reasoning is an example of something called discrete probability.
We’re dealing with the exact same situation here. The only difference is that there are a lot more than 10 possible hands. All we need to do is count two things: how many hands there are all together, and how many hands have a triple.
It turns out that those are both huge numbers. They are way too big to count easily. But there are some simple formulas that people developed to help figure this out. Like many formulas, they are really just shorthand for a bigger idea. In particular, we’d like to use combinations. A combination counts the number of ways to choose a group of items when the order doesn’t matter; “52 choose 7,” for example, is the number of different 7-card hands you can pull from a 52-card deck. Combinations are a basic tool in the field of combinatorics. Combinatorics deals, in part, with the ways that things can be counted or arranged. Some classic examples are ways to draw cards from a deck, seat people around a table, or pick marbles from a bowl.
For us, instead of choosing people to sit at a table, we’re choosing cards from a deck to sit in our hand, but mathematically it’s the same idea.
So to get an exact answer, I used combinations to count the total number of hands, as well as the total number of hands with triples. Once I figured out what I needed to compute, I put those numbers into this calculator, because working out (52 choose 7) by hand? Not going to happen.
Guess what? The exact answer really is about 7.7%. Cool!
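One way to set up that count is sketched below in Python, using math.comb from the standard library. It is not necessarily the same breakdown as the one described above. The trick is to count the hands with no triple and subtract from the total: a hand with no triple is made up only of pairs and single cards, so we can add up the possibilities by number of pairs. The sketch assumes a “triple” means at least three cards of one rank, and the function name count_hands_without_triple is made up for this example.

```python
from math import comb


def count_hands_without_triple(hand_size=7, ranks=13, suits=4):
    """Count hands in which no rank appears three or more times."""
    total = 0
    for pairs in range(hand_size // 2 + 1):
        singles = hand_size - 2 * pairs
        total += (
            comb(ranks, pairs)               # which ranks show up as pairs
            * comb(suits, 2) ** pairs        # which 2 suits for each pair
            * comb(ranks - pairs, singles)   # which ranks show up once
            * suits ** singles               # which suit for each single card
        )
    return total


all_hands = comb(52, 7)
triple_hands = all_hands - count_hands_without_triple()
probability = triple_hands / all_hands

print(f"Total hands:         {all_hands:,}")
print(f"Hands with a triple: {triple_hands:,}")
print(f"Probability:         {probability:.4%}")
```

Counting the complement keeps the bookkeeping short, because hands with a triple come in many shapes (a triple plus a pair, two triples, four of a kind, and so on), while hands without one have only pairs and singles.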
But which way is better?
Option 2, the math way, is exact. It’s provable. And it tells you why the probability is that way: you use real information about the deck, such as which hands have triples, which don’t, and how many possible hands there are all together.
But what if we were dealing with something more complicated? For example, if you have three players, how likely is it to get a triple in at least one of the three hands? It’s certainly possible, and worthwhile, to figure that out abstractly. But it’s not as easy. Yet with one quick tweak, the computer program from Option 1 can deal with that scenario easily. An approximate, computational approach is sometimes more flexible than a direct one, and can be used to make educated guesses when we haven’t quite figured out the math yet.
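As a rough illustration of how small that tweak can be, here is a self-contained Python sketch that deals three 7-card hands from the same deck and checks whether any of them has a triple. The names are made up for this example, and it again assumes a “triple” means at least three cards of one rank.

```python
import random
from collections import Counter

# Each card is represented by its rank only: 13 ranks x 4 copies = 52 cards.
DECK = [rank for rank in range(13) for _ in range(4)]


def has_triple(hand):
    """Return True if any rank appears three or more times in the hand."""
    return max(Counter(hand).values()) >= 3


def estimate_any_triple(trials, players=3, hand_size=7, rng=None):
    """Estimate the chance that at least one player is dealt a triple."""
    if rng is None:
        rng = random.Random()
    hits = 0
    for _ in range(trials):
        # Draw all the cards at once, then split them into one hand per player,
        # so the hands come from the same deck and never share a card.
        dealt = rng.sample(DECK, players * hand_size)
        hands = [dealt[i * hand_size:(i + 1) * hand_size] for i in range(players)]
        if any(has_triple(hand) for hand in hands):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"At least one triple among 3 players: {estimate_any_triple(100_000):.2%}")
```

If the three hands were independent, the answer would be about 1 − (1 − 0.077)³, or roughly 21%. They aren’t quite independent, since they share a deck, which is exactly the kind of wrinkle the simulation handles without any extra thought.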
And what if I’m dealing with something way more complicated, like the weather? We use computer simulations to deal with probabilities we don’t know how to describe perfectly, like the chance that it will rain tomorrow.
The answer, really, is that you need both. This is true not just in probability, but in lots of areas that use both math and computer science. Sometimes it’s the right time for approximations and simulations, and sometimes it’s the right time for pure math. But very often, in areas like machine learning and data science, the best approach is a balance between the two.
Postscript: Okay, but is the deck rigged?
There is one last step you need to really figure out if the deck is fair: applying Bayes’ theorem. I’m not going to explain this theorem in full, as it probably deserves its own post. But I think it’s necessary to mention, for completeness.
If you get dealt a triple, you might want to know how likely it is that the deck is rigged. That’s a slightly different question than how likely it is to get a triple from a fair deck. To figure this out, you need some other information: how likely it normally is that the deck is rigged, and how likely it is to get a triple from a rigged deck.
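To make the pieces concrete, here is a small Python sketch of the calculation. The 7.7% figure comes from above, but the other two inputs are purely hypothetical numbers chosen for illustration: a 1% chance that the deck was rigged to begin with, and a 50% chance of a triple from a rigged deck. The function name posterior_rigged is also made up.

```python
def posterior_rigged(prior_rigged, p_triple_if_rigged, p_triple_if_fair):
    """Bayes' theorem: chance the deck is rigged, given that a triple was dealt."""
    # Overall chance of seeing a triple, from either kind of deck.
    p_triple = (
        prior_rigged * p_triple_if_rigged
        + (1 - prior_rigged) * p_triple_if_fair
    )
    return prior_rigged * p_triple_if_rigged / p_triple


# Hypothetical inputs: only the 0.077 comes from the calculation above.
answer = posterior_rigged(
    prior_rigged=0.01,        # assumed: 1 deck in 100 is rigged
    p_triple_if_rigged=0.5,   # assumed: a rigged deck deals a triple half the time
    p_triple_if_fair=0.077,   # from the fair-deck result
)
print(f"Chance the deck is rigged, given a triple: {answer:.1%}")
```

With these made-up inputs the answer comes out to about 6%. Different assumptions give different numbers, but the result stays small unless you already had strong reason to suspect the deck.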
Without going into too much detail, though, this step still tells us that the deck is probably not rigged. If you’re interested in how this kind of reasoning works, check out these examples, or just ask me.