Probability Basics for Middle-School Teachers
The standards for probability at grade seven deal with theoretical and empirical probability, probability models, and compound events.
Much of this material is common sense or things you already know. You may need a refresher on the vocabulary, however, and instruction on some of the visualizations that will help you and your students understand what’s going on.
This document will also fill you in on some of the issues in probability and probability theory that come later.
The Quick Summary
Don’t want to read the whole thing? Here you go:
Theoretical probability is what you know to expect because it has to be that way: when you flip a fair coin, the probability of getting heads is P( heads ) = 0.5. Empirical probability comes from experience: if you toss a paper cup twenty times and it lands on its end three times, then P( end ) = 0.15.
If you put two or more events together, you get a compound event. Here is the classic question with a classic wrong answer:
If you toss two coins, a nickel and a penny, what is the probability that you get one heads and one tails?
Wrong solution: There are three possibilities: 0, 1, or 2 heads. So the chance of getting one “heads” is 1/3.
You could show that this response is wrong empirically by tossing two coins enough times, or theoretically by analyzing the situation. You can visualize this compound event with tables and tree diagrams. Here they are:
In these diagrams, you can see that, in fact, there are four outcomes. Two of them have one tail and one head. So P( one head and one tail ) = 1/2.
If that summary doesn’t tell you all you want to know, read on…
A probability is a number. It represents the likelihood of something happening. Furthermore, it has to be between zero and one.
An event with a probability of zero is impossible; it will never happen. If the probability is small—close to zero—it’s unlikely. If it’s close to 1/2, it’s (literally in the case of a coin) a toss-up. A high probability—close to one—is very likely, and a probability of 1 represents an event that will certainly happen.
There are two ways to figure out a probability: empirically and theoretically. Let’s start empirically.
Suppose that Denise usually arrives in class first. Usually. Not always. You count for two weeks and she was first into the class seven out of ten times. The next day, will she be first? Probably, but not for certain. We can assign a number to how probable it is: 0.7, that is, 7/10.
That’s an empirical probability. It’s based on experience. You look at what happened in the past, and use that to make predictions about the future. A cool thing about probability is that you don't have to make firm, black-or-white, yes-or-no predictions: your prediction is that there’s a 70% chance that she will be first. The "70%" recognizes that you’re not certain, but it lets you be quantitative about how certain or not certain you are.
Vocabulary note: when you count up how many times something happens, the number you get is called a frequency. (This conflicts with the definition in science, where frequency usually means how many times something happens in some time period.) So in ten days, the observed frequency of Denise being first was 7.
A theoretical probability is when you have a way to know the probability from something other than experience. For example, if you roll a die, the probability that you roll a four is 1/6, or about 17%. That’s because there are six numbers on your die, and each of the numbers is equally likely.
Don’t be frightened by the term probability model. It’s best explained by example.
Suppose you have 28 students in your class. You put every student’s name on a card and shuffle the deck. Whoever’s name is on the top card gets the first question. What’s the chance that Emmitt is first? 1/28.
You have just used a probability model. You have assumed that each card is equally likely because you shuffled. In this case, to generalize, you have used the (common, excellent) model that if you have n things that are equally likely, the probability of each is 1/n.
Not all situations work that way, though. Suppose you put a strawberry and nine blueberries in a bag, reach in, and pull one out. What’s the chance that you pick the strawberry? One-tenth? No way. You can tell the strawberry from the blueberries by feel. Even if you didn’t care, and tried to be fair, they’re different sizes, and that will affect the likelihood of picking the strawberry. So to find the probability, you try it. If you did it 50 times (replacing the fruit each time, of course) and picked the strawberry 17 times out of 50, you would have a 34% empirical probability of picking it. You could use that in a prediction for the future.
This berry experiment also uses a probability model, this time an empirical model based on observed frequencies.
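If it helps to see the arithmetic spelled out, here is a minimal sketch of an empirical probability as "observed frequency divided by number of trials." The function name empirical_probability is made up for this illustration; the numbers come from the berry and Denise examples.

```python
from fractions import Fraction

def empirical_probability(frequency, trials):
    """Observed frequency divided by the number of trials (hypothetical helper)."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    return Fraction(frequency, trials)

# Strawberry picked 17 times in 50 draws; Denise first on 7 of 10 days.
berry = empirical_probability(17, 50)
denise = empirical_probability(7, 10)

print(float(berry))   # 0.34
print(float(denise))  # 0.7
```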
Using and Evaluating Probability Models
Again, let’s use an example.
Suppose you have a die that you suspect might be loaded. The probability model for a fair die is that each face has a probability of 1/6. So you take the suspicious die and roll it 60 times. The “expected frequency” for each number is 10. Ten ones, ten twos, and so forth.
But that’s not what you get. Is the die loaded? Not necessarily. After all, even if you rolled a fair die 60 times (there’s no such thing as a fair die, but we can still imagine rolling one…) you probably would not get 10 of each number. So you have to decide if the distribution of results is far enough away from what you “expect” that you have evidence that the die is not fair. (The illustration is from 60 rolls of a fair die.)
At grade seven, you will make this assessment qualitatively—by “feel.” In more advanced statistics, you do the same thing, except that you make that “far enough” criterion quantitative.
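To get a feel for how uneven a fair die can look, you could simulate 60 rolls and compare each face's count to the expected frequency of 10. This is only a sketch: roll_die_many is a name invented here, and the computer's random number generator is standing in for the imaginary fair die.

```python
import random
from collections import Counter

def roll_die_many(n_rolls, seed=None):
    """Simulate n_rolls of a fair six-sided die; return the count for each face."""
    rng = random.Random(seed)  # a seed makes the run repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts.get(face, 0) for face in range(1, 7)}

counts = roll_die_many(60, seed=1)
expected = 60 / 6  # the "expected frequency" for each face

for face, observed in counts.items():
    print(f"face {face}: observed {observed}, difference {observed - expected:+.0f}")
```

Run it with several different seeds. The counts rarely come out as ten of each, even though the simulated die is fair by construction.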
Another example: throughout the season, Esme has hit 60% of her free throws. During the latest game, she came to the line 10 times and hit eight. That’s 80%! Did she have a “hot hand,” or is this something we can expect just from randomness? The obvious probability model is that each free throw has a 60% probability of going in. We might evaluate the model by simulating sets of ten free throws at 60%. The most common result will be 6 “successes,” but how often will we get 8 good free throws out of 10? That’s the question. If “8 out of 10” happens frequently, we can say that Esme’s performance is consistent with our model. But if we never see 8 hits out of 10 in 100 tries (100 simulated games of 10 free throws), we’d be reasonably sure that for the latest game, the 60% model didn’t work.
The illustration shows the results of 100 simulated games of 10 free throws each. Eight hits is unusual for Esme, but not super-rare.
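Here is one way such a simulation might be written. It is a sketch, not the method used to make the illustration: simulate_games and its parameters are names chosen for this example, and each shot is assumed to be independent with a 60% chance of going in.

```python
import random
from collections import Counter
from math import comb

def simulate_games(n_games=100, shots=10, p_hit=0.6, seed=None):
    """Tally how many simulated games ended with each number of hits."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_games):
        hits = sum(rng.random() < p_hit for _ in range(shots))
        tally[hits] += 1
    return tally

tally = simulate_games(seed=7)
print("games with exactly 8 hits:", tally[8])
print("games with 8 or more hits:", sum(n for h, n in tally.items() if h >= 8))

# Exact chance of exactly 8 hits in 10 under the same 60% model.
exact_8 = comb(10, 8) * 0.6**8 * 0.4**2
print(round(exact_8, 3))
```

Under this model the exact chance of exactly 8 hits works out to roughly 12%, which fits the description "unusual, but not super-rare."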
So far we’ve been pretty informal. Let’s take a moment and add some more formal math, using our examples about Denise, who is often first into the classroom, and rolling a die:
P(something doesn’t happen) = 1 – P( it happens )
Applying that principle to dice,
P( rolling something other than a four ) = 1 – 1/6 = 5/6.
If this principle seems painfully obvious, it is. But it’s useful because sometimes it’s easier to figure out the probability of something not happening than the probability that it happens. We’ll see an example later.
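A quick check of the principle, counting die faces directly instead of trusting the subtraction. Exact fractions are used so the two sides can be compared without rounding; the variable names are just for this sketch.

```python
from fractions import Fraction

faces = range(1, 7)

# Count faces directly, then compare with 1 - P(four).
p_four = Fraction(sum(1 for f in faces if f == 4), 6)
p_not_four = Fraction(sum(1 for f in faces if f != 4), 6)

print(p_four, p_not_four)          # 1/6 5/6
print(p_not_four == 1 - p_four)    # True

# Same idea for Denise: she is first with probability 7/10.
p_denise_not_first = 1 - Fraction(7, 10)
print(p_denise_not_first)          # 3/10
```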
Disturbing Background Issues
Probability may seem simple so far, but there are lurking complexities. As teachers, we might downplay some issues for the sake of simplicity. But they’re still there and may come up in the classroom. Here are a few:
The Prototypical Compound Event
An example of a compound event is flipping two coins: a nickel and a penny. Each one will be heads or tails. The two events are separate; you could think of them as sequential: we’ll flip the nickel and then we’ll flip the penny.
When we flip the nickel, the probability of heads is 1/2. So is the probability of tails. (We’re using the obvious, fair-coin probability model.) It’s the same for the penny, but the result from the nickel has no effect on the penny. The two events are independent. When we follow this through, we see that there are four possible outcomes:
Nickel heads, penny heads
Nickel heads, penny tails
Nickel tails, penny heads
Nickel tails, penny tails
These four possibilities are all equally likely, so each one has a probability of 1/4. Is that really true? If it is, we should be able to calculate probabilities we know. For example, we can use the addition rule (add the probabilities of the separate outcomes that make up an event) and calculate the probability of getting heads for the penny alone. The penny shows heads in the first and third possibilities in the list above. So P( penny heads ) = 1/4 + 1/4, or 1/2, which it has to be.
Now suppose we ask, “I’m flipping two coins; what is the probability that I get one heads and one tails?” The right answer is 1/2: it could be either of the two middle possibilities.
The wrong answer is that there are three possibilities: no heads, one head, or two heads. So the probability of one head is 1/3.
What was wrong with the thinking? As plausible as it sounds, we have no reason to believe that zero, one, and two heads are equally likely. We do, however, have reason to believe that heads and tails are equally likely. So, since we know the probabilities of the individual coin flips, we should analyze the compound event (two coin flips) based on those well-understood individual events.
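You can also let a computer list the outcomes. The sketch below assumes the fair-coin model, so all four (nickel, penny) pairs are equally likely, and then groups them by number of heads to show why "0, 1, or 2 heads" are not equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each outcome is a pair: (nickel, penny).
outcomes = list(product("HT", repeat=2))
print(outcomes)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

# Group the four equally likely outcomes by how many heads they contain.
heads_count = Counter(o.count("H") for o in outcomes)
for n_heads in (0, 1, 2):
    print(n_heads, "heads:", Fraction(heads_count[n_heads], len(outcomes)))

p_one_each = Fraction(heads_count[1], len(outcomes))
print(p_one_each)  # 1/2
```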
Representing Compound Events
The illustration is pretty elaborate, showing each step along the way. But it doesn’t have to be. The point of the diagram is to help you list all the possibilities; as long as you can do that, the diagram is doing its job.
You could also make a table. If there are two events, you can make one of them into rows and the other into columns:
The table organizes the different combinations clearly; you can be sure of listing all four possibilities.
A big advantage of the table is that it makes it easier to list more possibilities. One important application is analyzing the sum of two dice. In this diagram, you can see that there are 36 (equally-likely) possibilities. Six of them give a sum of 7, so the probability of rolling a 7 with two dice is 6/36, or 1/6. The probability of rolling a 10 is 3/36, or 1/12.
Making this as a tree diagram would be a mess. Too many branches!
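If you want to check the dice table without drawing it, a short program can build all 36 cells and count the sums. This is a sketch assuming two fair dice; the variable names are chosen for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Print the 6-by-6 table of sums: one row per value of the first die.
for a in faces:
    print(" ".join(f"{a + b:2d}" for b in faces))

# Count how many of the 36 equally likely cells give each sum.
sums = Counter(a + b for a, b in product(faces, repeat=2))
total = sum(sums.values())

p_seven = Fraction(sums[7], total)
p_ten = Fraction(sums[10], total)
print(total, p_seven, p_ten)  # 36 1/6 1/12
```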
A disadvantage of the table is that it’s harder (although not impossible) to represent more than two events being combined, whereas in the tree diagram you can just add another layer of branches, for example, if you were flipping three coins.
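The "add another layer of branches" idea translates directly into code. In this sketch, extend is a hypothetical helper that adds one coin flip to every branch of the tree so far; applying it three times gives the three-coin tree.

```python
def extend(branches, options=("H", "T")):
    """Add one more layer to the tree: every branch splits once per option."""
    return [path + (o,) for path in branches for o in options]

branches = [()]          # start with a single empty path (the root)
for _ in range(3):       # three coins, so three layers
    branches = extend(branches)

print(len(branches))     # 8
for path in branches:
    print("".join(path))
```

Each added layer doubles the number of branches (2, 4, 8), which is easy to draw as a tree but awkward to fit into a two-way table.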
Suppose we look at Denise again, who has arrived first in class 70% of the time. What’s the chance that she will be first both of the next two days? This is a compound event: we’re combining two separate events—the two classes on Monday and Tuesday.
We could make a tree diagram or a table, but it’s not clear (yet) how that would help us, because either would look just like the coin visualizations above—but those wouldn’t take the 70% into account. The individual table cells or tree branches would look like they indicate a probability of 1/4. However, the individual outcomes are no longer equally likely, so that 1/4 is probably not correct.
In the illustration, we’ve used a grid that is 10 squares on each side, so we could represent the probability nicely. On Monday, 70% of the distance across is for “Denise arrives first.” Likewise for Tuesday and the vertical direction. It’s as if we have ten equally-likely outcomes (ten cards, say) and seven of them are for Denise arriving first.
Now it’s clear: there are 100 possibilities, really, though many of them are the same. And of those, 49 (7 x 7) have Denise arriving first on both days. So the probability is 49/100, or 0.49. (And what’s the chance that she arrives first on only one day? 0.42. Make sure you know how we got that!)
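Here is the same counting done in code, as a sketch of the ten-cards idea: seven cards say "first" and three say "not first," and every Monday card is paired with every Tuesday card. It assumes the two days are independent, as the area diagram does.

```python
from fractions import Fraction

# Ten equally likely cards for each day; seven mean "Denise arrives first."
cards = ["first"] * 7 + ["not first"] * 3

# Every (Monday, Tuesday) pairing is one little square of the grid.
grid = [(mon, tue) for mon in cards for tue in cards]

both = sum(1 for mon, tue in grid if mon == "first" and tue == "first")
only_one = sum(1 for mon, tue in grid if (mon == "first") != (tue == "first"))
neither = len(grid) - both - only_one

print(len(grid), both, only_one, neither)   # 100 49 42 9
print(Fraction(both, len(grid)))            # 49/100
```

The 42 comes from two rectangles of the grid: 7 x 3 squares for "first on Monday only" plus 3 x 7 for "first on Tuesday only."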
The Multiplication Rule
The example area diagram above suggests a way to calculate the probability without counting little squares: we could multiply the two probabilities in the compound event: 0.7 x 0.7 = 0.49.
This approach also makes sense from a linguistic perspective: 7/10 of the time, Denise arrives first on Monday. Now, what fraction of the time does she also arrive first on Tuesday? That’s 7/10 of that 7/10. And “of”—especially with fractions—suggests multiplication. How do you find out what 7/10 of 7/10 is? Multiply them: 49/100.
This powerful generalization works even when you can’t easily draw the diagram on grid paper. Some examples:
The multiplication rule does not appear in the seventh grade standards, but it’s the direction we’re heading.
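As a small illustration of the rule at work, the sketch below multiplies exact fractions. The helper p_all is a name invented here, it assumes the events are independent, and the three-day case is an extra example not taken from the text above.

```python
from fractions import Fraction
from math import prod

def p_all(*probabilities):
    """Probability that every one of several independent events happens."""
    return prod(probabilities, start=Fraction(1))

p_first = Fraction(7, 10)  # Denise arrives first on any given day

two_days = p_all(p_first, p_first)
three_days = p_all(p_first, p_first, p_first)

print(two_days)    # 49/100
print(three_days)  # 343/1000
```

Three days in a row would need a 10 x 10 x 10 cube of little squares, which is hard to draw, but the multiplication is no harder than before.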
When Do You Multiply, and When Do You Add?
Remember: the multiplication rule for probability does not appear in Grade 7. But the moment you get the idea, it’s super-powerful, and you want to use it.
The danger is that you use this rule incorrectly. A further danger is that you try to memorize some rule about when to multiply and stop thinking about the situation! So some guidance is in order.
You add when you have two possible outcomes for the same event. Think about rolling a four or a two. You add the two 1/6’s to get the probability that this single roll will be four or two.
You multiply when you’re looking at getting specific outcomes in two separate events (making a compound event). If you flip a coin and roll a die, the chance of both getting heads and rolling a four is 1/2 x 1/6, or 1/12. This always works if the two events are independent (one does not affect the other). If they are not independent, multiplication still works but you need to use the correct (conditional) probability.
Here is another revealing way to think about which operation to use: add when you use the word or and multiply when you use and. For example, what’s the chance that I roll four or two on this roll? 1/6 + 1/6 = 1/3. What’s the chance that I roll four this time and I roll two next time? 1/6 x 1/6 = 1/36. Of course, the "or" has to be in the same event, and the two outcomes have to be mutually exclusive. For "and," the two occurrences must be in different events. After all, the probability that you roll a two and a four in one roll of a die is zero.
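One way to keep yourself honest about "add or multiply" is to count outcomes directly and see whether the shortcut agrees. The sketch below does that for the examples in this section, assuming a fair die and a fair coin.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# "Or" within one roll: count faces that are a four or a two.
p_four_or_two = Fraction(sum(1 for f in faces if f in (4, 2)), 6)
print(p_four_or_two == Fraction(1, 6) + Fraction(1, 6))    # True (1/3)

# "And" across two rolls: four this time, two next time.
rolls = list(product(faces, repeat=2))
p_four_then_two = Fraction(sum(1 for a, b in rolls if (a, b) == (4, 2)), len(rolls))
print(p_four_then_two == Fraction(1, 6) * Fraction(1, 6))  # True (1/36)

# "And" within one roll: no face is both a four and a two.
p_four_and_two_same_roll = Fraction(sum(1 for f in faces if f == 4 and f == 2), 6)
print(p_four_and_two_same_roll)                            # 0

# Coin and die together: heads and a four.
coin_die = list(product("HT", faces))
p_heads_and_four = Fraction(sum(1 for c, d in coin_die if (c, d) == ("H", 4)), len(coin_die))
print(p_heads_and_four)                                    # 1/12
```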
This bit of mathematics obviously connects to the statistics students are learning. But it also connects to other content standards:
Additional Directions, Ideas, and Perspectives
Strategic Education Research Partnership
Project funding provided by The William and Flora Hewlett Foundation and S.D. Bechtel Jr. Foundation
Poster Problems by SERP is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.