Calculate and Compare Proportions — with Validity and Confidence
Proportions pop up everywhere in life — “Four out of five dentists recommend…”, “Obama leads Clinton by ten percentage points among likely voters in Wisconsin…”, “Last month’s manufacturing yields are up from a year ago…”, and so on. Anytime you calculate the fraction, portion, or percentage something takes up from the whole, that’s a proportion. And Six Sigma provides a method for you to discover if one proportion is truly different from another.
To introduce the issues involved, imagine taking two quarters out of your pocket, flipping them ten times each, and then recording the number of heads each one gets. Let’s say the first quarter gets four heads and the second quarter gets five. Then the proportion of heads for the first coin is four out of ten (4/10 = 0.40 or 40%) and for the second it’s five out of ten (5/10 = 0.50 or 50%). The situation seems straightforward enough — the proportions are different, so something must be different between the coins. You might even conclude that the second quarter will get 10% more heads than the first.
But imagine repeating the test of the two coins. This time the first quarter gets six heads (6/10 = 0.60 or 60%) and the second gets five (5/10 = 0.50 or 50%). Have the coins changed? Why are their proportions now different?
This is the first principle you must recognize:
When a sample or subset of data is used to calculate a proportion (as it almost always is), there will be some uncertainty in the calculated answer — always some fuzziness or fringe.
Another way to say this is that if you kept on repeating the ten-flip coin test, you wouldn’t be surprised to get anywhere from three to seven heads. And every once in a while you might see two or nine heads. The figure below shows what typically happens after 10, 100, and 1,000 repetitions of the 10-flip coin test.
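To see this fluctuation for yourself, here is a minimal simulation sketch of the repeated ten-flip test. It assumes a fair coin; the function name `simulate_ten_flip_tests` and the fixed seed are illustrative choices, not part of any Six Sigma tool.

```python
import random
from collections import Counter


def simulate_ten_flip_tests(repetitions, flips=10, seed=0):
    """Repeat the ten-flip test and tally how often each head count occurs."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    tally = Counter()
    for _ in range(repetitions):
        # Each flip of a fair coin comes up heads with probability 0.5.
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        tally[heads] += 1
    return tally


if __name__ == "__main__":
    for repetitions in (10, 100, 1000):
        tally = simulate_ten_flip_tests(repetitions)
        print(f"{repetitions} repetitions of the 10-flip test:")
        for heads in range(11):
            print(f"  {heads:2d} heads: {tally[heads]} times")
```

Changing the seed changes the individual counts, but the overall pattern should stay the same: results cluster around five heads, with the extremes showing up only occasionally.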

This same fluctuation happens in every situation where a subset of data is used to calculate a proportion. It happens when the 100 most recent patients are used to calculate the proportion of all hospital patients getting a post-operative infection. It happens when 1,000 residents are surveyed to determine the proportion of the city population who are in favor of the mayor’s new policy. And it happens when yesterday’s manufacturing yield (proportion good) is used as a gauge of the ongoing quality of production. With fuzziness around all of these calculated proportions, how can you know what the true proportion is? And how can you tell if one calculated proportion for a situation is better or worse than another?
Armed with a couple of formulas and basic calculator skills, you can act confidently in these situations and base your decisions on true knowledge.
Because there is fuzziness around every proportion calculated from a subset of data, you calculate a confidence interval around your measured proportion. This interval quantifies how wide the uncertainty is. Here’s the formula and its terms:

$$\frac{y}{n} \pm Z\sqrt{\frac{\frac{y}{n}\left(1-\frac{y}{n}\right)}{n}}$$

where:
- y/n is the calculated proportion for the subset of data; y is the number of items in question found among n, the total number of items in your subset of data.
- Z is a look-up factor based on how confident you want to be that the true proportion is contained in the interval. The more confident you want to be, the larger Z becomes:

Confidence    Z
80%           1.282
90%           1.645
95%           1.960
99%           2.576

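If the confidence level you need is not in a look-up table, Z can be computed from the standard normal distribution. The sketch below uses Python’s standard-library `statistics.NormalDist`; the helper name `z_factor` is hypothetical, and it assumes a two-sided interval with the leftover probability split evenly between the two tails.

```python
from statistics import NormalDist


def z_factor(confidence):
    """Two-sided Z factor for a confidence level given as a fraction, e.g. 0.95."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Split the leftover probability (1 - confidence) evenly between both tails.
    upper_tail_point = 1 - (1 - confidence) / 2
    return NormalDist().inv_cdf(upper_tail_point)


if __name__ == "__main__":
    for confidence in (0.80, 0.90, 0.95, 0.99):
        print(f"{confidence:.0%} confidence -> Z = {z_factor(confidence):.3f}")
```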
Here’s an example to see how the formula and table are applied. Imagine you randomly select 30 parts from an incoming shipment of raw materials that has thousands of parts in it. Of the 30, you find 6 have problems; the other 24 are fine. In this case 24 out of 30 (80%) are good. If you wanted to be, say, 95% confident of what the true proportion really is, then you’d plug the numbers into the formula and look up the Z value corresponding to a confidence of 95% (Z = 1.96), like this:

$$\frac{24}{30} \pm 1.96\sqrt{\frac{0.80\,(1-0.80)}{30}} = 0.80 \pm 0.143$$

So although the calculated proportion of the subset of parts is 80%, if you want to be 95% confident in your decision, you can say that the true proportion for the entire shipment lies somewhere between 65.7% and 94.3%. That’s powerful knowledge!
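The shipment calculation can be reproduced with a few lines of Python. This is a minimal sketch of the interval described in the text (the normal-approximation interval, which is rougher for small samples or proportions close to 0% or 100%); the function name `proportion_interval` is an illustrative assumption.

```python
import math
from statistics import NormalDist


def proportion_interval(y, n, confidence=0.95):
    """Confidence interval for a proportion y/n using the normal approximation."""
    p = y / n
    # Two-sided Z factor for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


if __name__ == "__main__":
    # 24 good parts out of 30 sampled, at 95% confidence.
    low, high = proportion_interval(24, 30, confidence=0.95)
    print(f"True proportion is between {low:.1%} and {high:.1%}")
```

Rerunning it with a larger sample at the same 80% yield (for example 240 good out of 300) shows how the interval narrows as n grows.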
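When you want to compare two calculated proportions to validly tell if they’re different, you have to use a modified version of the formula (but use the same Z table). Here’s the modified version:

$$\left(\frac{y_1}{n_1}-\frac{y_2}{n_2}\right) \pm Z\sqrt{\frac{\frac{y_1}{n_1}\left(1-\frac{y_1}{n_1}\right)}{n_1}+\frac{\frac{y_2}{n_2}\left(1-\frac{y_2}{n_2}\right)}{n_2}}$$

Here y1/n1 and y2/n2 are the two calculated proportions being compared.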
Looking at the left side of this formula, you can see that it is taking the difference between the two proportions. If the two calculated proportions are the same, then their mathematical difference will be zero. What you’re doing with this formula is calculating a confidence interval around that difference in the proportions. If the confidence interval encloses the value of zero, then you cannot validly say the two proportions are different; the data are consistent with there being no difference between them. But if the confidence interval does NOT enclose zero, then you can say, at your chosen level of confidence, that the proportions are truly different. Let’s look at an example.
Suppose you’re handicapping the results of your state’s presidential primary election. After casting your own vote, you ask others exiting the polling place which candidate they voted for. You get 23 out of 50 surveyed saying they voted for Candidate A and 27 out of the 50 saying they voted for Candidate B. Based on your survey, will Candidate B win your state? Let’s say you want to be 90% confident in your conclusions (Z = 1.645). Using the modified formula:

$$\left(\frac{23}{50}-\frac{27}{50}\right) \pm 1.645\sqrt{\frac{0.46\,(1-0.46)}{50}+\frac{0.54\,(1-0.54)}{50}} = -0.08 \pm 0.164$$

Which means that the true difference in the proportions voting for each candidate lies somewhere between -24.4% and 8.4%. Since this interval encloses the value of zero, you cannot validly say that there is any difference between the candidates — it’s a statistical dead heat.
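The exit-poll comparison can be checked the same way. The sketch below implements the difference-of-proportions interval with the unpooled normal approximation, which reproduces the figures quoted above; the function name `difference_interval` is a hypothetical helper, not an established API.

```python
import math
from statistics import NormalDist


def difference_interval(y1, n1, y2, n2, confidence=0.90):
    """Confidence interval for (y1/n1 - y2/n2) using the normal approximation."""
    p1, p2 = y1 / n1, y2 / n2
    # Two-sided Z factor for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    difference = p1 - p2
    return difference - margin, difference + margin


if __name__ == "__main__":
    # Candidate A: 23 of 50; Candidate B: 27 of 50; 90% confidence.
    low, high = difference_interval(23, 50, 27, 50, confidence=0.90)
    print(f"Difference (A - B) is between {low:.1%} and {high:.1%}")
    if low <= 0 <= high:
        print("Interval encloses zero: no valid difference can be claimed.")
    else:
        print("Interval excludes zero: the proportions differ.")
```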
Now that you know how to validly look at calculated proportions, be sure to put this new knowledge to work to your advantage.
- When someone presents a calculated proportion value to you, think about the confidence interval that quantifies the fuzziness around that number.
- Even when two proportions are numerically different, don’t immediately jump to the conclusion that they truly are different. That’s as foolish as concluding that one coin is different from another after getting a different number of heads when you toss each ten times.
- Use the modified formula to calculate and manage the risk in your decisions.
- When you chart sequential performance calculations over time, know that level or steady performance will still have an up-and-down appearance. Don’t be fooled by natural ups and downs that aren’t big enough to be validly different.

