**Sample Size**

As most of you know, statistical tests calculate a mean __and__
confidence intervals on that mean. We are all familiar with the fact that as our
sample size decreases, our knowledge of the “true” mean becomes less and less
certain. This matters for tests
that use the mean -- such as the t-test and ANOVA.

Below is an example with two data sets, “Apples” and “Oranges”.
In the first experiment we had only 15 samples of Apples and 15 samples of
Oranges. Plotting the means with their
calculated confidence intervals shows that we cannot differentiate between
Apples and Oranges: since the confidence intervals overlap, we cannot be certain that the two means actually differ.

But if we increase the sample size to 100 Apples and 100
Oranges, our confidence intervals narrow, and we can now tell that Apples do
have lower values than Oranges.
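This effect is easy to sketch numerically. Below is a rough, standard-library-only Python illustration with made-up “Apples”/“Oranges” data (the means, spreads, and seed are all invented for the example), using a normal approximation in place of the t-distribution:

```python
import math
import random
import statistics

def mean_ci(data, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # ~1.96 for 95%
    m = statistics.mean(data)
    half = z * statistics.stdev(data) / math.sqrt(len(data))
    return (m - half, m + half)

def intervals_overlap(a, b):
    """True if two (low, high) intervals overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

random.seed(42)  # illustrative data only, not real measurements
apples_15   = [random.gauss(10.0, 2.0) for _ in range(15)]
oranges_15  = [random.gauss(11.0, 2.0) for _ in range(15)]
apples_100  = [random.gauss(10.0, 2.0) for _ in range(100)]
oranges_100 = [random.gauss(11.0, 2.0) for _ in range(100)]

# With n = 15 the intervals tend to overlap; with n = 100 they usually separate.
print("n=15  overlap:", intervals_overlap(mean_ci(apples_15), mean_ci(oranges_15)))
print("n=100 overlap:", intervals_overlap(mean_ci(apples_100), mean_ci(oranges_100)))
```

The key point is visible in the interval widths: the half-width shrinks roughly as 1/√n, so going from 15 to 100 samples cuts it by more than half.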

I am sure that most of you can see where this digression from capability is heading.

Capability studies (and Gage R&R studies) use the
standard deviation as their primary statistic, and like
the mean, the standard deviation also has a confidence interval.

The confidence interval of the standard deviation also depends
on the sample size: as the sample size increases, our estimate of the standard
deviation improves. For those who like this stuff, here is the formula
for the confidence interval on the standard deviation.
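For a normal population, with sample standard deviation $s$ from $n$ measurements, the standard two-sided $100(1-\alpha)\%$ interval is:

$$
\sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{1-\alpha/2,\;n-1}}}
\;\le\; \sigma \;\le\;
\sqrt{\frac{(n-1)\,s^{2}}{\chi^{2}_{\alpha/2,\;n-1}}}
$$

where $\chi^{2}_{p,\,n-1}$ is the $p$-quantile of the chi-square distribution with $n-1$ degrees of freedom.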

Given a standard deviation of “1” and using the formula above,
we can now plot how the confidence intervals contract as the sample size
increases.

In other words, as my capability study’s sample size becomes
smaller, the range in which the “true” standard deviation can exist becomes
larger.
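To sketch that contraction numerically, here is a rough standard-library Python version (the chi-square quantiles are approximated with the Wilson–Hilferty formula rather than an exact table, so the numbers are close to, not identical to, tabulated values):

```python
import math
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square p-quantile."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3

def sigma_ci(s, n, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for sigma, given sample stdev s."""
    df = n - 1
    lo = s * math.sqrt(df / chi2_quantile(1 - alpha / 2, df))
    hi = s * math.sqrt(df / chi2_quantile(alpha / 2, df))
    return (lo, hi)

# With s = 1, watch the interval tighten as n grows:
for n in (10, 15, 30, 60, 90, 120):
    lo, hi = sigma_ci(1.0, n)
    print(f"n={n:4d}  sigma in [{lo:.2f}, {hi:.2f}]  width={hi - lo:.2f}")
```

At n = 15 the interval runs from roughly 0.73 to 1.58 — the “true” standard deviation could plausibly be half again as large as what we measured.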

So, here’s the kicker.

One can calculate the
upper and lower limits of Pp for various sample sizes and alpha values and look
at the potential range of Pp or Ppk for a capability study.

Let's look at two examples of how this works.

**Example One:** From the chart above, let’s assume an alpha of
0.05 and a sample size of 15 units. Let’s also assume that we ran our
capability study and calculated Pp = 1.67.

Now we could stop
here and report to our customer that our Pp = 1.67. But if we do the
calculations, we see that the actual Pp could lie anywhere between Pp = 1.1 and Pp = 2.2. If your customer has a savvy Six Sigma expert, you could be busted. In my Six Sigma youth, I was busted. Lately, I've been doing the busting. I like that better.

**Example Two:** If we
increase the sample size of the capability study to 120, and our calculations
again find Pp = 1.67, then the actual Pp could be between Pp = 1.5 and Pp = 1.8.
With the larger sample size we are much more assured that the Pp is actually
close to 1.67. In this case, we would have a much better chance of defending a
Pp = 1.67.
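Both examples can be reproduced with a short sketch. Since Pp scales as 1/s, the chi-square interval for the standard deviation flips around when applied to Pp. This is standard-library Python with the chi-square quantiles approximated by the Wilson–Hilferty formula, so the bounds land near (not exactly on) the rounded figures quoted above:

```python
import math
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square p-quantile."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3

def pp_interval(pp_hat, n, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for Pp.

    Because Pp is inversely proportional to s, the *lower* chi-square
    quantile gives the *lower* Pp limit.
    """
    df = n - 1
    lo = pp_hat * math.sqrt(chi2_quantile(alpha / 2, df) / df)
    hi = pp_hat * math.sqrt(chi2_quantile(1 - alpha / 2, df) / df)
    return (lo, hi)

print("n=15: ", pp_interval(1.67, 15))   # roughly (1.06, 2.28)
print("n=120:", pp_interval(1.67, 120))  # roughly (1.46, 1.88)
```

The small-sample interval is wide enough that a reported Pp of 1.67 is consistent with a “true” Pp barely above 1.0.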

In summary, never forget that the reliability of the Pp value you report depends on the size of your sample. This is one reason to avoid capability
studies on small prototype builds and experimental runs.

We can also see from the charts above that once we have a
sample size greater than about 90 pieces, the incremental improvement (decrease) in
the range of our standard deviation (and thus our Pp) becomes small enough to
ignore.

I hope that you have enjoyed these blog postings on the
vagaries of Capability studies (Cp versus Pp, Data distribution, and Sample
size). Practically we cannot always control these factors, but we can at least
go into our study with our eyes open and a clear understanding of our ‘risk of
being wrong’.