Pith and Vinegar: Normal Data is Nearly Never Normal

Yes, I said it.
Normal data is nearly never normal.

In our Six Sigma classes we study outliers, shift, drift, and special cause events. What we don't always realize is that these "odd" data points are far more common than we would like.

First, let's look at a set of data that is behaving itself. The chart below is for a set of screw torques taken sequentially from a 'smart' driver. We can see that the data is normal (p=0.895) and the histogram and time series plot back that up.

So let's look at another set of data: solder paste print height (think screen-printed solder, for you non-electronics-assembly folks).

Yes! It's also normal. p = 0.095, which is greater than 0.05, right? I'm good to go.

Well... obviously there are some discrimination issues here. The data is lumped into groups, which suggests the measurement resolution is coarse. That probably won't affect my calculations. But do I see something going on in the time series data? A pattern of rising and falling?
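If you want something more objective than squinting at the chart, one option (my suggestion here, not something from the charts above) is a runs test about the median. Here is a rough sketch in plain Python, assuming the readings are in time order. It uses the normal approximation, so treat the p-value as a guide only on short series. The function name `runs_test` and the sample heights are made up for illustration.

```python
import math
from statistics import median, NormalDist

def runs_test(series):
    """Wald-Wolfowitz runs test about the median (normal approximation).

    Returns (observed runs, expected runs, z, two-sided p-value).
    """
    m = median(series)
    # Drop points sitting exactly on the median, then mark above (True) / below (False).
    signs = [x > m for x in series if x != m]
    n_above = sum(signs)
    n_below = len(signs) - n_above
    n = n_above + n_below
    if n_above == 0 or n_below == 0 or n < 3:
        raise ValueError("need at least 3 points, with some on each side of the median")
    # A new run starts every time the sign changes.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n_above * n_below / n + 1
    variance = (2 * n_above * n_below * (2 * n_above * n_below - n)) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, expected, z, p

# Hypothetical print heights that slowly rise and fall (illustrative values only).
heights = [5.0, 5.1, 5.2, 5.3, 5.2, 5.1, 5.0, 4.9, 4.8, 4.7,
           4.8, 4.9, 5.0, 5.1, 5.2, 5.3, 5.2, 5.1, 5.0, 4.9]
runs, expected, z, p = runs_test(heights)
print(f"runs={runs}, expected={expected:.1f}, z={z:.2f}, p={p:.4f}")
```

Far fewer runs than expected (a strongly negative z) points to slow drift or cycling. Far more than expected points to rapid alternation. Either way, it is a hint to go look at the process, not a verdict.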

One very good tool for bringing out these patterns is called "Resistant Smooth," which according to Minitab "smoothes an ordered series of data, usually collected over time, to remove random fluctuations." (How Resistant Smooth works is a topic for another blog post.)

Here is the actual solder print data compared to smoothed data.

So, maybe there is a pattern here. What do you think? At least you can now walk the process to find out whether the pattern is real and, if so, where it is coming from.
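For the curious, here is a bare-bones sketch of the general idea behind a resistant smoother: repeat a running median of three until nothing changes, then apply a light weighted average (Hanning). To be clear, this is a simplified stand-in and not Minitab's actual Resistant Smooth algorithm. The function names are my own.

```python
from statistics import median

def running_median3(values):
    """One pass of a running median of span 3; end points are copied unchanged."""
    if len(values) < 3:
        return list(values)
    middle = [median(values[i - 1:i + 2]) for i in range(1, len(values) - 1)]
    return [values[0]] + middle + [values[-1]]

def hanning(values):
    """Weighted moving average with weights 1/4, 1/2, 1/4; end points copied."""
    if len(values) < 3:
        return list(values)
    middle = [0.25 * values[i - 1] + 0.5 * values[i] + 0.25 * values[i + 1]
              for i in range(1, len(values) - 1)]
    return [values[0]] + middle + [values[-1]]

def simple_resistant_smooth(values):
    """Repeat the median-of-3 pass until the series stops changing, then hann it."""
    smoothed = list(values)
    while True:
        next_pass = running_median3(smoothed)
        if next_pass == smoothed:
            break
        smoothed = next_pass
    return hanning(smoothed)

# A single spike is removed, because medians ignore isolated extreme points.
print(simple_resistant_smooth([5, 5, 9, 5, 5]))
```

The medians are what make the smoother "resistant": a lone wild reading cannot drag the smoothed line with it the way it would drag a moving average.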

So. Solder paste data is normal, right? Or at least normal enough to run a capability study. So we take solder paste height data from another process and run our normal capability calculations and....

What the heck!? Quick. What does the time series chart show?

Obviously something is going on in the process that creates a dual (bimodal) distribution. It appears to be time to go to the gemba and see what is happening.
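To see why a dual distribution wrecks a capability study, here is a small sketch of the usual index calculation. Everything in it is hypothetical: the spec limits, the readings, and the function name `capability`. It uses the overall sample standard deviation, so strictly speaking these are Pp/Ppk-style numbers.

```python
from statistics import mean, stdev

def capability(data, lsl, usl):
    """Return (Cp, Cpk) from the overall mean and sample standard deviation.

    Both indices assume the data come from a single normal distribution.
    """
    mu = mean(data)
    sigma = stdev(data)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk

# Two hypothetical process states, each tight on its own (illustrative values).
state_a = [4.9, 5.0, 5.1, 5.0, 4.9, 5.1]
state_b = [5.9, 6.0, 6.1, 6.0, 5.9, 6.1]
LSL, USL = 4.0, 7.0  # assumed spec limits

for label, data in [("A only", state_a), ("B only", state_b), ("A + B", state_a + state_b)]:
    cp, cpk = capability(data, LSL, USL)
    print(f"{label}: Cp={cp:.2f}, Cpk={cpk:.2f}")
```

Each state looks very capable by itself, but lumping them together inflates sigma and the index collapses. The number is not telling you the process is noisy. It is telling you there are two processes.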

I guess the message here is two-fold. One, don't assume normality, even if past data has proven to be normal. Stuff happens. Two, unless you are performing tests that are sensitive to normality (e.g., capability studies), don't obsess over the distribution. Use non-parametrics, use logic, find the patterns, find the causes of the "odd" data points. Go to the gemba. Dig deep and learn.

And remember.

Normal data is nearly never normal.


Wednesday, August 24, 2016
