Monday, May 13, 2019

Evaluating the Validity of Data Reported in Social Media and the Press

I was going to write a blog post on this topic, but then I found this excellent article written by The Writing Center of the University of North Carolina. Yes, maybe I wimped out, but it really is a good summary of how to look at data critically.

However, I do want to reinforce a couple of points.

Don’t trust data just because it’s quoted in a post by one of your social media ‘friends’. You may trust Bob, and Bob trusts Alex, who has always trusted Sanjay, who trusts Cindy, who trusts Cal, who has an agenda and is distorting the truth.

As the article points out, there are three common ways to describe the center of a data set (mean, median, and mode). Those with an agenda often choose the one that best makes their point.
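To see how much the choice can matter, here is a small sketch using Python's standard statistics module. The salary figures are made up purely for illustration; they are not from the article.

```python
from statistics import mean, median, mode

# Hypothetical annual salaries (in thousands) at a small company:
# eight staff members plus one very highly paid owner.
salaries = [30, 30, 30, 35, 40, 45, 50, 60, 400]

center_mean = mean(salaries)      # pulled upward by the single large value
center_median = median(salaries)  # the middle value of the sorted data
center_mode = mode(salaries)      # the most frequently occurring value

print("mean:  ", center_mean)
print("median:", center_median)
print("mode:  ", center_mode)
```

Someone who wants the "typical" salary to look high would quote the mean; someone who wants it to look low would quote the mode. All three are legitimate calculations on the same data.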

Finally, I am always suspicious when a data set makes me say ‘Yes! That’s just what I thought. I knew I was right.’

Am I falling for a biased study because it matches my beliefs? Question yourself as much as you question others.

Monday, April 15, 2019

Outlier Identification

It's been a while since my last post. But I can assure you that this post is not an outlier... or is it?

Identifying outliers in a data set is one of the most difficult tasks we face as problem solvers, mostly because there are no definitive tests that absolutely identify whether a data point is unique or a natural, expected part of the data set.

Outlier identification reminds us that being a statistical practitioner requires more than a good handle on statistical tools and good knowledge of the process from which the data was collected. Outlier identification requires the ability to use one's mind to take in all this information and make the right decision. Well, at least not make the wrong decision.

The attached document is a summary of some methods to look at outliers. It is not a complete compendium of the issue. Please comment below if you have used other methods for outlier identification, or if you feel my presentation needs to be corrected or adjusted.

Outlier Identification

Friday, July 13, 2018

Does country radio play more male singers than female singers?

I admit that I do not listen to much country music, but when I do, it has always seemed to me that the playlists on country radio are almost entirely men.

But is this a fair claim? Can statistical analysis help us decide?

When I first started thinking about this I realized that it is probably not fair to compare radio playlist counts of men and women singers to the proportion of men and women in the population (50/50). I do know that this sounds odd.

We need to consider that if the proportion of women in the business of recording country songs is less than 50%, then we cannot expect the number of records by female artists that are ‘available’ to be played to be 50% of all country records. If fewer women are recording, then fewer women are available to be played on the radio.

To try to find out the real proportion of men to women who are in the business as country artists, I searched online and found two lists of country artists which appeared to be independent of radio playtime.

In Wikipedia I found a list of “modern” country artists and I categorized these artists and groups by gender. The other list was from Country Notes, which has attempted to create a complete list of all country artists from the early days through today. This list even contains a few pop stragglers such as Lionel Richie and Olivia Newton-John who veered into the country lane.

Note: Group acts are counted as male or female depending on the lead singer (so Sugarland is counted as Female and Old Dominion is counted as Male).

From these two lists the ratio of male country artists to female country artists is about the same. The wiki list shows a ratio of 66% men to 34% women and the Country Notes list shows a ratio of 63% men and 37% women.

Given an assumption that more women are in the business today than in the early days, I picked a ratio of 65% men to 35% women as the make-up of all country artists “available” to be played. One can see that this is a much different number than 50% men and 50% women.

Therefore, right or wrong, when we listen to country radio, we should expect about 65% of the songs to feature male singers and about 35% to feature female singers.

The side issue of “why” fewer women are in the country music business is a deeper topic and may actually be the real point of the issue. However, this article is not going to address that. I am just going to look at the numbers.

Now that we know our expected proportion of male singers to female singers, we can search the playlists from a sample of country music radio stations and calculate the percentage of men and women being played on the radio. We can then compare this observed proportion to our expected proportion. I also took data from the Billboard Top 50 for the years 2016 and 2017.

But how do we determine if the observed percentage of men versus women on a playlist is significantly higher than expected? If we find that a station is playing 66% male singers and 34% female is that significant? Is 70% men versus 30% women significantly different from our expected 65/35 ratio? Is 80/20 significantly different?

One method that can help us decide is a statistical tool called Chi-Squared analysis for categorical data. This tool takes data such as Yes/No, Male/Female, or Democrat/Republican/Independent and helps us decide whether the observed counts differ from what we expected.

This tool is used often in sociology, medicine, and biology studies. It evaluates the observed proportions, compares them to the expected proportions, and then lets us know if the observed is significantly different from the expected. For instance, Chi-Squared analysis has helped to answer questions such as whether there is a difference in the effectiveness of Drug A between men and women, or between adults and children.
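As a rough sketch of how the comparison works for a two-category case like ours, the snippet below computes the Chi-Squared statistic and its p-value against the expected 65/35 split. The playlist counts (150 male, 50 female) are hypothetical and are not the data collected for this study. The function name is my own, and the closed-form p-value used here is valid only for one degree of freedom (two categories).

```python
from math import erfc, sqrt

def chi_square_two_categories(observed_men, observed_women, expected_share_men):
    """Chi-square goodness-of-fit test for two categories (1 degree of freedom)."""
    total = observed_men + observed_women
    expected_men = total * expected_share_men
    expected_women = total - expected_men
    # Sum of (observed - expected)^2 / expected over both categories.
    statistic = ((observed_men - expected_men) ** 2 / expected_men
                 + (observed_women - expected_women) ** 2 / expected_women)
    # With 1 degree of freedom, the upper-tail probability has a closed form.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical one-day playlist: 200 songs, 150 by men and 50 by women.
statistic, p_value = chi_square_two_categories(150, 50, expected_share_men=0.65)
print(f"chi-square = {statistic:.2f}, p-value = {p_value:.4f}")
```

With these made-up counts the statistic is about 8.8 and the p-value is about 0.003, so a 75/25 playlist of 200 songs would be flagged as significantly different from 65/35 at the usual 0.05 level. The same 75/25 split on a much shorter playlist might not be.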

Of course caveats to the results of this particular study apply.

  • This study looks only at the numbers not at the reasons.
  • Data from only four country radio stations was collected. Also the data from these stations is for one specific day’s playlist.
  • I do not know how Billboard determines its “Top 50”.
  • Chi-Square tests are sensitive to sample size. With small sample sizes it is harder to show a significant deviation between observed and expected.
  • These results do not mean to imply any intentional bias by any organization or radio station. 
  • These results cannot explain ‘why’ more male singers are played than women.

Example Chi-Square Calculation: [figure not available]

Of the two samples from a major song ranking service's Top 50 list, the Top 50 for 2017 did not show a difference between observed and expected, while the 2016 Top 50 showed that a significant difference may exist (more men on the list than women).

Of the four radio stations sampled, three showed that a significant difference may exist (more men played on the radio than women).

So there you go. The answer to the question “Does country radio play more male singers than female singers?” is a resounding “maybe.”

Thursday, August 10, 2017

Capability and MSAs are NOT the same thing!

I have found that people often confuse MSAs and Capability Studies. Far too often, I hear the question ‘when will we run the capability study on the tester?’ And while I am sure that you few brave souls who read my blog do not fall into this trap, you might know of people who do. Maybe giving them this link will help.

MSAs are for tests and gages. Capability studies are for the processes being measured.

Or to state it another way, MSAs give us confidence that we can measure the capability of our process to produce parts to our customers' specifications.

One can talk about the 'capability' of a tester, but only when the word is being used in its classic sense, i.e. 'the extent of someone's or something's ability.'

Let's review....

Measurement System Analysis 

A measurement system is a collection of procedures, gages and operators that are used to obtain measurements. Measurement systems analysis (MSA) is used to assess the ability of a measurement system using the following statistical metrics: stability, repeatability (test / re-test variation) and reproducibility (operator variation).

The most common metric for an MSA is the Gage R&R value. This value is the ratio of the variation due to measurement error (repeatability and reproducibility) to the total variation of the system (including both part and measurement variation).

Gage R&R = Variation due to R and R / (Measurement + Part Variation)

Sometimes, one cannot find parts that demonstrate part variation to use in the MSA. An example is in electronics with electrical testing (in-circuit tests or functional testing). These systems make hundreds of measurements, and it is impractical to attempt to create or find part variation to use in the MSA.

In these cases we usually run the MSA with ten or so parts off the line, so the part variation will be very low. Therefore the Gage R&R should be calculated as a percent of the tolerance spec range.

Gage R&R = Variation due to R and R / Tolerance Range
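The two ratios above can be sketched in a few lines. This is only an illustration under stated assumptions: the "variation" terms are taken to be standard deviations, repeatability and reproducibility are combined as the root sum of squares, and the tolerance version uses a 6 x sigma spread (a common convention, but not spelled out in the formulas above). The function names and all numbers are hypothetical.

```python
from math import sqrt

def gage_rr_percent_of_total(sd_repeat, sd_reprod, sd_part):
    """Gage R&R as a percent of total (measurement + part) variation."""
    sd_rr = sqrt(sd_repeat ** 2 + sd_reprod ** 2)   # combined measurement error
    sd_total = sqrt(sd_rr ** 2 + sd_part ** 2)      # measurement plus part variation
    return 100.0 * sd_rr / sd_total

def gage_rr_percent_of_tolerance(sd_repeat, sd_reprod, lower_spec, upper_spec, spread=6.0):
    """Gage R&R as a percent of the tolerance range (assumes a 6 x sigma spread)."""
    sd_rr = sqrt(sd_repeat ** 2 + sd_reprod ** 2)
    return 100.0 * spread * sd_rr / (upper_spec - lower_spec)

# Hypothetical study using parts straight off the line, so part variation is tiny.
pct_total = gage_rr_percent_of_total(sd_repeat=0.03, sd_reprod=0.04, sd_part=0.012)
pct_tolerance = gage_rr_percent_of_tolerance(0.03, 0.04, lower_spec=9.0, upper_spec=11.0)

print(f"Gage R&R vs total variation: {pct_total:.1f}%")
print(f"Gage R&R vs tolerance range: {pct_tolerance:.1f}%")
```

With almost no part variation, the first ratio is close to 100% no matter how good the gage is, which is why the tolerance-based ratio is the more useful number in that situation.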

Capability Analysis

From Wikipedia: “The process capability is a measurable property of a process to the specification, expressed as a process capability index (e.g., Cpk, Ppk, Cp, and/or Pp).

The output of this measurement is usually illustrated by a histogram and calculations that predict how many parts will be produced out of specification.

Two parts of process capability are: 1) measure the variability of the output of a process, and 2) compare that variability with a proposed specification or product tolerance.”

Cp (or Pp)  = Spec Range / (6 x total system variation (std dev))

Capability studies assume that the measurement variation is low enough to not be a factor, and that the “total variation of the system” is effectively due to process and part variation.
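A minimal sketch of the Pp form of this calculation, using the overall sample standard deviation. The measurements and spec limits are invented for illustration, and the function name is my own. Ten readings is far too few for a real study; it just keeps the example short.

```python
from statistics import stdev

def pp_index(measurements, lower_spec, upper_spec):
    """Pp = spec range / (6 x overall sample standard deviation)."""
    return (upper_spec - lower_spec) / (6 * stdev(measurements))

# Hypothetical measurements against a spec of 9.0 to 11.0.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.0, 10.0]
pp = pp_index(measurements, lower_spec=9.0, upper_spec=11.0)
print(f"Pp = {pp:.2f}")
```

Note that the standard deviation here includes whatever measurement error is present, which is exactly why the MSA has to come first.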

Remember that to do a proper capability study we need a successful MSA, a stable process, and a large enough sample size to give a trustworthy estimate (usually about 90 pieces). Getting a good capability study from prototype builds is difficult due to the (usually) small sample size.

Monday, July 10, 2017

iSixSigma Article

My new article is on iSixSigma: A Study of Estimates of Sigma in Small Sample Sizes

Wednesday, October 19, 2016

Studying the Capability of Capability Studies - Part 3

Sample Size

As most of you know, statistical tests calculate a mean and confidence intervals on the mean. We are all familiar with the fact that as our sample size decreases our knowledge of the “true” mean becomes less and less certain.   This is important for tests that use the mean -- such as the t-test and ANOVA.

Below is an example of two data sets, “Apples” and “Oranges”. In the first experiment we only had 15 samples of Apples and 15 samples of Oranges. Plotting the means with their calculated confidence intervals shows that we cannot differentiate between Apples and Oranges (since the confidence intervals overlap, we cannot be certain that the two means are different).

But if we increase the sample size to 100 Apples and 100 Oranges, our confidence intervals decrease and we can now tell that Apples do have lower values than Oranges.

I am sure that most of you are realizing where this diversion from capability is heading.

Capability studies (and Gage R&R studies) use the standard deviation as a primary statistic and, like the mean, the standard deviation also has a confidence interval.

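The confidence interval of the standard deviation is also dependent on the sample size. As the sample size increases, our estimate of the standard deviation becomes better. For those who like this stuff, here is the formula for the confidence interval on the standard deviation.

s x sqrt((n - 1) / chi-square upper) <= sigma <= s x sqrt((n - 1) / chi-square lower)

Here s is the sample standard deviation, n is the sample size, and chi-square upper and chi-square lower are the (1 - alpha/2) and (alpha/2) quantiles of the chi-square distribution with n - 1 degrees of freedom.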

Given a standard deviation of “1” and using the formula above, we can now plot how the confidence intervals contract as sample size increases.

In other words, as my capability study’s sample size becomes smaller, the range in which the “true” standard deviation can exist becomes larger.
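For anyone who wants to reproduce the numbers, here is a small sketch that tabulates the interval for a standard deviation of 1 at several sample sizes. It assumes the usual chi-square interval for a standard deviation, and to stay within the Python standard library it uses the Wilson-Hilferty approximation for the chi-square quantiles, so the results are approximate. The function names are my own; a statistics package would give exact quantiles.

```python
from math import sqrt
from statistics import NormalDist

def chi_square_quantile(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty approximation, not exact)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * sqrt(c)) ** 3

def std_dev_interval(s, n, alpha=0.05):
    """Two-sided confidence interval for sigma, given sample std dev s and size n."""
    df = n - 1
    lower = s * sqrt(df / chi_square_quantile(1 - alpha / 2, df))
    upper = s * sqrt(df / chi_square_quantile(alpha / 2, df))
    return lower, upper

# Interval for a sample standard deviation of 1 at increasing sample sizes.
for n in (15, 30, 60, 90, 120):
    lower, upper = std_dev_interval(1.0, n)
    print(f"n = {n:3d}: sigma between {lower:.2f} and {upper:.2f}")
```

The interval is not symmetric around 1, and it narrows quickly at first and then more slowly as n grows.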

So, here’s the kicker.

One can calculate the upper and lower limits of Pp for various sample sizes and alpha values and look at the potential range of Pp or Ppk for a capability study. 

Let's look at two examples of how this works.

Example One:  From the chart above,  let’s assume an alpha of 0.05 and a sample size of 15 units. Let’s also assume that we ran our capability study and calculated Pp = 1.67.

Now we could stop here and report to our customer that our Pp = 1.67. But, if we do the calculations we see that the actual Pp could lie between Pp = 1.1 and Pp = 2.2. If your customer has a savvy Six Sigma expert, you could be busted. In my Six Sigma youth, I was busted. Lately, I've been doing the busting. I like that better.

Example Two: If we increase the sample size of the capability study to 120, and if our calculations find a Pp = 1.67, then the actual Pp could be between Pp = 1.5 and Pp = 1.8. With the larger sample size we are much more assured that the Pp is actually close to 1.67. In this case, we would have a much better chance of defending a Pp = 1.67.
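The two examples can be checked with a short sketch. Because Pp is inversely proportional to the standard deviation, the limits on Pp follow directly from the chi-square limits on sigma. As an assumption for illustration, the chi-square quantiles below come from the Wilson-Hilferty approximation (standard library only), so the results are approximate, and the function names are my own.

```python
from math import sqrt
from statistics import NormalDist

def chi_square_quantile(p, df):
    """Approximate chi-square quantile (Wilson-Hilferty approximation, not exact)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * sqrt(c)) ** 3

def pp_limits(pp, n, alpha=0.05):
    """Approximate two-sided confidence limits for Pp from a sample of size n."""
    df = n - 1
    lower = pp * sqrt(chi_square_quantile(alpha / 2, df) / df)
    upper = pp * sqrt(chi_square_quantile(1 - alpha / 2, df) / df)
    return lower, upper

for n in (15, 120):
    lower, upper = pp_limits(1.67, n)
    print(f"n = {n:3d}: Pp between {lower:.2f} and {upper:.2f}")
```

Under this approximation the limits come out near 1.06 and 2.28 for 15 units and near 1.46 and 1.88 for 120 units, broadly in line with the rounded ranges quoted in the two examples.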

In summary, never forget that the likelihood that the Pp value you report is effectively correct is dependent on the size of your sample.  This is one reason to avoid capability studies on small prototype builds and experimental runs.

We can also see from the charts above that once we have a sample size greater than about 90 pieces, the incremental improvement (decrease) in the range of our standard deviation (and thus our Pp) becomes small enough to ignore.

I hope that you have enjoyed these blog postings on the vagaries of Capability studies (Cp versus Pp, data distribution, and sample size). In practice, we cannot always control these factors, but we can at least go into our study with our eyes open and a clear understanding of our ‘risk of being wrong’.

Hmmm... Maybe my next topic will be "The Risk of Being Wrong"

Capability of Capability - Part 1
Capability of Capability - Part 2