Friday, May 8, 2015

Finally. Another Statistics Post by John!

I know, who cares? But you're getting my latest posting anyway.

Ever since Fisher presented the "p-value" as a method to judge the statistical validity of a hypothesis (null hypothesis significance testing, or 'NHST'), statisticians have been arguing about it.

Recently the editors of Basic and Applied Social Psychology (BASP) announced that the "journal would no longer publish papers containing P values because the statistics were too often used to support lower-quality research." BASP prefers using effect size and descriptive statistics.

Many statistics journals and blogs commented on this, including two of our favorite blogs, Minitab and John D. Cook's blog.

While not a statistician, I have noticed one thing that may or may not be significant (Hah. See what I did there?). Most of the arguments about p-values come from the 'soft' sciences, where groups of people are being tested (e.g., psychology or medicine) and where the effect (or lack thereof) of a treatment is critical to understand in detail.

I also note that the goal of these 'softer' sciences (and I am NOT criticizing these sciences at all) is to find whether a treatment has a marked effect. Does Drug A really help reduce symptoms? Does procedure X really improve psychological responses? Also, in these types of studies, covariates and confounding are much more of a concern.

These types of studies are drastically different from those in the engineering world, where we are often happy eking out tiny improvements. We are happy with small reductions in variation, or tiny shifts of the mean -- as long as the cost of these improvements is reasonable (various caveats apply to this statement).

However, whether an investigator is measuring the diameter of a ground metal shaft or the effect of a drug on smoking cessation, one must not rely on any one statistical test or metric. Also, if one has performed an experiment, any experiment, and p = 0.05, that should not tell the investigator that they have found a significant effect. It tells them that, depending on other factors including effect size, economics, and variation, they may be justified in running additional experiments.
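To make that concrete, here is a minimal sketch (assuming numpy and scipy are available, with made-up simulated data) of why a small p-value by itself is not enough: with a large enough sample, a practically negligible shift in the mean can still produce p < 0.05, even though the effect size is tiny.

```python
# Sketch: a "significant" p-value can accompany a negligible effect size
# when the sample is very large. Data here are simulated, not real.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 100_000                                     # very large sample size
a = rng.normal(loc=10.00, scale=1.0, size=n)    # baseline process
b = rng.normal(loc=10.01, scale=1.0, size=n)    # mean shifted by only 0.01 units

t_stat, p_value = stats.ttest_ind(a, b)

# Cohen's d as an effect-size measure: difference in means over pooled SD
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = (b.mean() - a.mean()) / pooled_sd

print(f"p-value   = {p_value:.4f}")    # often below 0.05 despite the tiny shift
print(f"Cohen's d = {cohens_d:.3f}")   # around 0.01, a negligible effect
```

The p-value only says the shift is detectable with this much data; the effect size tells you whether it is worth acting on.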

Finally, I found a neat article, Statistics: P values are just the tip of the iceberg, which lays out a nice way to look at how data should be evaluated. The author's "data pipeline" is something that I do with all my data.


Collect the data --> clean the data --> subject the data to non-parametric exploratory data analysis --> evaluate various models and compute summary statistics (effect sizes, etc.) --> then look at the p-values of any hypothesis tests you may want to run.
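Here is a rough sketch of what that pipeline might look like in Python with pandas, numpy, and scipy. The file name and column names ('shaft_diameters.csv', 'machine', 'diameter') are hypothetical stand-ins; adapt the steps to your own data.

```python
# Sketch of the data-pipeline idea: collect, clean, explore non-parametrically,
# summarize with effect sizes, and only then look at hypothesis-test p-values.
import numpy as np
import pandas as pd
from scipy import stats

# 1. Collect the data (here, read from a hypothetical CSV file)
df = pd.read_csv("shaft_diameters.csv")          # columns: 'machine', 'diameter'

# 2. Clean the data: drop missing values and physically implausible readings
df = df.dropna(subset=["diameter"])
df = df[df["diameter"].between(9.0, 11.0)]

# 3. Non-parametric exploratory data analysis
print(df.groupby("machine")["diameter"].describe())   # medians, quartiles, etc.
a = df.loc[df["machine"] == "A", "diameter"]
b = df.loc[df["machine"] == "B", "diameter"]
u_stat, u_p = stats.mannwhitneyu(a, b)           # distribution-free comparison

# 4. Summary statistics, including an effect size (Cohen's d)
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = (a.mean() - b.mean()) / pooled_sd

# 5. Only now look at the p-value of a parametric hypothesis test
t_stat, t_p = stats.ttest_ind(a, b)
print(f"Cohen's d = {cohens_d:.3f}, Mann-Whitney p = {u_p:.3f}, t-test p = {t_p:.3f}")
```

The point of the ordering is that by the time you read a p-value, you already know what the data look like and how big the effect is, so the p-value is one input among several rather than the verdict.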

I suggest making a copy of the chart shown in the article and posting it at your desk, next to your monitor, beside your coffee mug.

Here is another great article on the history and real intent of p-values and null hypothesis significance testing.
