In Pursuit of Significance


Dear Statistical Curmudgeon:

I have run the paired-sample t-test for each pair of sentence types. You suggested that I also do the test with female and male pooled together, because it is more likely to give significant results. As the table shows, I got different results when I put Female and Male together (‘F&M together’) and when I split the data by gender (‘F’, ‘M’).

My questions are:

1) Should I, or can I, show both results, i.e. the results of the gender-mixed analysis and the results of the gender-separated analysis? Or should I choose just one of the two analyses if I want to avoid a ‘Bonferroni correction’?

2)…

Yours sincerely, Awash in Numbers

Dear Awash,
Once you’ve done both analyses and are at the point of choosing, you can no longer discard one and avoid the Bonferroni correction. The whole point of doing statistics is to let the reader know the chance that your conclusion is wrong. That’s really what a significance level means: a report “at the 95% confidence level” means that you are 95% confident in your conclusions, so there is a 5% chance that they are wrong.

Now, if you have two analyses that give different results and you can pick whichever one you want, how do you know which is the right one to choose? You don’t! How, then, can you tell your reader that you are 95% confident? You can’t! So you really shouldn’t choose.

The correct option is to discuss both tests.  Discussing both would make it clear to your readers that your data is at the edge of statistical significance.  Unfortunately, it means your paper won’t be quite as simple, but life is often messy.  After all, there is no point in writing a simple, memorable conclusion if it is wrong.

Your other option is to do both of the tests with the Bonferroni correction (i.e. test each at the 2.5% significance level and report at the 5% level). Then, if one of the tests still rejects the null hypothesis, you can report it alone. The idea behind the Bonferroni correction is that if you do many statistical tests on the same data and then choose whichever one gives a significant result, you will unwittingly have a much larger chance of falsely claiming that a result is statistically significant. For instance, if you have two tests, each of which has a 5% chance of false significance, then there is about a 10% chance that one or the other will falsely report a statistically significant effect. Obviously, it would mislead your readers if you do something that has a 10% chance of error and then write your paper as if you had done one test with a 5% chance of error.
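To make that arithmetic concrete, here is a minimal sketch of the family-wise error calculation (this assumes the tests are independent; for two independent tests the exact figure is 1 − 0.95² ≈ 9.75%, which is the “about 10%” above):

```python
# Family-wise error rate: the chance that at least one of k
# independent tests falsely reports a significant effect.
alpha = 0.05  # per-test false-positive rate

for k in (1, 2, 5):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k} test(s): chance of a false positive somewhere = {fwer:.4f}")

# For k = 2: 1 - 0.95**2 = 0.0975, i.e. the "about 10%" quoted above.
```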

So, Bonferroni’s alternative is to do the tests, but compute each one more conservatively. In our example, we might compute each to have a 2.5% chance of error, pick whichever test ends up significant, and then report it as if it were a single test with a 5% chance of error. It’s a bit of a weird approach when you think about it, but it does the job. The reader may not have a precise view of what you actually did, but he or she will correctly know the odds that your conclusion is wrong.
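Here is a minimal sketch of how that might look in Awash’s situation. The reading-time numbers are invented stand-ins for whatever was actually measured, and the choice of two tests simply follows the 2.5% example above; only the halved threshold is the Bonferroni recipe itself:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements for one pair of sentence types
# (each subject contributes one value to each condition).
type_a = np.array([512, 480, 530, 495, 550, 470, 505, 520])
type_b = np.array([498, 470, 541, 480, 560, 455, 490, 515])

alpha = 0.05                       # the level you report at
n_tests = 2                        # e.g. gender-mixed and gender-split analyses
alpha_corrected = alpha / n_tests  # each test must clear 2.5%, not 5%

t_stat, p_value = stats.ttest_rel(type_a, type_b)
if p_value < alpha_corrected:
    print(f"p = {p_value:.4f}: significant at the Bonferroni-corrected 5% level")
else:
    print(f"p = {p_value:.4f}: not significant once the correction is applied")
```

Equivalently, you can multiply each p-value by the number of tests and compare the result against the original 5%; the two formulations are interchangeable.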

Note: Statistical tests should really be planned in advance, of course. But this posting arose from a situation where a student came in for statistical repairs after already collecting data and doing an analysis. Also, doing statistics at the 1% significance level is really much more sensible than 5% in real research, but student projects are always limited in terms of time and data.

The best answer is not to get emotionally attached to statistical significance. Remember: getting a significant result here is merely a rejection of the null hypothesis, and a lot of null hypotheses are simply made up in order to have something to test. For instance, when doing research on the speed of snails and slugs, most people would start with:

H0: Snails are exactly as fast as slugs.

…but no one would seriously believe that to be true.   So, rejecting this hypothesis should not be considered a big accomplishment.
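A quick simulation makes the point. The speeds below are invented: suppose snails really are a hair faster than slugs, by an amount nobody would care about. Whether H0 gets rejected then depends almost entirely on how much data you collect:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Invented speeds (mm/s): a tiny, practically meaningless difference.
snail_mean, slug_mean, sd = 1.00, 0.99, 0.10

for n in (10, 100, 10_000):
    snails = rng.normal(snail_mean, sd, n)
    slugs = rng.normal(slug_mean, sd, n)
    t_stat, p_value = stats.ttest_ind(snails, slugs)
    print(f"n = {n:>6}: p = {p_value:.4f}")

# The (unimportant) difference is the same in every run; only the
# sample size decides whether it comes out "significant".
```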

“Significance” can be enormously important if there is a strong and important prediction and your results disagree with it. But, absent such a prediction, “significance” is really just a way of expressing whether the difference between snails and slugs is big or small. You should be equally happy to get either result.

…but, of course, few people are.  I don’t quite know who programs students to believe that significant = important, but whoever they are, I wish they’d stop…