A post on Neuroskeptic yesterday reminded me of one of my biggest pet peeves in reporting the results of statistical tests. And just to be clear, the point here is not to pick on Neuroskeptic (whose blog I enjoy quite a bit), as this pet peeve is very widespread.
The issue is this: reporting statistical test results with small p-values as ‘highly (statistically) significant’. There is, of course, the conceptually closely related (and equally annoying) practice of calling non(-statistically)-significant results with p-values between 0.05 and 0.10 ‘marginally (statistically) significant’. Note that, for the purposes of this rant, I’m taking it as given that we’re operating in the world of frequentist statistics and null-hypothesis significance testing.
Okay, so I’m a linguist, which is to say that I understand the distinction between descriptive and prescriptive rules. If I were treating this topic as a linguist interested in language use in the reporting of statistical results, I could (maybe) find something interesting to say about when, how, and why researchers interpret statistical test results as ordinal or kinda-sorta continuous despite the fact that such results are, by construction, dichotomous.
And, really, it’s probably not even that simple. I don’t recall ever reading that a test result was ‘just barely significant’ or ‘very non-significant’. Which is to say that my impression is that it’s much more common for reports of ‘highly’ and ‘marginally’ significant tests to be oriented toward confirming statistical significance of some sort. This result isn’t just significant, it’s super-duper, extra significant. That result isn’t actually significant, but it’s soooooo close, and it tried soooooo hard, let’s give it a cookie.
But I’m not treating this topic as a linguist interested in the language of statistical writing. I’m coming at it as a reader of social science research and a(n occasional) user and reporter of statistical tests. In science writing, prescriptive linguistic rules promoting precision and accuracy are appropriate. A test statistic either exceeds the (predetermined) criterion or it doesn’t, full stop. At best, saying that a test result is ‘highly significant’ or ‘marginally significant’ or that it ‘approached significance’ provides exactly no information of value above and beyond simply saying that the result was significant or not.
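To make the dichotomy concrete, here's a minimal sketch in plain Python (the function name and the particular p-values are invented for illustration). Under NHST logic, the only licensed output of a test is a binary decision against the predetermined criterion:

```python
ALPHA = 0.05  # the predetermined criterion, fixed before looking at the data

def nhst_decision(p_value, alpha=ALPHA):
    """Return the only information a null-hypothesis test licenses:
    whether p falls below the predetermined criterion."""
    return "significant" if p_value < alpha else "not significant"

# These all license exactly the same inference...
for p in (0.049, 0.012, 0.0001):
    assert nhst_decision(p) == "significant"

# ...and so do these, 'marginal' or not:
for p in (0.051, 0.07, 0.9):
    assert nhst_decision(p) == "not significant"
```

The point the function makes is that p = 0.0001 and p = 0.049 map to the same decision, as do p = 0.051 and p = 0.9; 'highly' and 'marginally' add nothing the decision rule recognizes.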
Plot and describe the data, the fitted model parameters, and the model’s predictions. Let the reader know how the data was collected and analyzed. Draw reasonable inferences. And if you’re going to use null hypothesis tests, state the critical p-value and then say whether each of your tests produced a p-value below that critical value. Trust (or hope, maybe) that the reader can interpret the test results appropriately.
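As a hedged sketch of what that kind of reporting might look like in practice (the test names and p-values below are entirely hypothetical), state the criterion once, then report each test against it:

```python
ALPHA = 0.05  # state the criterion up front, before reporting any results

# Hypothetical test names mapped to their p-values, for illustration only.
results = {
    "group effect": 0.003,
    "interaction": 0.21,
}

for name, p in results.items():
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{name}: {verdict} at the {ALPHA} level")
```

Note that the report says which side of the criterion each test landed on and nothing more; no 'highly', no 'marginally', no cookies.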
There are plenty of problems with p-values (see here, and here, and here, and here, for a start). Even so, they can be useful. But there’s no reason to imbue them with properties they simply do not have.