Reflections on Reasons for Reduced Rates of Replicability

Scott Alexander links to an interesting paper by Richard Kunert that aims to test two plausible explanations for the low replication rates in the Open Science Collaboration’s project (run through the Open Science Framework, OSF) estimating the reproducibility of psychological science. The paper is short and open access, and I would argue that it’s worth reading, even if, as I’ll describe below, it’s flawed.

The logic of Kunert’s paper is as follows:

If psychological findings are highly context-dependent, and if variation in largely unknown contextual factors drives low rates of replication, then paper-internal (mostly conceptual) replications should correlate with independent replication success. The idea is that in conceptual replications (which constitute most paper-internal replications), contextual factors are intentionally varied, so effects that show up in repeated conceptual replications should be robust to the smaller amount of contextual variation found in independent, direct (i.e., not conceptual) replications. Kunert calls this the unknown moderator account. Crucially, Kunert argues that the unknown moderator account predicts that the studies with internal replications will be replicated more successfully than are studies without internal replications.

On the other hand, if low replication rates are driven by questionable research practices – optional stopping, the file drawer effect, post-hoc hypothesizing – then studies with internal replications will not be replicated more (or less) successfully than are studies without internal replications.

Kunert analyzes p values and effect sizes in the OSF replication studies. Here’s his Figure 1, illustrating that (a) there’s not much difference between studies with internal replications and those without, and (b) what difference there is points toward studies with internal replications having lower rates of replication (as measured by statistically significant findings in the independent replications, see left panel) and greater reductions in effect size (right panel):

Kunert’s Figure 1: p values (left) and effect size reduction (right).

I think there’s a flaw in the reasoning behind the unknown moderator account, though. Specifically, I don’t think the unknown moderator account predicts a difference in replication rates between studies with and without internal replications.

The logic underlying the prediction is that if internal replications, then successful independent replications. But modus ponens does not license the conclusion that if no internal replications, then no successful independent replications; drawing that conclusion would be denying the antecedent. Studies without internal replications could lack them for any number of reasons. In order for the unknown moderator account to predict a difference in independent replication rates between studies with and without internal replications, the absence of internal replications has to directly reflect less robust effects. Kunert doesn’t make a case for this, and it’s not clear to me what such a case would be or whether it could be made.
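A brute-force pass over the four possible truth assignments makes the denying-the-antecedent point concrete. This is a minimal sketch that assumes the account can be read as a material conditional, which is a simplification (the account is really a claim about probabilities). `I` and `S` are my own labels for "the study includes internal replications" and "the independent replication succeeds".

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional: a -> b is false only when a is true and b is false."""
    return (not a) or b


# Hypothetical labels: I = internal replications present, S = independent replication succeeds
rows = list(product([True, False], repeat=2))

# Worlds in which the premise "I -> S" holds
premise_worlds = [(i, s) for i, s in rows if implies(i, s)]

# Worlds where the premise holds but the inverse "not I -> not S" fails
counterexamples = [(i, s) for i, s in premise_worlds if not implies(not i, not s)]

for i, s in counterexamples:
    print(f"I={i}, S={s}: premise holds, inverse fails")
```

The single counterexample is a study with no internal replications whose independent replication succeeds anyway, which is exactly the case the premise says nothing about.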

So, the unknown moderator account is, I think, consistent with equal independent replication rates (on average) across studies with and without internal replications.

It’s possible, for example, that the unknown moderator account is true while all of the OSF studies probed (approximately) equally robust effects, with only a subset of them including internal replications. It’s also possible that the account is true while some proportion of the findings from the OSF studies without internal replications are as robust as the findings from the studies with internal replications, while the rest are not.
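The two scenarios can be made concrete with a toy mixture model. Everything in this sketch is a hypothetical assumption of mine, not something from Kunert’s paper: every study with internal replications probes a robust effect that replicates independently with probability `r_robust`, and a fraction `p_robust_without` of the studies without internal replications probe equally robust effects while the rest probe fragile effects that replicate with probability `r_fragile`. The parameter values are illustrative only.

```python
def replication_rates(p_robust_without: float, r_robust: float, r_fragile: float) -> tuple:
    """Expected independent replication rates (with, without internal replications)
    under a hypothetical mixture model; not Kunert's analysis."""
    # All studies with internal replications are assumed robust.
    rate_with = r_robust
    # Studies without internal replications are a mix of robust and fragile effects.
    rate_without = p_robust_without * r_robust + (1 - p_robust_without) * r_fragile
    return rate_with, rate_without


# Scenario 1: all effects equally robust; only some studies include internal replications.
print(replication_rates(p_robust_without=1.0, r_robust=0.6, r_fragile=0.2))

# Scenario 2: only half of the studies without internal replications are robust.
print(replication_rates(p_robust_without=0.5, r_robust=0.6, r_fragile=0.2))
```

The gap between the two rates is (1 − p_robust_without) × (r_robust − r_fragile), which is zero in the first scenario and positive in the second, but never negative as long as robust effects replicate at least as often as fragile ones.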

The upshot is that the unknown moderator account predicts equal or greater independent replication rates for studies with internal replications than for those without. Given this, I think it’s noteworthy that Kunert reports lower replication rates and greater reductions in effect size for studies with internal replications than for those without. These effects aren’t statistically signif… er, don’t have sufficiently large Bayes’ factors or sufficiently shifted posterior distributions to license particularly strong conclusions, but they both point in the direction that is inconsistent with even my modified version of the unknown moderator account.

I think the unknown moderator account probably also predicts greater variation in independent replication rates for studies without internal replications than for those with them. I’m not sure whether this prediction holds, but based on Kunert’s Figure 1, it doesn’t seem likely.
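One way to see why more variation is expected: if studies with internal replications are uniformly robust while studies without them are a mix of robust and fragile effects, the per-study replication probabilities are spread out only in the second group. The sketch below is a toy illustration under hypothetical assumptions (group size, fractions, and probabilities are invented), and it reads "variation" as the spread of per-study replication probabilities, which is only one possible reading.

```python
from statistics import pvariance


def replication_probabilities(n: int, p_robust: float, r_robust: float, r_fragile: float) -> list:
    """Per-study independent replication probabilities for a hypothetical group of n studies:
    a fraction p_robust probe robust effects (r_robust), the rest fragile ones (r_fragile)."""
    n_robust = round(n * p_robust)
    return [r_robust] * n_robust + [r_fragile] * (n - n_robust)


# Hypothetical: every study with internal replications is robust; half of those without are.
with_internal = replication_probabilities(40, p_robust=1.0, r_robust=0.6, r_fragile=0.2)
without_internal = replication_probabilities(40, p_robust=0.5, r_robust=0.6, r_fragile=0.2)

print(pvariance(with_internal))     # spread among studies with internal replications
print(pvariance(without_internal))  # spread among studies without them
```

In this model the second variance equals p(1 − p)(r_robust − r_fragile)², so it shrinks to zero only when the group without internal replications is all robust or all fragile.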

It’s also worth remembering that the conditional is equivalent to its contrapositive (this is what licenses modus tollens): if not successful independent replications, then no internal replications. So the unknown moderator account also predicts that sufficiently low independent replication rates should correspond to studies without internal replications.
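The contrapositive claim can be checked the same brute-force way, again assuming (as a simplification) that the account behaves like a material conditional; `I` and `S` are my own labels for internal replications and independent replication success. The check also confirms that the converse, "if S then I", is not equivalent.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material conditional: a -> b is false only when a is true and b is false."""
    return (not a) or b


rows = list(product([True, False], repeat=2))  # all (I, S) truth assignments

# I -> S versus its contrapositive, not S -> not I
equivalent = all(implies(i, s) == implies(not s, not i) for i, s in rows)

# I -> S versus its converse, S -> I
converse_equivalent = all(implies(i, s) == implies(s, i) for i, s in rows)

print(equivalent)           # True
print(converse_equivalent)  # False
```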

But it’s not at all clear what “sufficiently low” means here. The replication rates in the OSF project that Kunert analyzed seem pretty low to me (and the whole point of Kunert’s analysis is to test two explanations for such low rates), but I have no idea whether they’re low enough to confirm this prediction.

And, of course, this logic is just as asymmetrical as the logic described above, since successful independent replications are consistent with both the presence and the absence of internal replications (inferring internal replications from replication success would be affirming the consequent). If even the low rates reported in the OSF project count as successful replication, then we can’t really infer much from approximately equal rates across these two categories of psychology study.

