Via boingboing, I found an article in the Chronicle of Higher Education on some (thus far failed) efforts to publish a failure to replicate the (in)famous findings “supporting” the existence of “psi” (i.e., ESP, or pre-cognition, or some such). I put “supporting” in scare quotes because, from what I’ve read, there are real methodological issues with the Bem study that started the most recent furor over this issue.
These issues are beside the point of this blog post, and there is plenty written about them online, so I'm not inclined to retrace the steps I took so many months ago to reach this conclusion. The short version: I remember being convinced that the sample sizes in the experiments Bem reported suggested that the data had been probed repeatedly and that the experiments had been stopped once statistically significant results were found. That is, it seems reasonably likely to me that the effects found by Bem are due, at least in part, to the statistical significance filter.
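To see why stopping an experiment as soon as a test comes up significant is a problem, here's a quick stdlib-only Monte Carlo sketch (my own toy example, not anything from Bem's paper). It generates pure noise, so the true effect is zero, and compares a single test at the final sample size against "peeking" after every batch of new observations. A crude normal approximation (|t| > 1.96) stands in for a proper t-test, which is all we need to show the inflation:

```python
import random
import statistics

def significant(data, crit=1.96):
    """Two-sided z-approximate one-sample test of mean = 0."""
    n = len(data)
    se = statistics.stdev(data) / n ** 0.5
    return abs(statistics.fmean(data) / se) > crit

def run_experiment(peek, start_n=10, step=10, max_n=100):
    """Generate pure-noise data. If `peek`, test after every batch and
    stop at the first 'significant' result; otherwise test once at the end."""
    data = [random.gauss(0, 1) for _ in range(start_n)]
    while True:
        if peek and significant(data):
            return True  # stop early and "report the effect"
        if len(data) >= max_n:
            return significant(data)
        data.extend(random.gauss(0, 1) for _ in range(step))

random.seed(1)
trials = 4000
single = sum(run_experiment(peek=False) for _ in range(trials)) / trials
peeking = sum(run_experiment(peek=True) for _ in range(trials)) / trials
print(f"single test at n=100:      {single:.3f}")   # near the nominal 0.05
print(f"testing after every batch: {peeking:.3f}")  # far above 0.05
```

Both arms analyze the same kind of data with the same test; the only difference is that one of them gets ten chances to declare victory, which is enough to multiply the false-positive rate several times over.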
Okay, so the point of this blog post is to comment on a quote from the Chronicle article. Some researchers tried to replicate some of Bem’s findings, wrote up their negative results, and then said this in an email to the author of the Chronicle article: “Here’s the story: we sent the paper to the journal that Bem published his paper in, and they said ‘no, we don’t ever accept straight replication attempts’.” To the credit of the Chronicle writer, he points out the file drawer effect, quoting a New York Times piece about the Bem work:
And, of course, the unwritten rule that failed studies — the ones that find no effects — are far less likely to be published than positive ones. What are the odds, for instance, that the journal would have published Dr. Bem’s study if it had come to the ho-hum conclusion that ESP still does not exist?
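The file drawer effect is easy to make concrete with another toy simulation (again mine, not from the article). Suppose many labs study an effect that doesn't exist, and only the "significant" results get written up. The published record then consists entirely of false positives, and the effects it reports are sizable by construction, since only estimates far from zero clear the significance bar:

```python
import random
import statistics

def study(n=30, true_effect=0.0, crit=1.96):
    """One study of a nonexistent effect.
    Returns (reached_significance, observed_mean)."""
    data = [random.gauss(true_effect, 1) for _ in range(n)]
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / n ** 0.5
    return abs(mean / se) > crit, mean

random.seed(2)
results = [study() for _ in range(10_000)]
# the "file drawer" keeps the rest; only significant results see print
published = [m for sig, m in results if sig]
print(f"studies run: {len(results)}, 'publishable': {len(published)}")
print(f"mean |effect| among the published: "
      f"{statistics.fmean(abs(m) for m in published):.2f}")
```

Roughly five percent of the null studies come out "publishable", and every one of them reports a healthy-looking effect, even though the true effect size is exactly zero.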
It's bothered me for some time that replication is not a regular occurrence in behavioral research, but it's particularly distressing to see an anti-replication ethic stated so plainly. Don't get me wrong – I'm pretty sure I understand why replications are not a regular part of behavioral research.
We could take a very cynical tack and say that replication isn’t common because behavioral effects are small, unreliable, and, more likely than we’d like to admit, statistical artifacts (i.e., false). Now, this may be the case some of the time, but my guess is that it is not the case all that often. I think replication is uncommon – and occasionally explicitly frowned on – because behavioral research is very difficult, and because, when we study behavior, we are studying extremely complex systems. There are a huge number of factors that can influence behavioral effects, many of which are unknown, ignored, or otherwise not taken into account when designing, conducting, and reporting behavioral studies.
There's the old joke about how psychology is the science of college sophomores, but there's an important insight in the humor. The characteristics of college sophomores differ from the characteristics of non-college-sophomores in any number of possibly relevant ways. Of course, this basic point applies, in principle, to just about any subset of the human species that might be available for your particular behavioral study. We simply don't know which of the factors we don't know about might be relevant in any given project. We have a serious case of meta-ignorance on top of our ignorance: unknowns and unknown unknowns and all that.
I've probably overstated the case a bit, since it seems likely to me that lots of behavioral effects would, in fact, be fairly easy to replicate, if there were any incentive for people to carry out replications. The difficulty of doing a good replication may well account for a cultural disinclination to do replications. A failure to replicate a behavioral finding could be due, of course, to the original finding being bunk, but it could also be due to any number of unknown differences between the original study and the replication, differences that no one thought to consider in the first (or second) place. A long-term cultural disinclination could then lead to explicit policies to reject straight replications without review. Sad, but understandable (and also maybe wrong – the cynical view, or some other view I haven't brought up here, could provide a better explanation, after all).
So, what’s to be done? I’ve thought about this some, but the best I’ve come up with so far is to assign, when and where appropriate, replications as term projects. At the beginning of graduate school, I found it frustrating to be expected to formulate reasonable research questions, design appropriate experiments, and collect and analyze data while simultaneously learning the content of a course. When you only know what’s been covered in the first few weeks of a class, it limits the range of possible term project topics pretty severely. As a teacher, I’ve thought about this from the other side, which is to say that I’ve pondered at length the organization of a course’s content and how it influences students’ abilities to do good, and interesting, term projects. Of course, my experiences with this are tied very closely to linguistics and second language studies. My indirect experience with psychology (i.e., what I heard from the other lab members when I was a linguistics and cog-sci student in a psych lab) indicates that research projects are less tied to ‘content courses’.
Anyway, there’s an obvious educational benefit to assigning replications as term projects, since students are likely to learn quite a bit from trying to replicate what someone with a lot more research experience did. The scientific value of this is, um, let’s call it less obvious. Less experience doing research leaves a lot of extra room for unknown but possibly relevant differences to creep in and mess things up.
There’s also the scientifically more promising psychfiledrawer.org, which seems to be a good faith (and well-funded) effort to make replication attempts available to behavioral researchers. Just because journals won’t publish replications doesn’t mean they shouldn’t be published.
(A quick, and amusing, aside: When looking for links about the Bem study, I found a page on Bem’s website from 1994 (1994!) that says “Recent laboratory research suggests that parapsychologists might finally have cornered their elusive quarry: Reproducible evidence for psychic functioning.” Or maybe not.)