When is a parameter not a parameter?

At Mid-Phon 17, I was talking to a colleague about mixed-effects (i.e., multilevel) modeling, and he stated very matter-of-fact-ly that when a random-intercepts-by-subjects model is estimated, there are no intercepts estimated for each individual subject. Rather, he continued, the model specifies a probability distribution over the subject-specific intercepts, so you get an estimate of the variance of the intercepts (with, though it wasn’t mentioned explicitly during this conversation, the group-level intercept providing the mean of the distribution).

I didn’t know how to respond, since this seemed obviously wrong to me. It seemed wrong to me because it is wrong. The next talk was starting, though, so there was but a brief moment of awkward silence as I tried to process these assertions prior to returning to my seat.

The topic didn’t come up again at the meeting, though shortly thereafter I talked to a (different) colleague (and friend) of mine who is knowledgeable about such models. He immediately asked if the first colleague uses SPSS. I didn’t know the answer, but it’s certainly possible, even plausible. Colleague #2 asked because, apparently, SPSS makes it exceptionally difficult (maybe impossible) to see the estimated subject-specific (or item-specific, or whatever-grouping-variable-specific) parameters in a fitted model.

So maybe SPSS users have a strange, and limited, view of how random effects work, I thought. Until today.

Today, I was finishing up reading Florian Jaeger’s 2008 paper (pdf) arguing that mixed-effects logistic regression models are superior to ANOVA when doing categorical data analysis. I came to more or less the same conclusion sometime around 2008, though without reading Jaeger’s paper. I say more or less the same conclusion because while I agree with Jaeger that (the traditional conception of) ANOVA isn’t appropriate for categorical data analysis, I think that logistic regression is just one of a number of more appropriate models (GRT being another such model, at least for certain cases).

In any case, I finally got around to starting Jaeger’s paper recently, and I picked it back up today only to find this quote (p. 443):

The only parameter the model fits for the random effects is their variance (see also Baayen et al., 2008; for details on the implementation, see Bates & Sarkar, 2007).

Jaeger is discussing these models in the context of R and lme4, so I am kind of flabbergasted at this assertion. I’m flabbergasted in part because when you fit a multilevel regression model (logistic or otherwise) using lmer, the function ranef() with the lmer output object as the argument returns the estimated random effects parameters. In fact, I just happened to have used ranef() to make a figure illustrating the distribution of random item intercepts in a mixed-effects logistic regression model a few days ago (for one of my ICA/ASA/CAA proceedings papers):

[Figure: item_intercepts, the distribution of random item intercepts]

Since Jaeger cites Baayen et al. (pdf) in support of this assertion, I figured I’d see if I could trace the problem back to that. I haven’t read the paper in full yet, but I found this (pp. 393-394):

When a mixed-effects model is fitted to a data set, its set of estimated parameters includes the coefficients for the fixed effects on the one hand, and the standard deviations and correlations for the random effects on the other hand. The individual values of the adjustments made to intercepts and slopes are calculated once the random-effects parameters have been estimated. Formally, these adjustments, referenced as Best Linear Unbiased Predictors (or BLUPs), are not parameters of the model.

I don’t know much about BLUPs, so maybe I’m way off-base to say that it strikes me as rather silly to treat estimated random effects as substantively distinct from parameters. They’re certainly not data; we haven’t observed them, and indeed we can’t in principle. And they enter into the equation from which predicted values of the dependent variable are calculated in, in every relevant respect that I can think of, exactly the same way that the ‘fixed effects’ parameters do. In addition, statistical bias is a property of estimates of a model’s parameter(s), so to call the random effects unbiased non-parameters seems paradoxical. All of which makes them sound an awful lot like parameters to me. The fact that the variance and covariance parameters governing the random effects are estimated prior to estimating the random effects themselves is neither here nor there with respect to what we call the latter.
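For concreteness, here is a sketch of the simplest case, a random-intercepts-by-subjects model with a continuous response. The notation is my own assumption for illustration (it is not taken from Jaeger or from Baayen et al.): y_ij is observation i from subject j, n_j is the number of observations for subject j, and ȳ_j is that subject’s mean. As far as I understand the standard result, with the variance components plugged in, the predicted adjustment for subject j is a shrunken version of that subject’s deviation from the group-level intercept:

```latex
\begin{align*}
y_{ij} &= \beta_0 + u_j + \varepsilon_{ij}, \qquad
u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \\
\hat{u}_j &= \frac{n_j \sigma_u^2}{n_j \sigma_u^2 + \sigma^2}
\,\bigl(\bar{y}_j - \hat{\beta}_0\bigr).
\end{align*}
```

The second line can only be computed once the two variances and the intercept have been estimated, which is the sense in which the adjustments come “after.” But each u_j still sits in the first line right next to the intercept, as an unobserved quantity that has to be given a value before any prediction can be made.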

Indeed, it seems very odd to me that the variance and covariance parameters are estimated first, since my intuition is that there would need to be something varying and covarying in order to estimate these. I come at all of this from a Bayesian perspective, by which I mean that the first few multilevel models I fit to data, I built and estimated using BUGS and JAGS. In these cases, I don’t see any way that the ‘random effects’ could not be parameters while the means, variances, and covariances governing them are. These bits aren’t data, but they are part of the model equation – without them, you can’t calculate the likelihood function (see, e.g., the ‘theta construction’ section of this model or the role that the Bsp and Bsf arrays play in this model).
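To illustrate the Bayesian view, here is a minimal Python sketch of the joint log density for a random-intercepts model. This is not the BUGS/JAGS code from the models linked above; the function names, the toy data, and the choice of a normal response are all hypothetical, and priors on the top-level quantities are omitted. The point is only that the subject-level adjustments u are a required argument, just like beta0 and the two standard deviations:

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def joint_log_density(y_by_subject, beta0, u, sigma_u, sigma):
    """Joint log density of the data and the subject intercepts.

    y_by_subject: list of lists, one inner list of observations per subject
    u: one intercept adjustment per subject (hypothetical name)
    Priors on beta0, sigma_u, and sigma are left out of this sketch.
    """
    total = 0.0
    for y_j, u_j in zip(y_by_subject, u):
        # Hierarchical part: each adjustment is drawn from N(0, sigma_u).
        total += normal_logpdf(u_j, 0.0, sigma_u)
        # Data part: each observation is centered on beta0 + u_j.
        for y in y_j:
            total += normal_logpdf(y, beta0 + u_j, sigma)
    return total


# Hypothetical toy data: two subjects, three observations each.
y = [[1.1, 0.9, 1.3], [2.8, 3.1, 3.0]]
good = joint_log_density(y, beta0=2.0, u=[-0.9, 1.0], sigma_u=1.0, sigma=0.2)
bad = joint_log_density(y, beta0=2.0, u=[0.0, 0.0], sigma_u=1.0, sigma=0.2)
print(good > bad)
```

In a sampler, u would be updated alongside everything else, so from this perspective there is no natural line to draw between it and the other unknowns.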

So, okay, it’s still possible that SPSS users have a weird, limited understanding of mixed-effects modeling. But there’s something else going on, too, and it’s not entirely clear to me what it is. I assume that the D. M. Bates who is the third author on the Baayen et al. paper is the same D. Bates that co-developed lme4, so I’m perfectly willing to grant that, with respect to (penalized maximum likelihood) estimation, the random effects, on the one hand, and the fixed effects and variances/covariances governing the random effects, on the other, are treated differently. But it seems confusing and confused, at best, to insist that the former are not parameters.

 

Posted in statistical modeling | 2 Comments

Induction redux

I was amused to come across the toupée fallacy recently. According to the RationalWiki page on it, the idea is that it’s (informally) fallacious to assert that “all toupées look fake” if I am basing that conclusion solely on the “evidence” that, thus far, I have only ever seen fake-looking toupées. The problem should be clear – I haven’t noticed any non-fake-looking toupées because they’re not fake-looking, so I thought they were just hair!

The toupée fallacy seems to me to be the fallacy of hasty generalization, and it’s not clear why hasty generalization isn’t referenced on the RationalWiki page. Perhaps the folks over there are hoping the snazzier “toupée fallacy” moniker will catch on, but they’re up against some stiff competition already. The (Irrational?) Wikipedia article lists a few, none of which refer to hairpieces:

The fallacy is also known as the fallacy of insufficient statistics, fallacy of insufficient sample, generalization from the particular, leaping to a conclusion, hasty induction, law of small numbers, unrepresentative sample, and secundum quid. When referring to a generalization made from a single example it has been called the fallacy of the lonely fact or the proof by example fallacy.

I am partial to “the fallacy of the lonely fact.”

As noted in both wiki articles, hasty generalization is part of the more general problem of induction. I’ve written about induction before, at which time I brought up the distinction between the plebian and aristocratic problems of induction. The plebian problem is stated (by Larry Laudan) as:

Given a universal empirical generalization and a certain number of positive instances of it, to what degree do the latter constitute evidence for the warranted assertion of the former?

And the aristocratic problem is stated as:

Given a theory, and a certain number of confirming instances of it, to what degree do the latter constitute evidence for the warranted assertion of the former?

As I noted in my previous post, Laudan states (in a footnote) that “a ‘theory’ in this sense must postulate one or more unobservable entities, i.e., statements which could arise as empirical generalizations do not count as theories for these purposes.” He illustrates the aristocratic problem of induction with a discussion of whether or not pressure-volume relations in gases count as evidence for a particle-based theory of gases. The aristocratic problem is that, even if we observe a pressure-volume relationship that is predicted by our particle-based theory, because other theories might make the same prediction, too, it doesn’t provide us with evidence for our theory. Jaynes describes it as “plausible reasoning,” but strictly speaking, the premises “If P, then Q” and “Q” license nothing more than “Maybe P, maybe not P.”
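The last point can be checked mechanically. Here is a small Python sketch (the variable names are mine) that enumerates every truth assignment for P and Q, keeps the ones in which both premises hold, and then looks at what those assignments say about P:

```python
from itertools import product

# Keep every truth assignment in which both premises hold:
# "if P then Q" (equivalent to "not P, or Q") and "Q".
consistent = [
    (p, q)
    for p, q in product([True, False], repeat=2)
    if ((not p) or q) and q
]

# Collect the values P takes across the surviving assignments.
p_values = {p for p, _ in consistent}

print(consistent)
print(p_values)
```

Both a true P and a false P survive, so the premises leave P undecided.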

Both of the wiki articles about hasty generalization are concerned only with plebian induction, and neither considers probabilistic relationships.

Let $\mathrm{H}_1$ be the hypothesis “all toupées look fake.” Considering $\mathrm{H}_1$, we’re faced with the classic, non-probabilistic plebian induction problem. We can’t ever completely confirm $\mathrm{H}_1$ unless we can observe every toupée. And all it takes is one non-fake toupée to disconfirm $\mathrm{H}_1$. Any generalization from positive instances is hasty, and any observation of a non-fake-looking toupée falsifies $\mathrm{H}_1$.

Let $\mathrm{H}_2$ be “we expect k out of any N toupées to be fake-looking.” Again, we could (dis)confirm $\mathrm{H}_2$ by observing all toupées, but that’s not a realistic possibility. So, suppose we test $\mathrm{H}_2$ by looking at a sample of toupées, in which m are fake-looking.

Under many – maybe most – of the statistical probes of $\mathrm{H}_2$, no observation provides an unambiguous answer. Observing m fake-looking toupées will be more or less consistent with our hypothesis, and we will need some extra machinery to make any decisions with respect to $\mathrm{H}_2$.

Of course, we could use good ol’ statistical hypothesis testing, in which case we can maybe get an unambiguous answer that depends, in part, on the observed value of m, the hypothesized value of k, the size of our sample, and any number of choices regarding statistical test procedures (e.g., the $\alpha$ level, the test statistic, etc…). And to the extent that we do get an unambiguous answer, it will be “reject $\mathrm{H}_2$,” and it will be based on the fact that our test statistic exceeded some more or less arbitrary criterion.
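As a rough illustration of that machinery, here is a minimal Python sketch of one such procedure, an exact two-sided binomial test. Everything specific in it is an assumption made up for the example: the hypothesized proportion (k/N = 0.7), the sample size (20), the observed count (m = 19), the 0.05 criterion, and the choice of this particular test over any other.

```python
from math import comb


def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def exact_binomial_p_value(m, n, p):
    """Two-sided exact p-value.

    Sums the probability of every outcome that is no more likely
    under the hypothesis than the observed count m.
    """
    observed = binom_pmf(m, n, p)
    tol = 1e-12  # guards against floating-point ties
    return sum(
        binom_pmf(x, n, p)
        for x in range(n + 1)
        if binom_pmf(x, n, p) <= observed * (1.0 + tol)
    )


# Hypothetical numbers: H_2 says 7 of every 10 toupées look fake (p = 0.7);
# we sample n = 20 toupées and m = 19 of them look fake.
p_value = exact_binomial_p_value(m=19, n=20, p=0.7)
alpha = 0.05  # the more or less arbitrary criterion
print(f"p = {p_value:.4f}; reject H_2 at alpha = {alpha}: {p_value < alpha}")
```

Change alpha, the sample size, or the definition of “as extreme as” and the decision can change with it, which is the sense in which the answer depends on the procedure as much as on the toupées.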

But $\mathrm{H}_1$ and $\mathrm{H}_2$ aren’t theories. Suppose we develop a theory that posits unobserved entities, and suppose further that this theory implies that, among other things, (some proportion of) toupées are fake-looking. The aristocratic problem tells us that no number of observed fake-looking toupées licenses the conclusion that our theory is true.

As discussed at length by Laudan, it’s this kind of (aristocratic) problem with induction that led to the dominance of hypothetico-deductive methods in science. I’ve been reading some papers recently (e.g., Gelman & Shalizi; Box’s chapter in this book) that present a sort of cyclic picture of science, with inductive (or induction-like) reasoning (e.g., statistical model building) alternating with hypothetico-deductive reasoning (e.g., statistical testing, model fit evaluation).

This picture seems more or less correct to me, in the sense that it accurately describes the kind of things (social?) scientists do, at least some of the time (I’ll leave open the question of whether or not this is what scientists should do). But to the extent that the model-building phase in this picture of (statistically-oriented) science is based on induction from observation, it doesn’t make direct contact with any theory (in the unobserved-entity sense of theory used above).

I think cognitive science writ large can still benefit substantially from more or less atheoretical inductive model building. But one of the most difficult – and important – parts of conducting good science is figuring out exactly how theory makes contact with observation. Theories of, e.g., speech perception pretty much never make predictions about empirical phenomena as directly and unambiguously as particle-based theories of gas do about pressure-volume relationships, but it seems exceedingly unlikely to me that we’ll ever have anything like a complete understanding of speech perception without detailed, well-tested, unobserved-entity-positing theories.

Posted in philosophy of science, SCIENCE!, statistical modeling | 2 Comments

Argument by analogy

I’ve been slowly re-reading Larry Laudan’s Beyond Positivism and Relativism, and in a later chapter there’s an interesting section dealing with ambiguity and the strength of a philosophical position. The position in question is David Bloor’s sociological program for providing causal explanations for scientific beliefs, specifically the assertion – the symmetry thesis – that irrational and rational beliefs should be given the same type of causal explanation.

Laudan points out that it is not at all clear what properties two explanations need to have to count as the same type. On the one hand, under the very broad, though not unreasonable, construal that explanations relying only on natural causes are of the same type, this assertion has no teeth – rational beliefs can be caused by careful data collection and thorough ratiocination, while irrational beliefs can be caused by, say, the uncritical acceptance of the pronouncements of an authority figure.

Under more narrow construals, on the other hand, the symmetry thesis is clearly much stronger, in the sense that it puts substantive constraints on the causal explanation of belief. As it happens, I’m convinced by Laudan’s argument that strong versions of the symmetry thesis are wrong, but that’s neither here nor there.

I had the ambiguity of the symmetry thesis fresh in my mind when I saw this philosoraptor meme response to Wayne LaPierre’s invocation of violent video games (e.g., on page three of the NRA response to the Newtown shootings) as a cause of real-world violence:

[Image: philosoraptor meme]

Under one construal of the “guns don’t kill people, people kill people” argument, this is a legitimate, forceful rebuttal. Specifically, if the guns-don’t argument turns exclusively on the agency of the killer, then it doesn’t, in fact, make any sense to blame video game violence for real-world violence, since, as the meme says, the gamer is just as much the agent as is the gunman. If blame for violence can only be assigned to an agent carrying out that violence, then guns and games are, as asserted, equivalent (and equivalently free of blame).

However, if the guns-don’t argument turns on the specific role of the gun in addition to the role of the agent, then it’s pretty clear that games and guns aren’t equivalent. If an agent has decided to act violently, a gun certainly makes it easier to kill, but the mere presence of a gun doesn’t, as far as I can tell, make someone more likely to decide to act violently.

Now, I don’t happen to believe that violent video games make people more likely to act violently, either, and it’s obviously true that playing violent video games wouldn’t make it easier to kill once a decision to be violent has been made. But it’s at least theoretically possible that repeatedly realistically simulating shooting people could inure a gamer to violence directed toward actual people.

To be clear, I think the guns-don’t argument is exclusively about agency, so I don’t think the different functions of guns and videogames blunt the point made by the philosoraptor meme.

Having thought through all this, I saw another meme-based gamer response to the NRA’s demonization of videogames, one which circumvents the aforementioned potential problems with the philosoraptor meme:

[Image: target]

Now, you could make the case that it’s the realism of videogame violence that gives them their malignancy, but videogames aren’t that realistic, and shooting an actual gun is pretty psychologically powerful, in my experience, even if you’re just shooting at clay pigeons, cans of generic soda, or an ancient fax machine/printer the size of R2-D2 (long story).

There are other potentially relevant differences between target shooting and videogame violence, of course, and pinning (some) blame on videogames is only a small part of the NRA statement. A generous interpretation would give LaPierre credit for recognizing the complexity of the many causes underlying awful acts of violence. A less generous interpretation would say that LaPierre is tossing out canards to distract from the real issue(s).

It won’t surprise you to hear that I don’t know what the solution is, though I think the problems with our ‘gun culture’ have more to do with culture than with guns, per se. There are numerous countries with very high gun ownership rates and very little gun-related violence. This article makes the point fairly well, I think.

I’ll end with a quote from that article that, I think, helps us understand a bit about why mass shootings happen more in some countries than others, but also makes it clear why it won’t be at all easy to change this fact, since it seems very likely to me that “informal social controls” can’t be imposed from above, emerging as they do over the course of a culture’s evolution:

Many other countries where gun ownership is high, such as Norway, Finland, Switzerland and Israel, however, tend to have more tight-knit societies where a strong social bond supports people through crises, and mass killings are fewer, Squires said….

“What stops crime above all is informal social controls,” he says. “Close-knit societies where people are supported, where their mood swings are appreciated, where if someone starts to go off the rails it’s noted, where you tend to intervene, where there’s more support.”

Posted in mildly informative filler | Comments Off

Breaking: Elegance still dead

I wrote that elegance has no epistemic virtues, citing the apparent failure of supersymmetry and my wish to see an end to the invocation of elegance as a dimension along which (linguistic) theories should, or even can, be evaluated.

Josh responded with a vote of confidence for elegance in syntax. In doing so, Josh says that I am “hoping that ‘elegance’ will go away as a desideratum of Linguistic theory. Or, maybe not go away, but be relegated to lesser importance. Or … actually, it’s a little hard to tell exactly what he [Noah] means – which is always the case when you’re essentially arguing for a re-weighting of things, but won’t specify what weighting you’re after.”

I thought it was clear what I meant when I wrote that “I don’t see how elegance can possibly matter when evaluating a theory’s claim to constitute knowledge” and that “I hope that the (apparent) failure of the undoubtedly very elegant theory of Supersymmetry will make it more obvious to physicists and non-physicists alike that elegance isn’t scientifically valuable.”

So, just to clarify, to the extent that I’m arguing for a re-weighting of evaluative dimensions, I want the dimension of elegance to have zero weight. I want elegance to go away as a desideratum in scientific theory evaluation. And I’m unconvinced by Josh’s argument to the contrary.

Josh and I both agree that notions like simplicity are, in principle, useful when comparing two theories all else being equal. Josh describes the general approach outlined by the authors of Understanding Minimalism (Hornstein, et al.) this way:

First you look at the data and come up with a theory that explains the facts. Once you’re broadly satisfied with your theory, you refine it and hope that the process of refinement increases your understanding. “Explanatoriness” – which Noah correctly asserts is epistemologically prior to the others – must be satisfied before you get to the other criteria, which are really criteria for refining a theory. Elegance, parsimony, etc. are parasitic on having first explained something. I really don’t see any ground for objection: that seems to me to be exactly the right way to look at these things.

To further clarify my position, I believe that there are a number of evaluative dimensions that are epistemologically prior to dimensions like simplicity. I mentioned a few in my first post on this issue: explanatory scope, predictive accuracy, consistency with ‘neighboring’ theories, and the generation of surprising predictions. There are almost certainly others. But even if we restrict ourselves to Hornstein, et al.’s explanatoriness, the argument for elegance falls flat.

Josh is correct to point out that I’m concerned that all else won’t be equal. It’s plausible to me that P-and-P and Minimalism differ with respect to one or more of these dimensions. If they do, this would preclude the need to invoke elegance in comparing the two approaches. But where I’m just concerned that all else won’t be equal (and, frankly, I don’t know enough about the relevant syntactic facts or the two theories to be anything more than concerned), Josh explicitly states that all else is not equal, writing that:

[T]he authors [of Understanding Minimalism] believe that the Principles and Parameters Theory is in a stage of development where things have been more or less adequately explained, and we’re now ready to expand our understanding of the explanation by paring down the theory. Interestingly, I don’t agree with that assessment. I think Minimalism is actually the approach that should have been taken from the get-go, and that in addition to being a better theory across the latter-day criteria, it’s also much more explanatory than Government and Binding ever was. Government and Binding made the wrong generalizations, and precisely because in those days people were simply stating observations gussied up in theoretical jargon. Social sciences should start with Minimalist-style approaches, actually. And in any case, the empirical coverage of Government and Binding Theory was actually pretty poor, and there is no reason to believe that we were in a stage where the existing theory was ready for wholesale refinement on that basis.

I’m willing to defer to Josh with respect to the relative explanatory scope of the two theories. But even if I agreed with Hornstein, et al., that P-and-P satisfactorily explained the relevant syntactic facts, it’s not obvious to me that, per Josh’s description of exactly the right way to build syntactic theories, refinements along the dimensions of simplicity, elegance, and naturalness will leave the epistemologically prior desiderata satisfied. Josh says that Minimalism is a better explanatory theory, but it’s easy to imagine that simplifying a good explanatory theory could reduce that theory’s explanatory scope, its predictive accuracy, its consistency with other theories, etc…, which, to belabor the point, obviates the need to invoke elegance or simplicity.

Finally, even if P-and-P and Minimalism are on equal primary epistemic footing, and even if refining P-and-P to build Minimalism hasn’t changed this fact, it’s still not clear to me that elegance can do any epistemic work for us, because it’s still not clear to me just what elegance is. Josh takes a stab at defining it:

I see people use “elegant” mostly in situations where two/multiple things which there is no reason to believe are connected can be made to seem or shown to be connected in a convincing way. Call it the “two birds with one stone” principle. So, an “elegant” move in Chess is usually when someone is facing two unconnected lines of attack, and to the casual observer it looks like dealing with one will expose a weakness in dealing with the other, and the player nevertheless manages to make a single move that puts him in a position of strength to deal with both lines simultaneously. An “elegant” solution to a math problem is similar. We’re faced with something that looks complicated, and it looks like we’ll have to go through many steps to simplify one part of the problem before dealing with another. An “elegant” solution in that case manages to simplify both parts of the problem simultaneously. An elegant solution to a crime novel is when a series of events which seem unrelated and are in any case confusing are shown not only to be connected, but in a simple way. “Elegance” usually plays on the unexpected, seeing things that are not immediately obvious, and in a way that connects things that did not seem connected before.

To the extent people share that impression of what “elegance” is, it is obvious what it has to do with “a theory’s claim to constitute knowledge.” In fact, it is central to explanation. If two (or more) things which seem to be unrelated and lack explanations can be shown to be related in a convincing way, then knowledge has been advanced. Of course, this is, as mentioned above, parasitic on “explanatoriness.” If the theory is elegant but fails to correspond to reality, then it’s worthless. “Elegance” does nothing to make a useless theory useful. But this is why the authors insisted that different desiderata acquire different levels of importance at different stages of theory development. You must absolutely start with a theory that explains things, and elegance, parsimony, simplicity and naturalness be damned. Once you have a basic explanation, you refine that explanation by trying to make it more elegant, parsimonious, simple and natural. To the extent that elegance finds unexpected connections between parts of the theory, it cannot fail to advance knowledge.

If multiple phenomena lack explanations at time t and an elegant theory explains them and convinces us that they’re connected at time t + 1, it’s not the elegance of the theory that’s doing the work, it’s the explanation.

In Josh’s chess example, it’s the effectiveness of the move that matters to the outcome of the game, not its elegance or beauty. In the crime novel example, elegance is a mixture of explanation and simplicity. And in the math example, it seems to be functioning entirely as a synonym of simplicity.

As I wrote in my earlier post, I don’t doubt that elegance is pleasing to the beholder. But if we’ve already got all the explanatoriness we want – and that’s a big if, as discussed above – elegance just isn’t giving us anything scientifically valuable.

Posted in philosophy of science, SCIENCE! | 6 Comments

Optimality theory

No, not that kind of optimality theory. Though maybe I’ll write about that kind some day.

No, this is about the kind of optimality theory wherein you have a mathematical model of perception (or memory, or whatever) and response selection, and you model the response selection bit as optimal, given the nature of the perception (or whatever).

Oh, and you’re fixated on showing (mathematically, and so theoretically) if, and if so how, the nature of the perception (or whatever) and the assumption of optimality in response selection can join forces to produce a particularly esoteric relational property in one of the model components.

What am I talking about? Read this, if you really want to know.

Posted in mildly informative filler, statistical modeling | Comments Off

The end of elegance

It’s a long-standing joke that social scientists have Physics Envy. To the extent that it’s not a joke, I’m tentatively hopeful that some recent developments in physics will have a ripple effect in fields like linguistics.

Or perhaps I should say “some recent lack of developments in physics,” since I’m talking about the fact that it looks like Supersymmetry is not getting any support from the experiments conducted at the Large Hadron Collider.

I’ve read in multiple places that one of the marks alleged to provide support for Supersymmetry is (was) its elegance (e.g., the Scientific American article linked above). I’ve also seen elegance trotted out as a desideratum of syntactic theory. For example, the second paragraph on page 5 of Understanding Minimalism begins “As in any other domain of scientific inquiry, proposals in linguistics are evaluated along several dimensions: naturalness, parsimony, simplicity, elegance, explanatoriness, etc.”

I’m in complete agreeance that scientific theory evaluation is multidimensional, though I don’t know why the Minimalism Understanders chose this particular order when listing these evaluative dimensions. The order is certainly not determined by each dimension’s relative epistemological importance. The next sentence in the paragraph makes this clear, but even without that next sentence, a little careful thought suffices to show that none other than explanatoriness (or, perhaps more, erm, elegantly, explanatory scope) even has any epistemological importance.

It’s not clear to me what naturalness could even mean here, so it’s hard for me to see how one could make the case for a more natural theory being preferable to a less natural theory.

Putting aside the redundancy of listing both parsimony and simplicity (which seem to me to be used interchangeably when used as theory descriptors), simplicity only has comparative and pragmatic value. That is, while simplicity can be desirable to the extent that it makes a theory easier to work with (i.e., while simplicity can have pragmatic value), it can only help one theory win out over another if both theories are equally satisfactory with respect to epistemologically important desiderata – qualities like explanatory scope.

My problem with elegance is more or less the same as my problem with simplicity, with maybe a dash of my problem with naturalness thrown in for good measure. Even if we can come up with an acceptable definition of elegance (which I’m not at all sure we can, and I don’t think the “I know it when I see it” criterion will work here), I don’t doubt that a more elegant theory is more aesthetically pleasing than a less elegant theory. But I don’t see how elegance can possibly matter when evaluating a theory’s claim to constitute knowledge.

I may be wrong, but it seems to me that more elegant theories don’t have wider explanatory scope, they don’t generate more accurate predictions, they aren’t more consistent with other theories in the same or neighboring domains, they don’t generate a greater number of surprising predictions. Maybe they are more internally consistent, but even if they are, it’s the internal consistency that matters, not the elegance. Pick an epistemologically important criterion on which to evaluate theories; I bet you’ve picked a criterion that has little if anything to do with elegance.

Maybe the structure of the natural world is, in fact, elegant, but it seems just as a priori plausible to me that it’s not. And inelegance seems not just plausible but likely when we’re talking about the products of evolution, which has a persistent and well-documented tendency to cobble together kludgy solutions to historical contingencies, many of which (the solutions) have any number of cascading effects on this or that other system in an evolving organism. Whether language developed under pressure from natural selection or not, it’s not clear to me why we should expect linguistic theories to be elegant.

To the extent that physics sets the standard for conducting good science, I hope that the (apparent) failure of the undoubtedly very elegant theory of Supersymmetry will make it more obvious to physicists and non-physicists alike that elegance isn’t scientifically valuable. Give it 10 or 20 years, and maybe this idea will even work its way into the social sciences.

Posted in philosophy of science, SCIENCE! | 4 Comments

From blog to manuscript

In revising this paper, I incorporated a fair amount of this post in the General Discussion. I edited the text to match the tone of the paper, which, being a paper I hope to publish, was written for the more formal (than a blog) setting of an academic journal.

While a number of my previous posts have been useful exercises in thinking through various problems I’m working on, no others have played such a directly, concretely useful role in actually getting something done. I should write more posts like that, I suppose. Then I can blog more, since I will be able to write off any such time spent as professionally worthwhile.

And since we’re here, I want to take this opportunity to commend the Association for Laboratory Phonology for having a very simple, very user-friendly paper submission process. Very few fields to fill in, very few files to upload, and the paper is in the system and ready to get torn apart in any number of exciting, new, impossible-to-anticipate ways (barring any pre-review emails from the journal folks explaining how deeply offensive my header formatting choices are, that is).

I suppose it’s not 100% kosher to announce on my blog that I submitted this paper to that journal, but given that this other very closely related paper was just published and is cited in the LabPhon paper, and given how few people are doing this kind of modeling work in speech perception, I feel fairly confident that I don’t have much anonymity left to (fail to) maintain.

Immediate Addendum: I didn’t follow a number of the author guidelines for this particular journal, so, although my header formatting is close to the mark, I fully expect to get an email explaining how deeply offensive my line spacing, footnote placement, and (lack of) anonymity are. Such is life.

Posted in mildly informative filler | Comments Off

Wind River Range, Part 4: The Middle of Somewhere

In September of last year, I started writing about a 1993 hiking trip through the Wind River Range of Wyoming. I scanned a bunch of photos from the trip, and I wanted to share them and write about the experience and the process of trying to remember the experience.

And then I hit a wall. The faded memories were getting jumbled up, and the map of the area was as confusing as it was enlightening. I stopped writing about my trip through Wyoming and started writing about my trip through the mountains in southern Chile (1, 2, 3, 4), a trip to Rio de Janeiro, and, naturally, statistical modeling (e.g.).

Well, the memories are no less jumbled, but I am returning to the Wyoming trip in order to share some more pictures. I left off last time talking about (what I think was) Spider Peak and Mile Long Lake. Here’s a screen grab of a Google Maps close-up of the area:

Spider Peak, Mile Long Lake, and environs

I have very clear memories of the long hike up to the plateau to the west of Mile Long Lake. The ground was rocky and loose, so every step up was accompanied by a half-step-length slide back down. We had been at it for a couple weeks by this time, so we were all (getting) in pretty good hiking shape, but a 60-80 pound pack on that kind of steep, sloppy terrain is still pretty rough.

At this point on the map, I just can’t reconstruct where we went in any detail. I know that, eventually, we made it to Square Top Mountain, which you can see on this map is about 10 miles to the southwest of Mile Long Lake. And I know that somewhere in between Mile Long Lake and Square Top Mountain, we met up with a cowboy who had horsed in a mess of resupply food and stove fuel.

Speaking of which, here’s a picture of the resupply location:

It seemed worth taking a picture of at the time, apparently.

And here is, I think, a picture I took after the climb up and away from Mile Long Lake:

It’s probably fairly obvious why I took this picture.

Truth be told, though, I’m having a hard time aligning this with the map of the area. So, you know, maybe it’s some other strikingly beautiful cloud-filled valley.

Here’s a picture of (again, I think) Gannett Peak:

Possibly Gannett Peak, the highest point in Wyoming, which is to say that Gannett Peak is definitely the highest point in Wyoming but that this may or may not be a picture of same.

The trouble here is that, while I remember seeing Gannett Peak during this trip, it’s pretty far south of where I think we went (see here again). In any case, it’s a nice looking mountain, all pointy and snowy like that.

Finally, here are two pictures in one or both of which, if I’m not (repeatedly) mistaken, the Tetons are visible way off in the distance (note that the Tetons are a whole nother mountain range, pretty far northwest of the Wind River Range, so the fact that these pictures follow the Maybe Gannett Peak picture is neither here nor there, so to speak):

This picture may include the Tetons.
This picture, too.

And since memories are funny like this, I’ll finish this post with a short and kind of disgusting story that that last photo reminded me of.

The backpack and ice axe visible in that last picture belonged to a fellow student named Sean, who was from Oregon. Now, as you may or may not know, but as you can probably guess, you end up talking about all kinds of ridiculous stuff on a trip like this. Well, one day, Sean regaled us with a tale of a climb he did up Mt. Hood. Also as you may or may not know, but as you can probably guess, climbing Mt. Hood requires plenty of warm clothing as well as plenty of climbing gear. And as is almost certainly obvious to you, both warm clothing and climbing gear can make relieving oneself a more involved process than may be ideal.

You can probably see where this is going, but I’ll go ahead and spell it out. This fellow Sean’s call to nature was thwarted by his warm clothing and climbing gear and so, despite his best efforts, he pooped in his pants while climbing Mt. Hood. He may have even been acting as a guide to other climbers at the time, and I can’t remember if it was on the way up or the way down the mountain, but that kind of detail hardly matters, I think.

Okay, so I don’t want to leave you thinking about a poopy stranger, so here’s a nice picture that I have no hope of ever localizing beyond “somewhere in the Wind River Mountain Range”:

Don’t go chasing waterfalls.

Posted in mildly informative filler, recollection | Comments Off

Generality vs. specificity and the logic of proof

A couple weeks ago while revising a paper (now published), I got thinking about the interplay between generality and specificity in mathematical proofs. A more general proof is better than a more specific proof, all else equal, since the former covers more ground and relies on fewer and/or weaker assumptions. In general, you only want to be as specific as you need to be to make your case.

As described in my last post (and in earlier posts), the issue at the heart of this paper is the identifiability (or lack thereof) of failure of decisional separability in general recognition theory. As discussed in the earlier posts and the papers linked therein, GRT is a model of perception and response selection in which it is assumed that stimuli produce random perceptual effects in a perceptual space that is exhaustively partitioned into response regions by decision bounds. The probability of a particular response to a particular stimulus is the multiple integral of the perceptual distribution associated with that stimulus over the appropriate response region.
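To make the "integral over a response region" idea concrete, here is a minimal sketch that estimates the four response probabilities for a single stimulus by Monte Carlo rather than by numerical integration. The function name, the axis-parallel bounds, and the labeling convention (first letter for the level on x, second for the level on y) are assumptions made for illustration, not anything taken from the paper.

```python
import numpy as np

def response_probabilities(mean, cov, bound_x, bound_y, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the four response probabilities for one stimulus.

    The decision bounds are assumed (for this sketch only) to be the vertical
    line x = bound_x and the horizontal line y = bound_y.
    """
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    high_x = samples[:, 0] > bound_x
    high_y = samples[:, 1] > bound_y
    # Each response region is one quadrant of the partitioned perceptual space.
    return {
        "LL": np.mean(~high_x & ~high_y),
        "HL": np.mean(high_x & ~high_y),
        "LH": np.mean(~high_x & high_y),
        "HH": np.mean(high_x & high_y),
    }

# A distribution centered on the intersection of the bounds, with no correlation.
probs = response_probabilities(
    mean=[0.0, 0.0], cov=[[1.0, 0.0], [0.0, 1.0]], bound_x=0.0, bound_y=0.0
)
```

Because the regions are exhaustive and mutually exclusive, the four estimates sum to one; in this symmetric example each should come out near 0.25.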

Decisional separability holds on a given dimension if the relevant decision bound is parallel to the appropriate coordinate axis; otherwise it fails. Here’s a bird’s eye view of a Gaussian GRT model with failure of decisional separability:

Linear failure of decisional separability

And here’s the same model after a couple linear transformations that impose decisional separability:

… rotated and sheared.

Robin Thomas (my coauthor) and I wrote a proof of the proposition that any 2×2 Gaussian GRT model with linear bounds and failure of decisional separability can be transformed into an empirically equivalent model with decisional separability. Which is to say, in part, and with respect to the focus of this post, that we made some strong, and very specific, assumptions, namely that the perceptual distributions are bivariate normal and that there are four such distributions corresponding to the factorial combination of two levels on each of two dimensions (e.g., in a visual perception experiment, red squares, purple squares, red rectangles, purple rectangles).
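The proposition can be checked numerically for a single perceptual distribution. The sketch below does not use the rotation-then-shear construction from the proof; as a simplifying assumption, it uses one matrix whose rows are the normal vectors of the two bounds, which maps each oblique bound onto an axis-parallel line. All of the numbers are made up for illustration, and in a full model the same matrix would be applied to all four distributions.

```python
import numpy as np

def classify(points, normals, offsets):
    """Return a response index (0-3) for each point, given two linear bounds.

    Bound i is the line normals[i] @ x = offsets[i]; a point is "high" on
    bound i when it lies on the positive side of that line.
    """
    high = points @ normals.T > offsets
    return high[:, 0].astype(int) + 2 * high[:, 1].astype(int)

def response_probs(mean, cov, normals, offsets, rng, n=200_000):
    """Monte Carlo estimate of the four response probabilities."""
    samples = rng.multivariate_normal(mean, cov, size=n)
    return np.bincount(classify(samples, normals, offsets), minlength=4) / n

rng = np.random.default_rng(1)
mean = np.array([0.3, -0.2])
cov = np.array([[1.0, 0.4], [0.4, 1.5]])

# Two oblique bounds: neither is parallel to a coordinate axis.
normals = np.array([[1.0, 0.3], [-0.2, 1.0]])
offsets = np.array([0.1, 0.0])

# With the bound normals as the rows of L, y = L @ x turns the bounds into
# y1 = offsets[0] and y2 = offsets[1], so decisional separability holds.
L = normals
new_mean = L @ mean
new_cov = L @ cov @ L.T
new_normals = np.eye(2)

p_before = response_probs(mean, cov, normals, offsets, rng)
p_after = response_probs(new_mean, new_cov, new_normals, offsets, rng)
```

The two sets of estimates come from independent samples, so they agree only up to Monte Carlo error, but they estimate the same four probabilities, which is what empirical equivalence means here.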

We could (and maybe should) have written a more general proof. The result certainly generalizes to models with more dimensions and with more levels on each dimension (as long as the decision bounds on each dimension are parallel). With some work, it would also generalize to other distributional assumptions.

Now, the 2×2 Gaussian model (and associated experimental protocol) is the most frequently used, so it makes practical sense to focus on this case. Restricting ourselves to the 2×2 structure also keeps things relatively simple (as simple as a parametric GRT model gets, anyway), and the Gaussian assumption allows us to make use of the fact that linearly transformed Gaussian random variables are still Gaussian.

So, there are benefits to making assumptions that restrict how widely applicable our proof is. With these assumptions, we were able to show exactly what the means and covariance matrices of the post-transformation perceptual distributions are. And we were able to provide clear and compelling illustrations of the transformations, which would be impossible without making some kind of specific distributional assumptions.

Note, too, that this is a proof of a positive assertion. Given the stated assumptions, we proved that you can always transform a model without decisional separability into an empirically equivalent model with decisional separability. One of the reviewers suggested that perhaps it is also true that any model without perceptual separability could be analogously transformed into a model with perceptual separability.

Perceptual separability holds on a particular dimension if the marginal distributions on that dimension are identical across levels of the other dimension. Perceptual separability is, conveniently, illustrated for both dimensions in the first figure above. The +s indicate the distribution means. The means form a square in this figure, and the marginal variances are all equal, so, for example, the marginal perceptual distributions for the lower two distributions (LL and HL) on the horizontal dimension are identical to the marginal distributions for the upper two distributions (LH and HH) on the horizontal dimension.
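For a Gaussian model, this definition reduces to comparing marginal means and variances, since those two numbers fix a Gaussian marginal. Here is a rough sketch of such a check; the function, the dictionary layout, and the parameter values are all hypothetical, chosen only to mirror the square-of-means configuration described above.

```python
import numpy as np

def perceptual_separability(means, covs, atol=1e-9):
    """Check perceptual separability on each dimension of a 2x2 Gaussian model.

    means and covs are dicts keyed by "LL", "HL", "LH", "HH", where the first
    letter is the level on x and the second is the level on y (a labeling
    assumed for this sketch).
    """
    def same_marginal(a, b, dim):
        return (np.isclose(means[a][dim], means[b][dim], atol=atol)
                and np.isclose(covs[a][dim, dim], covs[b][dim, dim], atol=atol))

    # x marginals must not change with the level of y, and vice versa.
    ps_x = same_marginal("LL", "LH", 0) and same_marginal("HL", "HH", 0)
    ps_y = same_marginal("LL", "HL", 1) and same_marginal("LH", "HH", 1)
    return bool(ps_x), bool(ps_y)

# Means on a square, unit marginal variances, correlations free to differ.
means = {"LL": np.array([0.0, 0.0]), "HL": np.array([2.0, 0.0]),
         "LH": np.array([0.0, 2.0]), "HH": np.array([2.0, 2.0])}
covs = {k: np.array([[1.0, r], [r, 1.0]]) for k, r in
        [("LL", 0.5), ("HL", 0.0), ("LH", -0.5), ("HH", 0.0)]}
before = perceptual_separability(means, covs)

# A shear along x moves the upper means sideways, so separability fails on x.
shear = np.array([[1.0, 0.5], [0.0, 1.0]])
after = perceptual_separability({k: shear @ m for k, m in means.items()},
                                {k: shear @ c @ shear.T for k, c in covs.items()})
```

Note that the correlations play no role in the check itself; they only matter once a transformation mixes the dimensions.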

Failure of perceptual separability is conveniently illustrated in the second figure. The rotation and shear transformations have shifted the distribution means so that they form a parallelogram with sides that are not parallel to the coordinate axes, but they’ve also changed the marginal variances differently for the LL and HH distributions, on the one hand, and the LH and HL distributions, on the other.

On reading the reviews, I wasn’t sure whether or not models without perceptual separability could, in general, be transformed to impose perceptual separability. My intuition was that they couldn’t be. When I set out to figure it out one way or the other, I revisited the figures above (and the associated decisional separability proof), and I started thinking through what all would be required to ensure the post-transformation presence or absence of perceptual separability.

For a given pre-transformation covariance matrix $\boldsymbol{\Sigma}$ and linear transformation $\mathbf{L}$, the post-transformation covariance matrix is $\mathbf{L}\boldsymbol{\Sigma}\mathbf{L}^\top$. If the off-diagonal elements of $\mathbf{L}$ are non-zero (as they are with a rotation, and as one is with a shear), then the pre-transformation covariances will play a role in determining the post-transformation marginal variances. So, when two distributions have different correlations before a linear transformation (e.g., the LL and LH distributions in the first figure above), they will have different marginal variances after the transformation.
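A small numerical example may help here. For a shear with off-diagonal element k, the post-transformation variance on x works out to var_x + 2k cov_xy + k² var_y, so the covariance enters directly. The values below are illustrative only: two distributions with identical marginal variances and opposite correlations, pushed through the same shear.

```python
import numpy as np

shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])  # one non-zero off-diagonal element (k = 1)

# Same marginal variances, opposite correlations (illustrative values).
cov_pos = np.array([[1.0, 0.5], [0.5, 1.0]])
cov_neg = np.array([[1.0, -0.5], [-0.5, 1.0]])

# Post-transformation covariance matrices: L @ Sigma @ L.T
new_pos = shear @ cov_pos @ shear.T
new_neg = shear @ cov_neg @ shear.T

# Marginal variance on x after the shear: 1 + 2(0.5) + 1 vs. 1 + 2(-0.5) + 1
print(new_pos[0, 0], new_neg[0, 0])  # 3.0 1.0
```

The marginal variances on y are untouched in this example because the second row of the shear matrix is (0, 1).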

I was thinking through how to use this fact to prove that perceptual separability cannot be imposed on an arbitrary 2×2 Gaussian GRT model when it occurred to me that a much more general proof was available. I realized that I could prove what I wanted to prove without making any distributional assumptions or mentioning covariance matrices at all (though I would still rely on the 2×2 assumption).

More specifically, it occurred to me that if two opposite sides of the quadrilateral described by the means of the four perceptual distributions are not parallel before a linear transformation, they will not be parallel after a linear transformation, either. Because this parallelism is a necessary component of perceptual separability, the desired conclusion follows. Note that this holds for any invertible linear transformation, not just the rotation and shear transformations invoked in the decisional separability proof.
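The parallelism argument is easy to check numerically. In two dimensions, the cross product of two side vectors is zero exactly when the sides are parallel, and applying a matrix L to both vectors multiplies that cross product by det(L). So, assuming L is invertible, non-parallel sides stay non-parallel. The means and the matrix below are arbitrary illustrative values.

```python
import numpy as np

def cross2(u, v):
    """z-component of the 2D cross product; zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

# Means of the four distributions; the LL-HL side and the LH-HH side are not parallel.
means = {"LL": np.array([0.0, 0.0]), "HL": np.array([2.0, 0.0]),
         "LH": np.array([0.0, 2.0]), "HH": np.array([2.0, 3.0])}

L = np.array([[0.8, -0.6], [0.9, 1.1]])  # arbitrary invertible matrix

bottom = means["HL"] - means["LL"]
top = means["HH"] - means["LH"]

before = cross2(bottom, top)          # non-zero: the sides are not parallel
after = cross2(L @ bottom, L @ top)   # equals det(L) * before
```

Nothing in this check refers to covariance matrices or to any distributional form, which is the sense in which the argument is more general than the Gaussian one.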

There’s an unintended irony here, I think. On the one hand, the decisional proof was concerned with establishing general properties of a class of models, and our desire was to show that failure of decisional separability can be transformed away in these models. For the reasons described above, the class of models addressed in the proof was fairly constrained. On the other hand, the perceptual proof really just required a single counter-example, since if there’s even just one model in the relevant class in which perceptual separability cannot be imposed, then the reviewer’s suggestion is not, in general, true. It turned out to be simpler to relax a number of the strong assumptions made for the decisional proof, and this enabled us to establish a very general class of counter-examples.

The full proofs are given in the appendix of the paper.

Posted in SCIENCE!, statistical modeling | Comments Off

The psychology of reviews

Four or so years ago, I submitted a paper to Attention, Perception, & Psychophysics. It was very long, in large part because there was a fair amount of redundant information. The reviewers pointed this out, and I was happy to simply remove large chunks of the paper. The full suite of revisions satisfied the reviewers, and the paper was published.

It occurred to me then that it might be reasonable to include sizable chunks of material that you are (or will be) happy to remove in papers submitted to journals. Reviewers can point out that they need to be removed, and you can then (happily) do what the reviewers suggest. This compliance may license the refusal to do other things the reviewers suggest, or it may just engender warm, fuzzy feelings that lead the action editor and reviewers to accept the paper.

Cynical, I suppose, but it’s also potentially useful. Of course, it’s also almost certainly very difficult to pull off and fairly likely to backfire. You need to write a section that’s not obviously intended for post-review sacrifice, but one that you’re still happy to get rid of. And then if reviewers don’t request removal, you may still want to get rid of it, in which case you may need to justify a sizable change that no one else wanted. Plus, you could easily get a reputation for writing overly long and confusing papers. (Note that not including sections you plan on removing later is no guarantee that you’ll avoid such a reputation.)

Anyway, these thoughts occurred to me again recently. In a new paper (which I’ve mentioned before), Robin Thomas and I show that failure of decisional separability is not identifiable in Gaussian GRT with linear or piecewise linear decision bounds. I’ll explain what this means in some detail in order to explain what exactly was included in the original paper that I was happy to remove (and why the editor and reviewers were right to suggest removal).

First, a quick review of Gaussian GRT (see either of the linked papers above or here or here for detailed descriptions of the model): Stimuli produce random perceptual effects. We model these with perceptual distributions situated in a perceptual space that is partitioned by decision bounds into exhaustive and mutually exclusive response regions. The probability of producing any particular response given the presentation of some stimulus is the multiple integral of the appropriate perceptual distribution over the appropriate response region.

Here’s a bird’s eye view of a Gaussian GRT model:

Linear failure of decisional separability

The ellipses are equal likelihood contours (i.e., points on the perceptual distribution density functions that are the same height above the x-y plane) and the lines are the decision bounds. Decisional separability holds in GRT if the decision bounds are parallel to the coordinate axes. Perceptual separability holds if the marginal perceptual distributions on one dimension are identical across levels of the other, and perceptual independence holds if the correlation for a particular distribution is zero.
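For anyone who wants to draw such contours, here is one way to generate the points on an equal likelihood ellipse, sketched under the assumption of a bivariate Gaussian with a known mean and covariance matrix. The function names and parameter values are hypothetical. The idea is to map a circle through the Cholesky factor of the covariance matrix, which puts every point at the same Mahalanobis distance from the mean and therefore at the same density height.

```python
import numpy as np

def contour_points(mean, cov, radius=1.0, n_points=100):
    """Points on the equal likelihood contour at a given Mahalanobis radius."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = radius * np.vstack([np.cos(theta), np.sin(theta)])  # shape (2, n_points)
    chol = np.linalg.cholesky(cov)  # lower triangular, cov = chol @ chol.T
    return (np.asarray(mean)[:, None] + chol @ circle).T  # shape (n_points, 2)

def gaussian_density(points, mean, cov):
    """Bivariate Gaussian density evaluated at each row of points."""
    diff = points - np.asarray(mean)
    maha_sq = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * maha_sq) / norm

mean = [1.0, -0.5]
cov = np.array([[1.0, 0.6], [0.6, 2.0]])
pts = contour_points(mean, cov, radius=1.5)
heights = gaussian_density(pts, mean, cov)  # constant along the contour
```

A non-zero correlation (failure of perceptual independence) shows up as a tilted ellipse; with zero correlation the ellipse axes line up with the coordinate axes.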

So, Robin and I showed analytically that with linear bounds (as illustrated above), any model with failure of decisional separability can be transformed into a model in which decisional separability holds while holding the predicted response probabilities constant; failure of decisional separability is not identifiable in this model. We also showed, via simulation, that failure of decisional separability isn’t identifiable in a Gaussian GRT model with piecewise linear bounds.

In the original draft, I also wrote two sections, one of which I kind of knew at the time didn’t belong and another that maybe could have stayed. In the former, we showed that linear bounds exhibit decisional separability and optimality (in the sense of producing the highest possible accuracy) under a fairly simple set of constraints on the configuration of perceptual distributions. (I’ll say what those constraints are in another post.) In the latter, we described one way that decisional interactions could be discussed even in a model with decisional separability.

The section on optimality dealt with issues of decisional separability, so it wasn’t totally unrelated to the rest of the paper. But the previous sections showed how we can’t know whether decisional separability fails or holds with linear or piecewise linear decision bounds, while the optimality section dealt with a class of models in which we can know that decisional separability does, in fact, hold. This section upset the flow of the paper, diluting the otherwise forcefully made point about (lack of) model identifiability in Gaussian GRT. This fact, along with the possibility that this section can be written up as a separate theoretical note, made the decision to remove it easy.

The other section focused on what I called “relative response bias,” or the fact that different configurations of perceptual distributions will produce different marginal response criteria on one dimension across levels of the other dimension even when decisional separability holds. This section didn’t upset the flow of the paper like the optimality section did, and, in fact, I saw it as attempting to offer up more of a positive message as a counterpoint to the largely negative message of the bulk of the paper.

Accepting the argument of this section requires the reader to accept that in the Gaussian GRT model of factorial identification (i.e., response labels corresponding to, and stimuli consisting of, the factorial combination of, e.g., two levels on each of two dimensions/features), the decision bounds define the perceptual space. Instead of pushing this, we removed this section and added in discussion of the possibility of augmenting the factorial identification paradigm in order to enable tests of decisional separability (i.e., to allow the bounds not to define the space).

We’ll probably include a discussion of relative response bias in the GRT tutorial we’re working on now, so, as with the optimality section, the material will be repurposed rather than discarded entirely. By way of contrast, in the earlier paper, the material I got rid of was simply discarded.

This difference is noteworthy, I think, since it makes it clear how difficult it would be to purposefully include sacrificial material. Yes, I’ve had two papers that I was happy to remove big chunks of, and I imagine that in both cases this made me seem (accurately) willing to play ball with the editors and reviewers. And both papers were made better by the removal of the extraneous material.

But in the earlier case, the material consisted of largely redundant statistical analyses, while in the latter case, it was substantive work that can stand (at least partially) on its own. Which is to say that the removed material in the former case has essentially nothing in common with the removed material in the latter case (other than the removal itself).

And, of course, I’ve gotten other papers published without removing large chunks, so it’s not like doing so is a necessary part of the process. It’s certainly not a sufficient part, either, since there’s no way that, for example, the more recent paper would have gotten published if all I did was remove the two sections discussed above.

So, for the time being, I’ll just keep writing papers without intentionally including sacrificial sections. Perhaps after a few more revise-and-resubmits, I’ll have enough data to draw firmer conclusions. And when I write it up, I’ll know whether or not to include a lure for the reviewers.

Posted in SCIENCE!, statistical modeling | Comments Off