Since yesterday’s post on GRT wIND, I’ve been thinking quite a bit about the model. I am now reasonably confident I know why the two models I simulated didn’t produce identical data, but my intuition still tells me that the model is over-parameterized, and I am now back to feeling like GRT wIND doesn’t solve the problem of identifiability of failures of decisional separability. I say “back to” because, when I first heard about the model, my initial thought was that it didn’t solve this problem. I thought about the model some more, and my uncertainty about this issue increased a good bit, but now that I’ve thought about it even more, I’m back to where I started.
However, I don’t want this to be a purely negative effort, so I will use my discussion of why I’ve come full circle with respect to GRT wIND to illustrate one of the most valuable, and (possibly) counterintuitive, aspects of mathematical modeling. Here’s the punchline, in case you don’t want to wade through the technical details below: mathematical modeling allows us to formulate and ask incisive questions. More below, though please note that if you haven’t read yesterday’s post, this one isn’t going to make much sense.
As I discussed yesterday, there are a number of identifiability issues with the model for the standard GRT task. The task is identification of stimuli defined by the factorial combination of two levels on each of two dimensions (e.g., red square, purple square, red octagon, purple octagon), and the model consists of bivariate Gaussian perceptual distributions corresponding to the stimuli and two decision bounds that define response regions corresponding to each possible response (in the standard task, there is a unique response for each stimulus).
One issue, as mentioned above, is that failures of decisional separability are, in general, not identifiable. Recall that decisional separability is defined as decision bounds that are parallel to the coordinate axes. In the simplest case, you have linear bounds. If you have a failure of decisional separability with linear bounds, you can apply a sequence of linear transformations to produce an empirically equivalent model in which decisional separability holds. You can see some more details about this here, and you can read the proof in my 2013 Psychonomic Bulletin & Review paper with Robin Thomas (that’s a link to a paywall, so if you want a copy, just email me at noahpoah at any of a number of different domains, including the obvious one, given where you’re reading this).
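To make the equivalence concrete, here is a minimal sketch in plain Python. It is not the specific sequence of transformations from the paper; it collapses everything into a single affine map, and all of the numbers (the bounds in `A` and `b`, the mean `mu`, the covariance `cov`) are made-up values for illustration. The map sends each oblique bound onto a coordinate axis, and every sampled percept gets the same response before and after the map.

```python
import random

# Hypothetical example values (not from any fitted model).
# Row i of A and b[i] define the linear bound A[i] . x = b[i];
# neither bound is parallel to a coordinate axis.
A = [[1.0, 0.3], [-0.4, 1.0]]
b = [0.5, 0.2]
mu = [1.0, 0.8]                  # mean of one perceptual distribution
cov = [[1.5, 0.4], [0.4, 0.9]]   # its covariance matrix

def dot(row, x):
    return row[0] * x[0] + row[1] * x[1]

def to_separable(x):
    """Affine map u = A x - b; it sends each bound onto a coordinate axis."""
    return [dot(A[0], x) - b[0], dot(A[1], x) - b[1]]

def transform_cov(A, cov):
    """Covariance of u, i.e. A cov A^T, for 2x2 matrices."""
    AC = [[dot(A[i], [cov[0][j], cov[1][j]]) for j in range(2)] for i in range(2)]
    return [[dot(AC[i], A[j]) for j in range(2)] for i in range(2)]

def sample(rng, mu, cov):
    """One draw from a bivariate Gaussian using its Cholesky factor."""
    l11 = cov[0][0] ** 0.5
    l21 = cov[1][0] / l11
    l22 = (cov[1][1] - l21 ** 2) ** 0.5
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return [mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2]

def respond_oblique(x):
    """Response in the original space, using the oblique bounds."""
    return (dot(A[0], x) > b[0], dot(A[1], x) > b[1])

def respond_separable(u):
    """Response in the transformed space, where the bounds are the axes."""
    return (u[0] > 0, u[1] > 0)

rng = random.Random(1)
draws = [sample(rng, mu, cov) for _ in range(10000)]
same = all(respond_oblique(x) == respond_separable(to_separable(x)) for x in draws)

# Parameters of the equivalent model in which decisional separability holds
new_mu = to_separable(mu)
new_cov = transform_cov(A, cov)
print(same, new_mu, new_cov)
```

The transformed model is still Gaussian (with mean `new_mu` and covariance `new_cov`), so it is a perfectly good GRT model, just one with bounds on the axes.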
Another issue is that the means and marginal variances of the perceptual distributions are not simultaneously identifiable. If you have a model with decisional separability and arbitrary marginal variances in the perceptual distributions, it’s easy to transform each distribution so that each marginal variance is one and the means are shifted to preserve the predicted response probabilities. I showed how this works in yesterday’s post.
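As a rough numerical check of this, the sketch below computes one response-region probability for a single distribution, before and after standardizing it. The bound locations, mean, standard deviations, and correlation are invented for illustration, and the function names are my own. The integration is a simple midpoint rule, so treat the numbers as approximate.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def lower_left_prob(mu, sd, rho, c, n=4000):
    """Approximate P(X < c[0], Y < c[1]) for a bivariate Gaussian.

    Midpoint-rule integral over x of the density of X times the
    conditional probability P(Y < c[1] | X = x).
    """
    lo = mu[0] - 10 * sd[0]          # effectively minus infinity
    h = (c[0] - lo) / n
    cond_sd = sd[1] * math.sqrt(1 - rho ** 2)
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        zx = (x - mu[0]) / sd[0]
        cond_mean = mu[1] + rho * sd[1] * zx
        total += phi(zx) / sd[0] * Phi((c[1] - cond_mean) / cond_sd) * h
    return total

def standardize(mu, sd, c):
    """Unit-variance version: shift the mean so (c - mu) / sd is unchanged."""
    new_mu = [c[i] + (mu[i] - c[i]) / sd[i] for i in range(2)]
    return new_mu, [1.0, 1.0]

# Hypothetical values: bounds at x = 0.5 and y = 0.2 (decisional separability)
c = [0.5, 0.2]
mu, sd, rho = [1.2, -0.3], [1.7, 0.6], 0.35

p_orig = lower_left_prob(mu, sd, rho, c)
mu_unit, sd_unit = standardize(mu, sd, c)
p_unit = lower_left_prob(mu_unit, sd_unit, rho, c)
print(p_orig, p_unit)
```

The two probabilities agree because, with the bounds parallel to the axes, the region probabilities depend on the means and standard deviations only through the standardized distances to the bounds (and on the correlation, which is untouched).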
I contend that the first issue applies to GRT wIND (contra the claims of Soto et al. in the paper that introduced the model), and given the first issue, the second issue seems tied up in the over-parameterization problem I can’t seem to let go of.
As I discussed yesterday, GRT wIND has a group-level set of perceptual distributions with arbitrary mean vectors, marginal variances, and correlations (with one distribution fixed at the origin and with unit marginal variances). The individual-level model for a given subject has a scaled version of the group-level model for its perceptual distributions and two linear decision bounds. As I mentioned yesterday, the scaling can be formalized as a linear transformation of the group-level covariance matrix for each stimulus (e.g., the red square).
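One way to write such a scaling down is sketched below. The notation is an assumption for illustration: $i$ indexes the stimulus, $k$ indexes the subject, $\kappa_k > 0$ is an overall scaling parameter, and $\lambda_k \in (0, 1)$ is a relative scaling parameter. The particular form of $S_k$ (larger $\kappa_k$ shrinks both variances, and $\lambda_k$ trades one dimension off against the other) is one plausible choice, not necessarily the exact one used in the model.

```latex
\Sigma_{ik} = S_k \, \Sigma_i \, S_k^{\top},
\qquad
S_k =
\begin{pmatrix}
  \dfrac{1}{\sqrt{\kappa_k \lambda_k}} & 0 \\[1.5ex]
  0 & \dfrac{1}{\sqrt{\kappa_k (1 - \lambda_k)}}
\end{pmatrix}
```

Under this form, the marginal variances of $\Sigma_i$ are divided by $\kappa_k \lambda_k$ and $\kappa_k (1 - \lambda_k)$, respectively, and the correlation is left unchanged.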
It occurred to me today that you can formalize this as a (possibly illicit) affine transformation that transforms the covariance matrix as desired while leaving the perceptual distribution’s mean vector unchanged.
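Here is a minimal sketch of one such map, under the assumption that it takes the form x → S(x − μ) + μ, with S a diagonal scaling matrix. The parameter names `kappa` (overall scaling) and `lam` (relative scaling, between 0 and 1), the form of `scaling_matrix`, and all of the numbers are assumptions for illustration.

```python
import math

def scaling_matrix(kappa, lam):
    """Assumed diagonal scaling: kappa scales the whole space,
    lam (between 0 and 1) scales the two dimensions relative to each other."""
    return [[1.0 / math.sqrt(kappa * lam), 0.0],
            [0.0, 1.0 / math.sqrt(kappa * (1.0 - lam))]]

def affine_about_mean(S, mu, x):
    """The map x -> S (x - mu) + mu, which leaves the point mu where it is."""
    d = [x[0] - mu[0], x[1] - mu[1]]
    return [S[0][0] * d[0] + S[0][1] * d[1] + mu[0],
            S[1][0] * d[0] + S[1][1] * d[1] + mu[1]]

def scaled_cov(S, cov):
    """Covariance after the map, S cov S^T, for 2x2 matrices."""
    SC = [[sum(S[i][k] * cov[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(SC[i][k] * S[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical group-level distribution and subject-level parameters
mu = [1.0, 0.8]
cov = [[1.5, 0.4], [0.4, 0.9]]
kappa, lam = 2.0, 0.3

S = scaling_matrix(kappa, lam)
fixed = affine_about_mean(S, mu, mu)   # image of the mean under the map
new_cov = scaled_cov(S, cov)           # subject-level covariance
print(fixed, new_cov)
```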
This is possibly illicit because it seems like it could cause some problems to include the mean in the transformation. But this is beside the main point.
The main point is that each individual subject has a transformation that scales the whole space and scales the two dimensions with respect to one another, and, together with that subject’s two decision bounds, these produce a standard Gaussian GRT model. Hence, the model for any given subject can be transformed to ensure that decisional separability holds, and once decisional separability holds, the marginal variances and means of each distribution can be scaled and shifted so that all marginal variances are one. Put slightly differently, in addition to the scaling transformation described above, each subject can also have rotation and shear transformations to enforce decisional separability and the unit-variance transformation to force all the action onto the means.
There’s no mathematical rationale for allowing each subject to have overall and relative scaling but not rotations, shear transformations, and unit variances. Rotation, shear transformation, and scaling to unit variance all produce models that are empirically equivalent to the model pre-transformation. All of which is to say that for every GRT wIND model with failures of decisional separability, there is an empirically equivalent model in which decisional separability holds. Ergo, GRT wIND does not, by itself, solve the identifiability problems discussed in Silbert & Thomas (2013).
Of course, there may be psychological reasons for allowing one type of transformation and not another. In my second review of the more recent GRT wIND paper, I suggested one such line of reasoning. Specifically, I suggested that scaling transformations could be allowed while rotations and shear transformations were not by invoking the notion of primary perceptual dimensions. The basic argument along these lines would be that subjects share a common set of primary dimensions, and so may vary with respect to the overall salience of the stimulus set and the salience of one dimension relative to the other, but not with respect to how the perceptual distributions are aligned with the coordinate axes in perceptual space (rotations, shear transformations).
This approach doesn’t rule out re-scaling to enforce unit variances, and I strongly suspect that this is linked to the over-parameterization problem I (still) think the model has, but I’m going to leave this as an open issue for now, since I want to get back to the positive message I’m trying desperately to focus on (and, if I’m being honest, because I started working on trying to prove that this problem exists, and the elements of the scaled, rotated, and sheared covariance matrices were getting kind of ridiculous to work with).
To repeat the punchline from above: mathematical modeling allows us to formulate and ask incisive questions. In this case, it allows us to formulate and ask incisive questions about which kinds of transformations of perceptual space are consistent with human cognition and behavior. More to the point, it allows us to ask this about (absolute and relative) scaling, rotation, and shear transformations.
There has long been compelling evidence that scaling can provide a good account of at least some perceptual behavior (e.g., Nosofsky’s 1986 and 1987 papers on selective attention). There is also some evidence that rotation of at least some dimensions causes substantial changes in behavior (e.g., a 1990 paper by Melara & Marks). I’m not aware of any research on shear transformations of perceptual space, but given that scaling and rotation have been investigated with some success, shear transformations seem very likely to be amenable to empirical and theoretical inquiry.
In the end, then, mathematical analysis (of GRT, in this case) allows us to see where we might productively aim future efforts. It enables us to ask very specific questions about scaling, rotation, and shear transformations. Perhaps unfortunately, though, because of the empirical equivalence of the (rotation and shear) transformed and untransformed GRT models, it does not (yet) give us the tools to answer these questions.
Of course, if this case is at all illustrative of a general state of affairs, working out new mathematical tools to answer these questions seems very likely to allow us to formulate and ask any number of new, and as yet unanticipated, incisive questions.
Lather, rinse, repeat, ad nauseam.