What makes a model of perception “optimal”? Is the presence of Bayes’ rule enough? What makes a model computational vs. algorithmic and representational? How important is it for a model to account for a previously reported “effect”? And what if the effect in question is, well, questionable?
I plan to address these questions in a few posts. This post will describe the effect of interest and a recent approach to modeling it, and later posts will focus on the robustness of the effect and evaluation of some of the claims that accompany the model.
Okay, so the effect in question is the “perceptual magnet effect”, and it concerns how the discriminability of speech sounds (or other stimuli) changes with distance from the center of a category. The basic idea is that a pair of tokens near the center of a category should be more difficult to discriminate than a pair of tokens farther from the category center.
Here’s a figure from a paper reporting the perceptual magnet effect in American adults and infants but not in monkeys (Kuhl, 1991, Perception and Psychophysics; pdf). The x-axis (“rings”) indicates distance from a referent stimulus, and the y-axis indicates the percentage of trials on which adults/infants/monkeys did not notice a change between acoustically similar but not identical vowels (i.e., “misses”, or “generalization scores”). The empty symbols show the data when the referent was a good prototype of the American English vowel [i] (e.g., the vowel in “feet” or “feed”), as determined by goodness ratings from adult native speakers; the filled symbols show the data when the referent was judged a relatively poor example of an [i]. As indicated by the labels, the top panel shows data for adults, the middle for infants, and the bottom for monkeys.
The perceptual magnet effect consists of the difference in the number of times that adults and infants missed a difference between the referent and a variant token when the referent was a prototype vs. when it was not (i.e., the gap between filled and unfilled symbols). Because monkeys don’t have speech sound categories, the reasoning goes, they don’t show a perceptual magnet effect with speech sound stimuli.
Here’s a figure from an article reporting the perceptual magnet effect in American and Swedish infants (Kuhl et al., 1992; pdf). The top panel shows data for American infants, the bottom for Swedish infants. The filled symbols show data in response to variants of the American English vowel [i], and the empty symbols show data in response to variants of the Swedish vowel [y] (which is produced more or less like [i], but with rounded lips – you can listen to sound files of all the Swedish vowels here).
The perceptual magnet effect here consists of the fact that American infants behaved as if variants of [i] were more similar to the [i] prototype than were variants of [y] relative to the [y] prototype, and vice versa for Swedish infants (while Swedish has an [i] vowel, the referent American [i] used was judged by Swedish adults as sitting somewhere between Swedish [i] and [e], and so wasn’t a good prototype of Swedish [i]).
In a 2009 paper (available here), Feldman, Griffiths, and Morgan describe a categorization model that produces the perceptual magnet effect for free. The model is presented as “rational,” which is to say that it is computational in the sense of Marr’s three levels of analysis, which is to say further that it is not (meant to be) algorithmic or a particular implementation. The model is also described as “optimal,” though as far as I can tell, optimality is not defined anywhere in the paper.
Anyway, the model is a multilevel Bayesian model in which phonetic categories govern target productions, which in turn govern stimuli; the stimulus is the speech sound actually perceived by a listener. It is assumed that, given category $c$, a target $T$ is distributed as a normal random variable with mean $\mu_c$ and variance $\sigma_c^2$:

$$T \mid c \sim \mathcal{N}(\mu_c, \sigma_c^2)$$
It is further assumed that, given target $T$, a stimulus $S$ is distributed as a normal random variable with mean $T$ and variance $\sigma_S^2$:

$$S \mid T \sim \mathcal{N}(T, \sigma_S^2)$$
Integrating over $T$ (i.e., taking into account all possible intended target productions), you can express $S$ given $c$ thusly:

$$S \mid c \sim \mathcal{N}(\mu_c, \sigma_c^2 + \sigma_S^2)$$
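As a quick sanity check, the claim that the target and noise variances simply add in the marginal distribution of the stimulus can be verified by simulation. This is just a sketch; all parameter values below are arbitrary:

```python
import numpy as np

# Generative model: sample a target T given category c, then a stimulus S
# given T, and check that S has mean mu_c and variance var_c + var_S.
rng = np.random.default_rng(0)
mu_c, var_c, var_S = 0.0, 1.0, 0.5  # made-up category mean and variances

T = rng.normal(mu_c, np.sqrt(var_c), size=1_000_000)  # targets, given c
S = rng.normal(T, np.sqrt(var_S))                     # stimuli, given targets

print(S.mean())  # ≈ mu_c = 0.0
print(S.var())   # ≈ var_c + var_S = 1.5
```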
According to this model, listeners are “trying to infer the target $T$ given stimulus $S$ and category $c$, so they must calculate $p(T \mid S, c)$”:

$$T \mid S, c \sim \mathcal{N}\!\left(\frac{\sigma_c^2 S + \sigma_S^2 \mu_c}{\sigma_c^2 + \sigma_S^2},\ \frac{\sigma_c^2 \sigma_S^2}{\sigma_c^2 + \sigma_S^2}\right)$$
Where the key fact is that the mean (or expected value, $E[T \mid S, c]$) of $T \mid S, c$ (i.e., the term on the left inside the parentheses) is a weighted sum of $S$ and $\mu_c$. This fact “formalizes the idea of a perceptual magnet: The $\mu_c$ term pulls the perception of stimuli toward the category center, effectively shrinking perceptual space around the category.” Which I take to mean that while $S$ is the signal, $E[T \mid S, c]$ is perceived.
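The posterior mean’s weighted-average pull toward the category center can be sketched numerically. A minimal illustration, with invented parameter values:

```python
def expected_target(S, mu_c, var_c, var_S):
    """Posterior mean of the target given stimulus S and a single category:
    a variance-weighted average of the stimulus and the category mean."""
    return (var_c * S + var_S * mu_c) / (var_c + var_S)

# With equal target and noise variances, a stimulus at 2.0 is perceived
# halfway between itself and the category mean at 0.0:
print(expected_target(2.0, mu_c=0.0, var_c=1.0, var_S=1.0))  # 1.0
```

The noisier the stimulus (larger `var_S`) relative to the category, the stronger the pull toward the category mean.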
Of course, languages have multiple phonetic categories, so it’s handy that the model also gives us, via Bayes’ rule, $p(c \mid S)$:

$$p(c \mid S) = \frac{p(S \mid c)\, p(c)}{\sum_c p(S \mid c)\, p(c)}$$
Here, $p(S \mid c)$ is given by equation 3 above, and $p(c)$ is the prior probability of encountering category $c$. Taking a weighted sum over categories, we then get an expression for $p(T \mid S)$:

$$p(T \mid S) = \sum_c p(T \mid S, c)\, p(c \mid S)$$
And, finally, an expression for the expected value of $T$ given $S$:

$$\begin{aligned}
E[T \mid S] &= \sum_c \frac{\sigma_c^2 S + \sigma_S^2 \mu_c}{\sigma_c^2 + \sigma_S^2}\, p(c \mid S) \\
&= \frac{\sigma_c^2}{\sigma_c^2 + \sigma_S^2}\, S + \frac{\sigma_S^2}{\sigma_c^2 + \sigma_S^2} \sum_c \mu_c\, p(c \mid S)
\end{aligned}$$
Here, the first term in the sum is the expected value of $T$ given $S$ and $c$ (i.e., the percept, given a stimulus and category) and the second term is the relative likelihood $p(c \mid S)$ of category $c$ given stimulus $S$. Note that the move from line one to line two makes use of the assumption that all categories have the same variance $\sigma_c^2$, which lets the weights be pulled outside the sum over categories.
Okay, so this post is long enough, and it’s taken me more than enough time to write, so I’ll wrap it up with some stray thoughts and questions and a stated intention to follow up on this before too long.
First, are listeners only trying to infer $T$? It seems to me that whatever an “intended target” conveys above and beyond category membership, a listener needs to infer category $c$, too.
Second, is it reasonable to assume that targets are normally distributed around some category center? I recognize that this is a convenient assumption, given that the math works out nicely when it’s made, but it seems fairly likely that it doesn’t reflect reality all that well.
Third, is it reasonable to assume that all categories have the same variance? Again, it’s convenient, but it wouldn’t surprise me if it’s not accurate.
Fourth, how does the perceptual magnet effect follow from this model, exactly? I’ll go into some detail in a later post to explore what all the above equations mean in a more intuitive, pictorial way.
Fifth, to the extent that the perceptual magnet effect is logically implied by this model, what does it say about the model if the perceptual magnet effect doesn’t (always) occur?
Sixth, in what sense, if any, is this model optimal?
Seventh, is this model computational and not algorithmic/representational as claimed, or are there algorithmic and representational assumptions lurking in there somewhere?
I plan to address these questions (and ask and answer some others) in a few follow-up posts.