Suppose you have a confusion matrix, a matrix of counts with rows corresponding to stimuli and columns to responses. Suppose further that you want to analyze the similarity between your stimuli and the bias to give one response or another by applying the similarity choice model (SCM) to your confusion matrix.
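In the SCM, the probability $p_{sr}$ of giving response $r$ to stimulus $s$ is (where $\beta_r$ is the bias to give response $r$, $\eta_{sr}$ is the similarity between stimulus $s$ and stimulus $r$, and $N$ is the number of stimuli):

$$p_{sr} = \frac{\beta_r \, \eta_{sr}}{\sum_{k=1}^{N} \beta_k \, \eta_{sk}} \qquad (1)$$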
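As a minimal sketch of the model equation, the R snippet below computes SCM response probabilities from a bias vector and a similarity matrix. The function name `scm.prob` and the example values of `beta` and `eta` are made up for illustration. The sketch assumes a symmetric similarity matrix with ones on the diagonal.

```r
# Hypothetical helper (not from the original post): SCM response probabilities.
# beta: length-N vector of response biases
# eta:  N x N symmetric similarity matrix with ones on the diagonal
scm.prob <- function(beta, eta){
  # numerator: num[s, r] = beta[r] * eta[s, r]
  num <- sweep(eta, 2, beta, `*`)
  # denominator: the sum of the numerators across responses, for each stimulus
  num / rowSums(num)
}

# Made-up parameter values for three stimuli
beta <- c(0.5, 0.3, 0.2)
eta <- matrix(c(1.0, 0.4, 0.1,
                0.4, 1.0, 0.3,
                0.1, 0.3, 1.0), nrow = 3, byrow = TRUE)

P <- scm.prob(beta, eta)
rowSums(P)  # each row of response probabilities sums to 1
```

Each row of `P` holds the predicted response probabilities for one stimulus.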
You might want to use this model because it has convenient, closed-form solutions for the similarity and bias parameters:
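$$\eta_{sr} = \sqrt{\frac{p_{sr} \, p_{rs}}{p_{ss} \, p_{rr}}}, \qquad \beta_r = \left[ \sum_{k=1}^{N} \sqrt{\frac{p_{rk} \, p_{kk}}{p_{rr} \, p_{kr}}} \right]^{-1} \qquad (2)$$

Here $p_{sr}$ is the probability of giving response $r$ to stimulus $s$, self-similarity is fixed at one, and the biases are scaled to sum to one.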
Whence the $p_{sr}$ values? You could just divide your confusion counts by the row totals. Or, since you care about statistical rigor, you could first obtain maximum likelihood estimates (MLEs) of the confusion counts expected under the model, and only then normalize by the row sums to obtain MLEs of the confusion probabilities.
And you can obtain MLEs of your confusion counts by using iterative proportional fitting. I learned about iterative proportional fitting from Rob Nosofsky in one of the best classes I took as a graduate student (or any other kind of student, come to think of it). And just today, I found myself needing to use iterative proportional fitting once again.
So I wrote an R function:
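    iter.prop.fit <- function(M, delta = .001){
      # M must be square: rows (stimuli) and columns (responses) index the same items
      stopifnot(nrow(M) == ncol(M))
      nr <- nrow(M)
      nc <- ncol(M)
      M.h <- matrix(1, nrow = nr, ncol = nc)  # start from a matrix of ones
      dimnames(M.h) <- dimnames(M)
      d.t <- 1    # largest change in the most recent pass
      nz <- .001  # small constant that guards against division by zero
      while(d.t >= delta){
        M.h.a <- M.h  # copy of the estimate before this pass
        for(ri in seq_len(nr)){
          for(ci in seq_len(nc)){
            # adjust toward the observed row (stimulus) total
            M.h[ri,ci] <- M.h[ri,ci]*sum(M[ri,])/sum(M.h[ri,]+nz)
            # adjust toward the observed column (response) total
            M.h[ri,ci] <- M.h[ri,ci]*sum(M[,ci])/sum(M.h[,ci]+nz)
            # adjust toward the observed total for the symmetric pair of cells
            M.h[ri,ci] <- M.h[ri,ci]*(M[ri,ci]+M[ci,ri])/(M.h[ri,ci]+M.h[ci,ri]+nz)
          }
        }
        # stop once no cell changed by delta or more during the pass
        d.t <- max(abs(M.h.a - M.h), na.rm = TRUE)
      }
      return(M.h)
    }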
M is your confusion matrix, delta is a criterion that determines when the algorithm stops (smaller delta means less change from one step of the algorithm to the next), and M.h (h = hat, so M.h = $\hat{M}$) is the matrix of maximum likelihood confusion counts.
The basic idea is to take a matrix full of ones and gradually adjust it to match, as closely as possible, the counts in the observed confusion matrix.
Because we’re using iterative proportional fitting to get SCM MLEs, there are three steps for each iteration of the algorithm. First, M.h is adjusted by row (i.e., stimuli). Second, M.h is adjusted by column (i.e., responses). Third, and finally, M.h is adjusted for each unique pair of stimuli (i.e., symmetric similarity).
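The three adjustments correspond to three sets of totals that the fitted matrix is meant to reproduce: row totals, column totals, and totals for each symmetric pair of cells. As a rough sketch, the hypothetical diagnostic below (`ipf.check` is a name invented here, and the counts are made up) reports the largest mismatch on each set of totals for any observed and fitted matrix.

```r
# Hypothetical diagnostic (not from the original post): largest mismatch
# between observed counts M and fitted counts M.h on each set of totals.
ipf.check <- function(M, M.h){
  c(row  = max(abs(rowSums(M) - rowSums(M.h))),        # stimulus totals
    col  = max(abs(colSums(M) - colSums(M.h))),        # response totals
    pair = max(abs((M + t(M)) - (M.h + t(M.h)))))      # symmetric pair totals
}

# Made-up confusion counts for three stimuli
M <- matrix(c(80, 15,  5,
              10, 70, 20,
               5, 25, 70), nrow = 3, byrow = TRUE)

ipf.check(M, M)                # all zeros: a matrix trivially matches itself
ipf.check(M, matrix(1, 3, 3))  # the all-ones starting matrix is far from M
```

Applied to the output of a fitting function, small values on all three entries suggest the fit has settled. With a small constant added to the denominators, as in the function above, expect small mismatches, not exact zeros.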
When the largest change from one step to the next is smaller in magnitude than delta, the algorithm stops and returns M.h. Once you have M.h, you can normalize by row to get maximum likelihood confusion probability estimates (i.e., $p_{sr}$ values), which you can plug into the equations above to obtain maximum likelihood estimates of the similarity and bias parameters.
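The last step can be sketched in R as follows. This is an illustration under stated assumptions, not a definitive implementation. The function name `scm.params` is invented here, the biases are assumed to sum to one, similarity is assumed symmetric with self-similarity fixed at one, and every cell of the input matrix is assumed to be greater than zero. The example builds made-up counts from known parameter values so that the recovered estimates can be compared against them.

```r
# Hypothetical helper (not from the original post): closed-form SCM
# similarity and bias estimates from a square matrix of (fitted) counts.
# Assumes every cell is greater than zero.
scm.params <- function(M.h){
  P <- M.h / rowSums(M.h)  # normalize each row to get confusion probabilities
  d <- diag(P)             # probabilities of correct responses
  # similarity: eta[s, r] = sqrt(p[s, r] * p[r, s] / (p[s, s] * p[r, r]))
  eta <- sqrt(P * t(P) / outer(d, d))
  # bias ratios: R[r, k] = beta[k] / beta[r]
  #            = sqrt(p[r, k] * p[k, k] / (p[r, r] * p[k, r]))
  R <- sqrt((P / t(P)) * outer(1 / d, d))
  # with biases summing to one, beta[r] = 1 / sum over k of beta[k] / beta[r]
  beta <- 1 / rowSums(R)
  list(eta = eta, beta = beta)
}

# Made-up parameter values, used to build counts that follow the SCM exactly
beta <- c(0.5, 0.3, 0.2)
eta <- matrix(c(1.0, 0.4, 0.1,
                0.4, 1.0, 0.3,
                0.1, 0.3, 1.0), nrow = 3, byrow = TRUE)
num <- sweep(eta, 2, beta, `*`)
counts <- 100 * num / rowSums(num)  # 100 trials per stimulus

est <- scm.params(counts)
est$eta   # compare with eta
est$beta  # compare with beta
```

Because the made-up counts follow the model exactly, the estimates should match the generating values up to floating-point error. With real data you would pass in the matrix returned by iter.prop.fit.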
What you do with these is, of course, up to you.
I find this match of algorithm and model interesting, primarily because it seems kind of backwards. When you want to find MLEs for the parameters of, say, general recognition theory (GRT), you can run a Newton-Raphson algorithm that navigates parameter space to find the MLEs for a given data set.
This makes intuitive sense to me – you have data, you make an educated guess about initial values, and you have rules for moving from your initial guess to your best guess.
With iterative proportional fitting and the SCM, on the other hand, you find MLEs of the data first, and only after you have these do you calculate parameter values. You don’t make an initial guess about your parameters, and you don’t explicitly navigate parameter space in order to find your best guess. Of course, because the SCM parameters have closed-form solutions, you could calculate parameter values at each step of the algorithm if you wanted to. So even though iterative proportional fitting is expressed entirely in terms of the data, the algorithm is effectively following a simple set of rules for implicitly navigating parameter space.