At Mid-Phon 17, I was talking to a colleague about mixed-effects (i.e., multilevel) modeling, and he stated very matter-of-factly that when a random-intercepts-by-subjects model is estimated, there are no intercepts estimated for each individual subject. Rather, he continued, the model specifies a probability distribution over the subject-specific intercepts, so you get an estimate of the variance of the intercepts (with, though it wasn’t mentioned explicitly during this conversation, the group-level intercept providing the mean of the distribution).
I didn’t know how to respond, since this seemed obviously wrong to me. It seemed wrong to me because it is wrong. The next talk was starting, though, so there was but a brief moment of awkward silence as I tried to process these assertions prior to returning to my seat.
The topic didn’t come up again at the meeting, though shortly thereafter I talked to a (different) colleague (and friend) of mine who is knowledgeable about such models. He immediately asked if the first colleague uses SPSS. I didn’t know the answer, but it’s certainly possible, even plausible. Colleague #2 asked because, apparently, SPSS makes it exceptionally difficult (maybe impossible) to see the estimated subject-specific (or item-specific, or whatever-grouping-variable-specific) parameters in a fitted model.
So maybe SPSS users have a strange, and limited, view of how random effects work, I thought. Until today.
Today, I was finishing up reading Florian Jaeger’s 2008 paper (pdf) arguing that mixed-effects logistic regression models are superior to ANOVA when doing categorical data analysis. I came to more or less the same conclusion sometime around 2008, though without reading Jaeger’s paper. I say more or less the same conclusion because while I agree with Jaeger that (the traditional conception of) ANOVA isn’t appropriate for categorical data analysis, I think that logistic regression is just one of a number of more appropriate models (GRT being another such model, at least for certain cases).
In any case, I finally got around to starting Jaeger’s paper recently, and I picked it back up today only to find this quote (p. 443):
“The only parameter the model fits for the random effects is their variance (see also Baayen et al., 2008; for details on the implementation, see Bates & Sarkar, 2007).”
Jaeger is discussing these models in the context of R and lme4, so I am kind of flabbergasted at this assertion. I’m flabbergasted in part because when you fit a multilevel regression model (logistic or otherwise) using lmer, the function ranef() with the lmer output object as the argument returns the estimated random effects parameters. In fact, I just happened to have used ranef() to make a figure illustrating the distribution of random item intercepts in a mixed-effects logistic regression model a few days ago (for one of my ICA/ASA/CAA proceedings papers):

Since Jaeger cites Baayen et al. (pdf) in support of this assertion, I figured I’d see if I could trace the problem back to that. I haven’t read the paper in full yet, but I found this (pp. 393-394):
“When a mixed-effects model is fitted to a data set, its set of estimated parameters includes the coefficients for the fixed effects on the one hand, and the standard deviations and correlations for the random effects on the other hand. The individual values of the adjustments made to intercepts and slopes are calculated once the random-effects parameters have been estimated. Formally, these adjustments, referenced as Best Linear Unbiased Predictors (or BLUPs), are not parameters of the model.”
I don’t know much about BLUPs, so maybe I’m way off-base to say that it strikes me as rather silly to treat estimated random effects as substantively distinct from parameters. They’re certainly not data; we haven’t, indeed we can’t in principle, observe them. And they enter into the equation from which predicted values of the dependent variable are calculated in exactly the same way, in every relevant respect that I can think of, that the ‘fixed effects’ parameters do. In addition, statistical bias is a property of estimators of a model’s parameter(s), so to call the random effects unbiased non-parameters seems paradoxical. All of which makes them sound an awful lot like parameters to me. The fact that the variance and covariance parameters governing the random effects are estimated prior to estimating the random effects themselves is neither here nor there with respect to what we call the latter.
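For what it's worth, here is a rough sketch of what the two-stage description in the Baayen et al. quote could look like in the simplest possible case: a balanced random-intercepts design, with the variance components estimated from mean squares and the adjustments computed afterwards. Everything in it is an assumption made for illustration: the simulated data, the names (J, n, tau2_hat, shrink, blups), and the moment-based estimators, which stand in for (and are not) whatever lme4 actually does internally.

```python
import random
import statistics

random.seed(1)

# Hypothetical balanced design: J subjects, n observations per subject.
J, n = 30, 8
beta0, tau, sigma = 2.0, 1.0, 1.5  # "true" values, used only to simulate data

data = []
for _ in range(J):
    u = random.gauss(0.0, tau)  # this subject's true intercept offset
    data.append([beta0 + u + random.gauss(0.0, sigma) for _ in range(n)])

subject_means = [statistics.fmean(ys) for ys in data]
grand_mean = statistics.fmean(subject_means)  # intercept estimate (balanced case)

# Stage 1: variance components from within- and between-subject mean squares.
msw = sum((y - m) ** 2 for ys, m in zip(data, subject_means) for y in ys) / (J * (n - 1))
msb = n * sum((m - grand_mean) ** 2 for m in subject_means) / (J - 1)
sigma2_hat = msw
tau2_hat = max(0.0, (msb - msw) / n)  # clamp at zero if the moment estimate is negative

# Stage 2: the adjustments (BLUPs), computed from the stage-1 estimates.
shrink = tau2_hat / (tau2_hat + sigma2_hat / n)
blups = [shrink * (m - grand_mean) for m in subject_means]

# Fitted subject intercepts: the adjustment is simply added to the intercept.
fitted = [grand_mean + b for b in blups]

print("sigma2_hat:", round(sigma2_hat, 3), "tau2_hat:", round(tau2_hat, 3))
print("shrinkage factor:", round(shrink, 3))
print("first three fitted intercepts:", [round(f, 3) for f in fitted[:3]])
```

Stage 1 never touches a per-subject effect; it only needs the spread of the subject means and the within-subject residuals. In the last step, though, the adjustments enter the prediction just as a coefficient would.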
Indeed, it seems very odd to me that the variance and covariance parameters are estimated first, since my intuition is that there would need to be something varying and covarying in order to estimate these. I come at all of this from a Bayesian perspective, by which I mean that the first few multilevel models I fit to data, I built and estimated using BUGS and JAGS. In these cases, I don’t see any way that the ‘random effects’ could not be parameters while the means, variances, and covariances governing them are. These bits aren’t data, but they are part of the model equation – without them, you can’t calculate the likelihood function (see, e.g., the ‘theta construction’ section of this model or the role that the Bsp and Bsf arrays play in this model).
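One way to see how a variance could be estimated "first" is to write out the simplest random-intercepts case twice, once with the subject effects kept in and once with them integrated out. The notation (β0, u_j, σ², τ²) is chosen here for illustration and is not taken from the papers above, and I'm only assuming that the second form is roughly the kind of thing likelihood-based software optimizes; it is a sketch, not a description of lme4's implementation.

```latex
% Random-intercept model for observation i from subject j
y_{ij} \mid u_j \sim \mathcal{N}(\beta_0 + u_j,\ \sigma^2), \qquad u_j \sim \mathcal{N}(0,\ \tau^2)

% Joint density, as in a BUGS/JAGS model: the u_j appear explicitly
p(y, u \mid \beta_0, \sigma^2, \tau^2)
  = \prod_{j} \Big[ \mathcal{N}(u_j \mid 0, \tau^2) \prod_{i} \mathcal{N}(y_{ij} \mid \beta_0 + u_j, \sigma^2) \Big]

% Marginal likelihood: each u_j is integrated out
L(\beta_0, \sigma^2, \tau^2)
  = \prod_{j} \int \mathcal{N}(u_j \mid 0, \tau^2) \prod_{i} \mathcal{N}(y_{ij} \mid \beta_0 + u_j, \sigma^2)\, du_j
  = \prod_{j} \mathcal{N}\big(\mathbf{y}_j \mid \beta_0 \mathbf{1},\ \sigma^2 I + \tau^2 \mathbf{1}\mathbf{1}^{\top}\big)
```

In the marginal form, τ² shows up as the covariance between observations from the same subject, so there is something covarying even though no u_j is written down. The joint form cannot be evaluated without values for the u_j, which is the situation described above for the BUGS and JAGS models.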
So, okay, it’s still possible that SPSS users have a weird, limited understanding of mixed-effects modeling. But there’s something else going on, too, and it’s not entirely clear to me what it is. I assume that the D. M. Bates who is the third author on the Baayen et al. paper is the same D. Bates that co-developed lme4, so I’m perfectly willing to grant that, with respect to (penalized maximum likelihood) estimation, the random effects, on the one hand, and the fixed effects and variances/covariances governing the random effects, on the other, are treated differently. But it seems confusing and confused, at best, to insist that the former are not parameters.

Actually, I think this “non-parameter” thing is more accurate than you might think. I’ve been digging more into these things recently. In short, unlike a Bayesian or other type of analysis where there is no real distinction between the parameters of by-subject intercepts (or slopes) and other parameters, the way lme4 (and I think other mixed-effects-fitting software) works is exactly as described in your post. The variance of the random effect is a parameter of the model, but the actual by-subject intercepts are not “estimated” in the same way the fixed effects are estimated. One illustration of this is that if you do var() on the output of ranef(), you don’t get exactly the same number as the estimated variance of the random effect in the model.
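To make the var() point concrete, here is a worked version for the simplest case only: a balanced random-intercepts design with J subjects and n observations each, using moment estimates (MSW and MSB are the within- and between-subject mean squares). These estimators and symbols are assumptions for illustration, not what lme4 computes, so treat this as a sketch of why the two numbers differ rather than a formula for the size of the gap in a real fit.

```latex
% Predicted intercept adjustment for subject j, with shrinkage factor \hat{\lambda}
\hat{u}_j = \hat{\lambda}\,(\bar{y}_j - \bar{y}), \qquad
\hat{\lambda} = \frac{\hat{\tau}^2}{\hat{\tau}^2 + \hat{\sigma}^2 / n}

% Sample variance of the adjustments (they sum to zero)
\frac{1}{J-1} \sum_{j=1}^{J} \hat{u}_j^2
  = \hat{\lambda}^2 \cdot \frac{1}{J-1} \sum_{j=1}^{J} (\bar{y}_j - \bar{y})^2
  = \hat{\lambda}^2 \cdot \frac{\mathrm{MSB}}{n}

% With \hat{\sigma}^2 = \mathrm{MSW} and \hat{\tau}^2 = (\mathrm{MSB} - \mathrm{MSW})/n > 0,
% we have \mathrm{MSB}/n = \hat{\tau}^2 + \hat{\sigma}^2/n, so
\frac{1}{J-1} \sum_{j=1}^{J} \hat{u}_j^2 = \hat{\lambda}\,\hat{\tau}^2 \le \hat{\tau}^2
```

Under these assumptions the variance of the predicted intercepts is the estimated variance multiplied by the shrinkage factor, so it comes out smaller whenever there is any residual noise.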
Maybe the best place for you to look (and I mean that, because you will understand the math better than I do) is one of the drafts of Bates’ long-in-progress book on lme4. If you go to this site:
http://lme4.r-forge.r-project.org/
And find a doc called “lrgprt.pdf”, you will find a relevant discussion in section 1.6 (at least, in the version at the time of this posting). Here’s a paragraph of it:
The more I think about it, the more I’m with Gelman on this (see, e.g., section 6 of this paper and pp. 262-265 of Gelman & Hill’s book). The section 6 bit addresses fixed vs. random effects terminology in general (arguing that it’s more useful to talk about constant vs. varying parameters), and the book section illustrates how there are a number of mathematically equivalent ways of writing ‘mixed effects’ models, a number of which make it pretty clear that the ‘random effects’ bits are parameters, too.
The fact that the random effects are estimated differently than are the parameters governing them doesn’t make the former something other than parameters. The key difference between fixed effects and random effects here is that random effects are modeled (i.e., there are variances and covariances governing them [as well as means, namely the associated fixed effects]) whereas the fixed effects aren’t. Yet there is still uncertainty about the fixed effects, else why have standard errors and confidence intervals and all the associated statistical testing machinery?
Bates says (on p. 2 of lrgprt.pdf) that “random effects are unobserved random variables.” Well, the fixed effects aren’t observed either, and the fact that nobody disputes that there is uncertainty about them (again, CIs, SEs, etc…) suggests that, whether we’re modeling them or not (i.e., whether we are willing to make particular distributional assumptions about them or not), there is an element of randomness smack dab in the middle of fixed effect estimation, too.
Ultimately, I can’t figure out what exactly Bates and others mean by ‘parameter’ that includes fixed effects and variances/covariances while excluding random effects.
This distinction, whatever it is, seems at odds with a lot of other statistical modeling, too. For example, in CFA and SEM, there are potentially huge numbers of latent variables, all of which are parameters in those models. Pretty much all of the cognitive/math psych modeling I know about has analogous structure – latent variables like the spatial location, dimension weights, and bias parameters in the Generalized Context Model or the means and correlations in the perceptual distributions in GRT. I link to those two papers specifically because they illustrate maximum likelihood estimation and not Bayesian estimation (i.e., the unobserved parameters aren’t explicitly probabilistically modeled).
The fact that the variance of the estimated random effects can be different from the estimated variance doesn’t do much work here either. For any given step in an MCMC chain in a Bayesianly-estimated model, the actual variance of a set of ‘random effects’ parameters may or may not match the variance parameter governing them exactly. In fact, given that there’s exactly one way for the two numbers to match exactly and approximately infinitely many ways for them to fail to match, I would expect the latter to occur far more often.
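A quick simulation illustrates the point. This is only a stand-in for an MCMC chain, not a sampler: at each "step" it draws a set of subject effects directly from a normal distribution with a fixed variance parameter, whereas a real chain would draw them from their conditional posterior and update the variance parameter too. The values of tau, J, and n_steps are arbitrary assumptions.

```python
import random
import statistics

random.seed(2)

tau = 1.0       # hypothetical SD parameter governing the subject effects
J = 30          # hypothetical number of subjects
n_steps = 2000  # number of simulated "steps"

# At each step, draw J subject effects from N(0, tau^2) and record
# the sample variance of that set of effects.
sample_vars = []
for _ in range(n_steps):
    effects = [random.gauss(0.0, tau) for _ in range(J)]
    sample_vars.append(statistics.variance(effects))

# Count the steps at which the sample variance equals tau^2 exactly.
exact_matches = sum(v == tau ** 2 for v in sample_vars)

print("variance parameter:", tau ** 2)
print("mean of sample variances:", round(statistics.fmean(sample_vars), 3))
print("smallest and largest:", round(min(sample_vars), 3), round(max(sample_vars), 3))
print("exact matches:", exact_matches)
```

The sample variances scatter around the governing parameter without landing on it, so a mismatch between the two numbers says little about whether the effects are parameters.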