Why does one have to use REML (instead of ML) to choose among nested var-covar models?


15

Various descriptions of model selection on the random effects of linear mixed models instruct to use REML. I know the difference between REML and ML at some level, but I don't understand why REML should be used just because ML is biased. For example, is it wrong to conduct an LRT on a variance parameter of a normal distribution model using ML (see the code below)? I don't understand why, in model selection, it is more important to be unbiased than to be ML. I think the ultimate answer has to be "because model selection works better with REML than with ML", but I would like to know a little more than that. I did not read the derivations of the LRT and AIC (I am not good enough to understand them thoroughly), but if REML is explicitly used in the derivations, just knowing that will actually be sufficient (e.g., the chi-square approximation of the test statistic in the LRT does not work if an MLE is biased).

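set.seed(1)  # fix the seed so the example is reproducible
n <- 100
a <- 10
b <- 1
alpha <- 5
beta <- 1
x <- runif(n, 0, 10)
# the standard deviation of y grows linearly with x
y <- rnorm(n, a + b*x, alpha + beta*x)

# negative log-likelihood, constant standard deviation (simple model)
loglik1 <- function(p, x, y){
  a <- p[1]
  b <- p[2]
  alpha <- p[3]
  -sum(dnorm(y, a + b*x, alpha, log = TRUE))
}

# negative log-likelihood, standard deviation alpha + beta*x (general model)
loglik2 <- function(p, x, y){
  a <- p[1]
  b <- p[2]
  alpha <- p[3]
  beta <- p[4]
  -sum(dnorm(y, a + b*x, alpha + beta*x, log = TRUE))
}

# optim() minimises, so $value is the minimised negative log-likelihood
m1 <- optim(c(a, b, alpha), loglik1, x = x, y = y)$value
m2 <- optim(c(a, b, alpha, beta), loglik2, x = x, y = y)$value
D <- 2*(m1 - m2)        # likelihood ratio statistic
1 - pchisq(D, df = 1)   # p-value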

1
About REML and AIC, you should have a look at this question.
Elvis

Answers:


12

A very short answer: the REML is a ML, so the test based on REML is correct anyway. As the estimation of the variance parameters with REML is better, it is natural to use it.

Why is REML a ML? Consider e.g. a model $Y = X\beta + Zu + e$

with $X \in \mathbb{R}^{n \times p}$, $Z \in \mathbb{R}^{n \times q}$, where $\beta \in \mathbb{R}^p$ is the vector of the fixed effects, $u \sim N(0, \tau I_q)$ is the vector of random effects, and $e \sim N(0, \sigma^2 I_n)$. The Restricted Likelihood can be obtained by considering $n-p$ contrasts to "remove" the fixed effects. More precisely, let $C \in \mathbb{R}^{(n-p) \times n}$ be such that $CX = 0$ and $CC' = I_{n-p}$ (that is, the columns of $C'$ are an orthonormal basis of the vector space orthogonal to the space generated by the columns of $X$); then $CY = CZu + \epsilon$
with $\epsilon \sim N(0, \sigma^2 I_{n-p})$, and the likelihood for $\tau, \sigma^2$ given $CY$ is the Restricted Likelihood.
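
The construction can be checked numerically. The following is a minimal sketch in base R, assuming a small simulated design; the dimensions, the simulated values, and the names `X`, `Z`, `C`, and `restricted_nll` are illustrative choices and not part of the original answer. It builds a matrix C from a complete QR decomposition of X, checks the two defining properties, and then maximises the likelihood of (τ, σ²) given CY.

```r
set.seed(42)
n <- 30; p <- 2; q <- 5
X <- cbind(1, runif(n))                                     # fixed-effects design, n x p
Z <- model.matrix(~ 0 + factor(rep(1:q, length.out = n)))   # random-effects design, n x q
beta <- c(1, 2)
tau <- 2; sigma2 <- 1                                       # true variances (assumed values)
u <- rnorm(q, 0, sqrt(tau))
Y <- drop(X %*% beta + Z %*% u + rnorm(n, 0, sqrt(sigma2)))

# Rows of C: an orthonormal basis of the orthogonal complement of the column space of X
Q <- qr.Q(qr(X), complete = TRUE)    # n x n orthogonal matrix
C <- t(Q[, (p + 1):n])               # (n - p) x n

max(abs(C %*% X))                    # numerically zero, i.e. CX = 0
max(abs(C %*% t(C) - diag(n - p)))   # numerically zero, i.e. CC' = I

# Negative log-likelihood of (tau, sigma2) given CY ~ N(0, tau * CZ Z'C' + sigma2 * I)
restricted_nll <- function(par, CY, CZ) {
  V <- par[1] * tcrossprod(CZ) + par[2] * diag(length(CY))
  0.5 * (length(CY) * log(2 * pi) +
         as.numeric(determinant(V)$modulus) +
         drop(crossprod(CY, solve(V, CY))))
}

CY <- drop(C %*% Y)
CZ <- C %*% Z
fit <- optim(c(1, 1), restricted_nll, CY = CY, CZ = CZ,
             method = "L-BFGS-B", lower = c(0, 1e-6))
fit$par                              # REML estimates of (tau, sigma2)
```

Any other matrix C with the same two properties should lead to the same restricted likelihood, since it only changes the orthonormal basis used for the contrasts.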

Nice answer (+1). Am I correct to say that the matrix C depends on the model for the mean? So you can only compare REML estimates for the same C matrix?

Yes, C depends on X (I’ll edit the answer in a minute to make it clear), so your nested models need to have the same variables with fixed effects.
Elvis

REML is not a ML! The ML is uniquely defined for a given probability model but the REML is dependent on the fixed-effects parameterization. See e.g. this comment by Doug Bates (as well as many historical ones on R-SIG-mixed-models).
Livius

1
@Livius I think my answer states clearly enough how the restricted likelihood is constructed. It is a likelihood; it's just not the likelihood given the observed Y in the model written in the first displayed equation, but given the projected vector CY in the model written in the second displayed equation. The REML is the ML obtained from this likelihood.
Elvis

2
I think that's kinda the point of DBates' protests on this issue: it is a different model, and it is a model for which comparisons are difficult because the model and the parameterization are intertwined. So you're not computing the ML for your original model but the ML for a different model arising from a particular parameterization of your original model. Hence REML-fitted models with nested fixed-effects structures are no longer nested models (as you mention above). But ML-fitted models are still nested, because you're maximizing the likelihood on the model specified.
Livius

9

Likelihood ratio tests are statistical hypothesis tests that are based on a ratio of two likelihoods. Their properties are linked to maximum likelihood estimation (MLE). (see e.g. Maximum Likelihood Estimation (MLE) in layman terms).

In your case (see question) you want to "choose" between two nested var-covar models. Let's say you want to choose between a model where the var-covar is $\Sigma_g$ and a model where the var-covar is $\Sigma_s$, where the second one (the simple model) is a special case of the first one (the general one).

The test is based on the likelihood ratio $LR = -2\left(\log L_s(\hat\Sigma_s) - \log L_g(\hat\Sigma_g)\right)$, where $\hat\Sigma_s$ and $\hat\Sigma_g$ are the maximum likelihood estimators.

The statistic $LR$ is, asymptotically (!), $\chi^2$.

Maximum likelihood estimators are known to be consistent; however, in many cases they are biased. This is the case for the ML estimators of the variance, $\hat\Sigma_s$ and $\hat\Sigma_g$: it can be shown that they are biased. This is because they are computed using a mean that was derived from the data, so that the spread around this 'estimated average' is smaller than the spread around the true mean (see e.g. Intuitive explanation for dividing by $n-1$ when calculating standard deviation?).

The statistic $LR$ above is $\chi^2$ in large samples simply because, in large samples, $\hat\Sigma_s$ and $\hat\Sigma_g$ converge to their true values (MLEs are consistent).
(Note: in the above link, for very large samples, dividing by $n$ or by $n-1$ makes no difference.)

For smaller samples, the ML estimates $\hat\Sigma_s$ and $\hat\Sigma_g$ will be biased, and therefore the distribution of $LR$ will deviate from $\chi^2$, while the REML estimates of $\Sigma_s$ and $\Sigma_g$ will be unbiased. So if you use the REML estimates for the selection of the var-covar model, then $LR$ will, for smaller samples, be better approximated by the $\chi^2$.
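
A small simulation can illustrate the bias in the simplest possible case: an i.i.d. normal sample with an unknown mean, where the ML estimate of the variance divides by n and the REML estimate divides by n - 1. The sample size `n`, the number of replicates `n_rep`, and the true variance are illustrative assumptions.

```r
set.seed(1)
n <- 5            # small sample, where the bias is clearly visible
n_rep <- 20000    # number of simulated samples
sigma2 <- 1       # true variance (assumed for the simulation)

sims <- replicate(n_rep, {
  y <- rnorm(n, mean = 10, sd = sqrt(sigma2))
  rss <- sum((y - mean(y))^2)            # spread around the estimated mean
  c(ml = rss / n, reml = rss / (n - 1))  # ML vs. REML variance estimate
})

# The ML average should be near (n - 1) / n * sigma2 = 0.8, the REML average near 1
rowMeans(sims)
```

With a larger `n` the two averages move closer together, which is the consistency argument made above.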

Note that REML should only be used to choose among nested var-covar structures of models with the same mean. For models with different means, REML is not appropriate and one should use ML.
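
In practice this might look as follows. This is only a sketch, assuming the `nlme` package (a recommended package shipped with standard R distributions) and its `Orthodont` example data; neither is mentioned in the original answer, and the model names are illustrative. Both models have the same fixed effects and differ only in the random-effects structure, so both are fitted with REML before the likelihood ratio test.

```r
library(nlme)

# Same mean structure (distance ~ age), different var-covar structures
m_simple  <- lme(distance ~ age, random = ~ 1 | Subject,
                 data = Orthodont, method = "REML")
m_general <- lme(distance ~ age, random = ~ age | Subject,
                 data = Orthodont, method = "REML")

# Likelihood ratio test based on the two REML fits
res <- anova(m_simple, m_general)
res
```

To compare models with different fixed effects, the same calls with `method = "ML"` would be the appropriate choice. The p-value printed by `anova()` uses a plain χ² reference distribution, which may be conservative when a variance parameter is tested on the boundary of its parameter space.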


The statement "The statistic LR is, asymptotically (!), $\chi^2$" is not true in this case. This is because if $\Sigma_s$ is nested in $\Sigma_g$, then $\Sigma_s$ is on the boundary of $\Sigma_g$. In this case, the $\chi^2$ distribution does not hold. For example, see here
Cliff AB

@Cliff AB, this is what is explained below that statement and it is the reason you have to use REML.

-4

I have an answer that has more to do with common sense than with Statistics. If you take a look at PROC MIXED in SAS, the estimation can be performed with six methods:

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect008.htm

but REML is the default. Why? Apparently, the practical experience showed it has the best performance (e.g., the smallest chance of convergence problems). Therefore, if your goal is achievable with REML, then it makes sense to use REML as opposed to the other five methods.


2
It has to do with 'large sample theory' and the bias of the ML estimates, see my answer.

1
"It's the default in SAS" is not an acceptable answer to a "why" question on this site.
Paul

The p-values for mixed models that SAS provides by default are, by design, not available in the lme4 library for R, because they are considered untrustworthy (stat.ethz.ch/pipermail/r-help/2006-May/094765.html). So the "SAS default" can even be wrong.
Tim
Licensed under cc by-sa 3.0 with attribution required.