Why does one have to use REML (instead of ML) for choosing among nested var-covar models?

Various descriptions of model selection on the random effects of linear mixed models instruct you to use REML. I know the difference between REML and ML at some level, but I don't understand why REML should be used just because ML is biased. For example, is it wrong to conduct an LRT on a variance parameter of a normal distribution model using ML (see the code below)? I don't understand why, in model selection, it is more important to be unbiased than to be ML. I think the ultimate answer has to be "because model selection works better with REML than with ML", but I would like to know a little more than that. I have not read the derivations of the LRT and AIC (I am not good enough to understand them thoroughly), but if REML is explicitly used in the derivations, just knowing that would be sufficient (e.g., the chi-square approximation of the test statistic in the LRT does not work if an MLE is biased).

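set.seed(1)                       # arbitrary seed, for reproducibility

# Simulate data whose standard deviation grows linearly with x
n <- 100
a <- 10
b <- 1
alpha <- 5
beta <- 1
x <- runif(n, 0, 10)
y <- rnorm(n, a + b * x, alpha + beta * x)

# Negative log-likelihood with constant sd (simple model)
loglik1 <- function(p, x, y) {
  a <- p[1]
  b <- p[2]
  alpha <- p[3]
  -sum(dnorm(y, a + b * x, alpha, log = TRUE))
}

# Negative log-likelihood with sd linear in x (general model)
loglik2 <- function(p, x, y) {
  a <- p[1]
  b <- p[2]
  alpha <- p[3]
  beta <- p[4]
  -sum(dnorm(y, a + b * x, alpha + beta * x, log = TRUE))
}

# optim() minimises, so $value is the minimised negative log-likelihood
m1 <- optim(c(a, b, alpha), loglik1, x = x, y = y)$value
m2 <- optim(c(a, b, alpha, beta), loglik2, x = x, y = y)$value
D <- 2 * (m1 - m2)                      # likelihood ratio statistic
pchisq(D, df = 1, lower.tail = FALSE)   # p-value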

— quibble

About REML and AIC, you should have a look at this question.

— Elvis

Answers:

A very short answer: REML is an ML, so a test based on REML is valid anyway. Since REML estimates the variance parameters better, it is natural to use it.

Why is REML an ML? Consider e.g. a model

$$Y = X\beta + Zu + e$$

with $X\in\mathbb{R}^{n\times p}$, $Z\in\mathbb{R}^{n\times q}$, where $\beta \in \mathbb{R}^p$ is the vector of the fixed effects, $u \sim \mathcal N(0, \tau I_q)$ is the vector of random effects, and $e \sim \mathcal N(0, \sigma^2 I_n)$. The Restricted Likelihood can be obtained by considering $n-p$ contrasts to "remove" the fixed effects. More precisely, let $C \in \mathbb{R}^{(n-p)\times n}$ be such that $CX = 0$ and $CC' = I_{n-p}$ (that is, the columns of $C'$ are an orthonormal basis of the vector space orthogonal to the space generated by the columns of $X$); then

$$CY = CZ u + \epsilon$$

with $\epsilon \sim \mathcal N(0, \sigma^2 I_{n-p})$, and the likelihood for $\tau, \sigma^2$ given $CY$ is the Restricted Likelihood.
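As a small illustration of this construction, the following R sketch builds one possible $C$ from a complete QR decomposition of $X$ and checks the two defining properties. To keep it short it assumes a model without random effects (no $Zu$ term), in which case maximising the likelihood of $CY$ over $\sigma^2$ gives $\|CY\|^2/(n-p)$, the usual residual variance. The simulated data and all variable names are illustrative assumptions, not part of the answer above.

```r
set.seed(1)                       # arbitrary seed, for reproducibility
n <- 50
x <- runif(n, 0, 10)
X <- cbind(1, x)                  # fixed-effects design matrix (intercept + slope)
p <- ncol(X)
y <- 10 + 1 * x + rnorm(n, sd = 5)

# Complete QR decomposition: the first p columns of Q span col(X),
# the remaining n - p columns span its orthogonal complement.
Q <- qr.Q(qr(X), complete = TRUE)
C <- t(Q[, (p + 1):n])            # (n - p) x n contrast matrix

err_CX  <- max(abs(C %*% X))                    # should be ~ 0 (CX = 0)
err_CCt <- max(abs(C %*% t(C) - diag(n - p)))   # should be ~ 0 (CC' = I)

# With no random effects, CY ~ N(0, sigma^2 I_{n-p}); maximising this
# likelihood over sigma^2 gives the mean of the squared contrasts.
cy <- drop(C %*% y)
sigma2_reml <- sum(cy^2) / (n - p)

fit <- lm(y ~ x)
sigma2_ml <- sum(resid(fit)^2) / n    # ML estimate (divides by n)
sigma2_lm <- summary(fit)$sigma^2     # residual variance reported by lm()

c(REML = sigma2_reml, lm = sigma2_lm, ML = sigma2_ml)
```

In this special case the estimate obtained from the likelihood of $CY$ should agree with the `lm()` residual variance, while the ML estimate is smaller by the factor $(n-p)/n$.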

— Elvis

Nice answer (+1). Am I correct to say that the matrix $C$ depends on the model for the mean? So you can only compare REML estimates for the same $C$ matrix?

Yes, $C$ depends on $X$ (I’ll edit the answer in a minute to make it clear), so your nested models need to have the same variables with fixed effects.

— Elvis

REML is not an ML! The ML is uniquely defined for a given probability model, but the REML depends on the fixed-effects parameterization. See e.g. this comment by Doug Bates (as well as many historical ones on R-SIG-mixed-models).

— Livius

@Livius I think my answer states clearly enough how the restricted likelihood is constructed. It is a likelihood; it's just not the likelihood given the observed $Y$ in the model written in the first displayed equation, but given the projected vector $CY$ in the model written in the second displayed equation. The REML is the ML obtained from this likelihood.

— Elvis

I think that's kinda the point of DBates' protests on this issue: it is a different model, and it is a model for which comparisons are difficult because the model and the parameterization are intertwined. So you're not computing the ML for your original model but the ML for a different model arising from a particular parameterization of your original model. Hence REML-fitted models with nested fixed-effects structures are no longer nested models (as you mention above). But ML-fitted models are still nested, because you're maximizing the likelihood on the model specified.
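A rough R sketch of this dependence on the parameterization, using the `REML` argument of `logLik()` for `lm` fits in base R as a stand-in for a mixed model (an assumption made for brevity; the data and the change of units are made up). Re-expressing the predictor in different units gives the same fitted model and the same ML log-likelihood, but the restricted log-likelihood reported by `logLik()` changes.

```r
set.seed(1)                      # arbitrary seed, for reproducibility
n <- 50
x_m  <- runif(n, 0, 10)          # predictor in metres (hypothetical units)
x_cm <- 100 * x_m                # the same predictor in centimetres
y <- 10 + 1 * x_m + rnorm(n, sd = 5)

fit_m  <- lm(y ~ x_m)
fit_cm <- lm(y ~ x_cm)           # same model, different parameterization

ml_m    <- as.numeric(logLik(fit_m))
ml_cm   <- as.numeric(logLik(fit_cm))
reml_m  <- as.numeric(logLik(fit_m,  REML = TRUE))
reml_cm <- as.numeric(logLik(fit_cm, REML = TRUE))

# Differences between the two parameterizations
c(ML = ml_m - ml_cm, REML = reml_m - reml_cm)
```

The ML difference should be zero up to rounding, whereas the REML values differ even though both fits describe the same data with the same mean structure.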

— Livius

Likelihood ratio tests are statistical hypothesis tests that are based on a ratio of two likelihoods. Their properties are linked to maximum likelihood estimation (MLE). (see e.g. Maximum Likelihood Estimation (MLE) in layman terms).

In your case (see the question) you want to choose between two nested var-covar models: say, a model where the var-covar matrix is $\Sigma_g$ and a model where it is $\Sigma_s$, where the second (the simple model) is a special case of the first (the general one).

The test is based on the likelihood ratio statistic $LR = -2\left(\log \mathcal{L}_s(\hat{\Sigma}_s) - \log \mathcal{L}_g(\hat{\Sigma}_g)\right)$, where $\hat{\Sigma}_s$ and $\hat{\Sigma}_g$ are the maximum likelihood estimators.

The statistic $LR$ is, asymptotically (!), $\chi^2$.

Maximum likelihood estimators are known to be consistent; however, in many cases they are biased. This is the case for the ML estimators of the variance, $\hat{\Sigma}_s$ and $\hat{\Sigma}_g$: it can be shown that they are biased. This is because they are computed using a mean that was itself derived from the data, so that the spread around this 'estimated average' is smaller than the spread around the true mean (see e.g. Intuitive explanation for dividing by $n-1$ when calculating standard deviation?)
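The size of this bias can be checked with a quick simulation. The sketch below assumes the simplest possible setting, an i.i.d. normal sample with an unknown mean (not a mixed model), and compares the ML variance estimate (divide by $n$) with the version that divides by $n-1$. The sample size, number of replications, and names are arbitrary choices for illustration.

```r
set.seed(1)          # arbitrary seed, for reproducibility
n      <- 5          # deliberately small sample
sigma2 <- 1          # true variance
n_sim  <- 20000      # number of simulated samples

sims <- replicate(n_sim, {
  z  <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  ss <- sum((z - mean(z))^2)        # spread around the estimated mean
  c(ml = ss / n, unbiased = ss / (n - 1))
})

est <- rowMeans(sims)               # average of each estimator over simulations
est
```

In expectation the ML estimate equals $\frac{n-1}{n}\sigma^2$ (here $0.8$), so its simulated average should sit clearly below the true value, while the $n-1$ version should average close to $1$.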

The statistic $LR$ above is $\chi^2$ in large samples simply because, in large samples, $\hat{\Sigma}_s$ and $\hat{\Sigma}_g$ converge to their true values (MLEs are consistent).
(Note: in the above link, for very large samples, dividing by $n$ or by $n-1$ makes no difference.)

For smaller samples, the ML estimates $\hat{\Sigma}_s$ and $\hat{\Sigma}_g$ will be biased, and therefore the distribution of $LR$ will deviate from $\chi^2$. The REML estimates of $\Sigma_s$ and $\Sigma_g$ are unbiased, so if you use the REML estimates for the selection of the var-covar model, $LR$ will be better approximated by the $\chi^2$ in smaller samples.

Note that REML should only be used to choose among nested var-covar structures of models with the same mean. For models with different means REML is not appropriate, and one should use ML.
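In practice this comparison could look like the following sketch, which uses `gls()` from the nlme package (assumed to be installed; it normally ships with R). Both fits share the same mean formula `y ~ x` and differ only in the variance structure. The variance function `varPower()` (standard deviation proportional to $|x|^\delta$) is a stand-in chosen because it contains the constant-variance model at $\delta = 0$; it is not the linear-in-$x$ standard deviation from the question, and the simulated data are made up.

```r
library(nlme)

set.seed(1)                                   # arbitrary seed, for reproducibility
n <- 100
x <- runif(n, 1, 10)                          # kept away from 0 for the power variance function
y <- rnorm(n, mean = 10 + x, sd = 0.5 * x)    # sd grows with x
dat <- data.frame(x = x, y = y)

# Same fixed effects in both models; only the variance structure differs
m_s <- gls(y ~ x, data = dat, method = "REML")                       # constant variance
m_g <- gls(y ~ x, data = dat, weights = varPower(form = ~ x),
           method = "REML")                                          # sd proportional to |x|^delta

lrt <- anova(m_s, m_g)    # likelihood ratio test on the REML fits
lrt
```

Because the two models have identical fixed effects, their restricted likelihoods are comparable. To compare models with different mean structures, one would refit with `method = "ML"` instead.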

The statement "The statistic $LR$ is, asymptotically (!), $\chi^2$" is not true in this case. This is because if $\Sigma_s$ is nested in $\Sigma_g$, then $\Sigma_s$ is on the boundary of $\Sigma_g$. In this case, the $\chi^2$ distribution does not hold. For example, see here
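For the common special case where the simpler model fixes a single variance component at zero, a frequently used large-sample reference distribution is an equal mixture of a point mass at zero and a $\chi^2_1$, which amounts to halving the naive p-value. A minimal R sketch under that assumption (the helper name `lrt_pvalue_boundary` is hypothetical, and the mixture only applies under the usual regularity conditions for that one-parameter case):

```r
# p-value for an LRT statistic D under a 50:50 mixture of chi^2_0 and chi^2_1
lrt_pvalue_boundary <- function(D) {
  naive <- pchisq(D, df = 1, lower.tail = FALSE)
  ifelse(D <= 0, 1, 0.5 * naive)   # D = 0 falls on the point mass at zero
}

D <- qchisq(0.90, df = 1)                           # example value of the statistic
p_naive <- pchisq(D, df = 1, lower.tail = FALSE)    # chi^2_1 reference
p_mix   <- lrt_pvalue_boundary(D)                   # mixture reference

c(naive = p_naive, mixture = p_mix)
```

With this example value the naive p-value is 0.10 and the mixture p-value is 0.05, so ignoring the boundary makes the test conservative.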

— Cliff AB

@Cliff AB, this is what is explained below that statement and it is the reason you have to use REML.


I have an answer that has more to do with common sense than with Statistics. If you take a look at PROC MIXED in SAS, the estimation can be performed with six methods:

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect008.htm

but REML is the default. Why? Apparently, practical experience has shown that it performs best (e.g., the smallest chance of convergence problems). Therefore, if your goal is achievable with REML, it makes sense to use REML rather than the other five methods.

— James

It has to do with 'large sample theory' and the bias of the ML estimates; see my answer.

"It's the default in SAS" is not an acceptable answer to a "why" question on this site.

— Paul

The p-values for mixed models that SAS provides by default are, by design, not available in the lme4 library for R because they are considered untrustworthy (stat.ethz.ch/pipermail/r-help/2006-May/094765.html). So the SAS default can even be wrong.

— Tim