How to test for simultaneous equality of chosen coefficients in a logit or probit model?

How do I test for simultaneous equality of chosen coefficients in a logit or probit model? What is the standard approach, and what is the state-of-the-art approach?

hypothesis-testing logit probit

— Qbik

Wald test

One standard approach is the Wald test. This is what the Stata command test does after a logit or probit regression. Let's see how this works in R by looking at an example:

mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv") # Load dataset from the web
mydata$rank <- factor(mydata$rank)
mylogit <- glm(admit ~ gre + gpa + rank, data = mydata, family = "binomial") # calculate the logistic regression

summary(mylogit)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -3.989979   1.139951  -3.500 0.000465 ***
gre          0.002264   0.001094   2.070 0.038465 *  
gpa          0.804038   0.331819   2.423 0.015388 *  
rank2       -0.675443   0.316490  -2.134 0.032829 *  
rank3       -1.340204   0.345306  -3.881 0.000104 ***
rank4       -1.551464   0.417832  -3.713 0.000205 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Say you want to test the hypothesis $\beta_{gre}=\beta_{gpa}$ vs. $\beta_{gre}\neq \beta_{gpa}$. This is equivalent to testing $\beta_{gre} - \beta_{gpa} = 0$. The Wald test statistic is:

$W=\frac{(\hat{\theta}-\theta_{0})}{\widehat{\operatorname{se}}(\hat{\theta})}\sim \mathcal{N}(0,1)$

or

$W^2 = \frac{(\hat{\theta}-\theta_{0})^2}{\operatorname{Var}(\hat{\theta})}\sim \chi_{1}^2$

Here, $\hat{\theta}$ is $\hat{\beta}_{gre} - \hat{\beta}_{gpa}$ and $\theta_{0}=0$. So all we need is the standard error of $\hat{\beta}_{gre} - \hat{\beta}_{gpa}$. We can calculate the standard error with the Delta method:

$\widehat{\operatorname{se}}(\hat{\beta}_{gre} - \hat{\beta}_{gpa})\approx \sqrt{\operatorname{Var}(\hat{\beta}_{gre}) + \operatorname{Var}(\hat{\beta}_{gpa}) - 2\cdot \operatorname{Cov}(\hat{\beta}_{gre},\hat{\beta}_{gpa})}$

So we also need the covariance of $\hat{\beta}_{gre}$ and $\hat{\beta}_{gpa}$. The variance-covariance matrix can be extracted with the vcov command after running the logistic regression:

var.mat <- vcov(mylogit)[c("gre", "gpa"),c("gre", "gpa")]

colnames(var.mat) <- rownames(var.mat) <- c("gre", "gpa")

              gre           gpa
gre  1.196831e-06 -0.0001241775
gpa -1.241775e-04  0.1101040465

Finally, we can calculate the standard error:

se <- sqrt(1.196831e-06 + 0.1101040465 -2*-0.0001241775)
se
[1] 0.3321951

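So your Wald $z$-value is

# gre and gpa are not standalone objects; take the estimates from the fitted model
wald.z <- unname(coef(mylogit)["gre"] - coef(mylogit)["gpa"]) / se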
wald.z
[1] -2.413564

To get a $p$-value, just use the standard normal distribution:

2*pnorm(-2.413564)
[1] 0.01579735

In this case, we have evidence that the coefficients differ from each other. This approach can be extended to more than two coefficients.
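As a rough sketch of that extension (not part of the original answer): for $q$ linear restrictions $R\beta = r$, the Wald statistic can be written as $W = (R\hat{\beta} - r)^{\top}(R\hat{V}R^{\top})^{-1}(R\hat{\beta} - r)$ and compared with a $\chi^2_q$ distribution. The helper name wald_linear is hypothetical, and the numbers are simply copied from the output above so that the block runs without the dataset.

```r
# Hypothetical helper: Wald test of the linear restrictions R %*% beta = r.
wald_linear <- function(beta, V, R, r = rep(0, nrow(R))) {
  diff <- R %*% beta - r                               # estimated violation of the restrictions
  W    <- drop(t(diff) %*% solve(R %*% V %*% t(R), diff))
  q    <- nrow(R)                                      # number of restrictions
  c(W = W, df = q, p = pchisq(W, df = q, lower.tail = FALSE))
}

# Estimates and (co)variances for gre and gpa, copied from the output above
beta <- c(gre = 0.002264, gpa = 0.804038)
V <- matrix(c(1.196831e-06, -0.0001241775,
              -0.0001241775, 0.1101040465),
            nrow = 2, dimnames = list(names(beta), names(beta)))

# One restriction: beta_gre - beta_gpa = 0
R <- matrix(c(1, -1), nrow = 1)
res <- wald_linear(beta, V, R)
res
```

With a single restriction, W is just the square of the $z$-value above, so the $p$-value should come out at about 0.0158. With the full model you would pass coef(mylogit) and vcov(mylogit), and R would have one column per coefficient and one row per restriction.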

Using multcomp

This rather tedious calculation can be done conveniently in R using the multcomp package. Here is the same example as above, done with multcomp:

library(multcomp)

glht.mod <- glht(mylogit, linfct = c("gre - gpa = 0"))

summary(glht.mod)    

Linear Hypotheses:
               Estimate Std. Error z value Pr(>|z|)  
gre - gpa == 0  -0.8018     0.3322  -2.414   0.0158 *

A confidence interval for the difference of the coefficients can also be calculated:

confint(glht.mod)

Quantile = 1.96
95% family-wise confidence level


Linear Hypotheses:
               Estimate lwr     upr    
gre - gpa == 0 -0.8018  -1.4529 -0.1507

For further examples of multcomp, see here or here.

Likelihood ratio test (LRT)

The coefficients of a logistic regression are found by maximum likelihood. But because the likelihood function involves a lot of products, the log-likelihood is maximized instead, which turns the products into sums. A model that fits better has a higher log-likelihood. A model involving more variables has at least the same likelihood as the null model. Denoting the log-likelihood of the alternative model (the model containing more variables) by $LL_{a}$ and the log-likelihood of the null model by $LL_{0}$, the likelihood ratio test statistic is:

$D=2\cdot (LL_{a} - LL_{0})\sim \chi_{df1-df2}^{2}$

The likelihood ratio test statistic follows a $\chi^{2}$ distribution with degrees of freedom equal to the difference in the number of estimated parameters. In our case, this is 1: the constraint $\beta_{gre}=\beta_{gpa}$ removes one free parameter (6 coefficients in the full model, 5 in the constrained one).

To perform the likelihood ratio test, we also need to fit the model with the constraint $\beta_{gre}=\beta_{gpa}$ to be able to compare the two likelihoods. The full model has the form:

$\log\left(\frac{p_{i}}{1-p_{i}}\right)=\beta_{0}+\beta_{1}\cdot \mathrm{gre} + \beta_{2}\cdot \mathrm{gpa}+\beta_{3}\cdot \mathrm{rank_{2}} + \beta_{4}\cdot \mathrm{rank_{3}}+\beta_{5}\cdot \mathrm{rank_{4}}$

Our constrained model has the form:

$\log\left(\frac{p_{i}}{1-p_{i}}\right)=\beta_{0}+\beta_{1}\cdot (\mathrm{gre} + \mathrm{gpa})+\beta_{2}\cdot \mathrm{rank_{2}} + \beta_{3}\cdot \mathrm{rank_{3}}+\beta_{4}\cdot \mathrm{rank_{4}}$

mylogit2 <- glm(admit ~ I(gre + gpa) + rank, data = mydata, family = "binomial")

In our case, we can use logLik to extract the log-likelihood of the two models after a logistic regression:

L1 <- logLik(mylogit)
L1
'log Lik.' -229.2587 (df=6)

L2 <- logLik(mylogit2)
L2
'log Lik.' -232.2416 (df=5)

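The model containing the constraint on gre and gpa has a slightly lower log-likelihood (-232.24) than the full model (-229.26). Our likelihood ratio test statistic is:

D <- 2 * (as.numeric(L1) - as.numeric(L2))
D
[1] 5.9658

We can now use the $\chi^{2}_{1}$ distribution to calculate the $p$-value:

1-pchisq(D, df=1)
[1] 0.01458625

The $p$-value is small, indicating that the coefficients differ.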

R has the likelihood ratio test built in; we can use the anova function to calculate the likelihood ratio test:

anova(mylogit2, mylogit, test="LRT")

Analysis of Deviance Table

Model 1: admit ~ I(gre + gpa) + rank
Model 2: admit ~ gre + gpa + rank
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
1       395     464.48                       
2       394     458.52  1   5.9658  0.01459 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Again, we have strong evidence that the coefficients of gre and gpa are significantly different from each other.

Score test (aka Rao's score test, aka Lagrange multiplier test)

The score function $U(\theta)$ is the derivative of the log-likelihood function $\log L(\theta|x)$, where $\theta$ are the parameters and $x$ the data (shown here for a single parameter):

$U(\theta) = \frac{\partial \log L(\theta|x)}{\partial \theta}$

Let $I(\theta)$ be the Fisher information with respect to $\theta$. The score test statistic is:

$S(\theta_{0})=\frac{U(\theta_{0})^{2}}{I(\theta_{0})}\sim\chi^{2}_{1}$

The score test can also be calculated using anova (the score test statistic is called "Rao"):

anova(mylogit2, mylogit,  test="Rao")

Analysis of Deviance Table

Model 1: admit ~ I(gre + gpa) + rank
Model 2: admit ~ gre + gpa + rank
  Resid. Df Resid. Dev Df Deviance    Rao Pr(>Chi)  
1       395     464.48                              
2       394     458.52  1   5.9658 5.9144  0.01502 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The conclusion is the same as before.
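To make the link between the formula and the "Rao" column more concrete, here is a minimal sketch (not from the original answer) that computes the multi-parameter version $S = U^{\top} I^{-1} U$ by hand for a logistic regression. It uses simulated data, so the variable names x1, x2 and y and the true coefficients are assumptions made up for illustration. The score and the information of the full model are both evaluated at the fitted values of the constrained model.

```r
set.seed(1)

# Simulated data (illustration only)
n  <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 + 0.3 * x2))

full  <- glm(y ~ x1 + x2,    family = binomial)  # unrestricted model
restr <- glm(y ~ I(x1 + x2), family = binomial)  # imposes beta_x1 = beta_x2

X  <- model.matrix(full)                 # design matrix of the full model
p0 <- fitted(restr)                      # fitted probabilities under the constraint
U  <- crossprod(X, y - p0)               # score of the full model at the restricted fit
I0 <- crossprod(X, X * (p0 * (1 - p0)))  # information X'WX with W = diag(p0 * (1 - p0))

S       <- drop(t(U) %*% solve(I0, U))   # score statistic
p_score <- pchisq(S, df = 1, lower.tail = FALSE)

# Built-in version for comparison
rao <- anova(restr, full, test = "Rao")$Rao[2]
c(by_hand = S, anova_rao = rao, p = p_score)
```

For a logit model the two statistics should agree up to numerical tolerance, because the dispersion is fixed at 1 and the working weights of the constrained fit are exactly $p_0(1-p_0)$.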

Note

An interesting relationship between the different test statistics when the model is linear is (Johnston and DiNardo (1997): Econometric Methods): Wald $\geq$ LR $\geq$ Score.

— COOLSerdash

I wonder why the reduced model simply excludes gre and gpa? Isn't that testing $\beta_1=\beta_2=0$, not $\beta_1=\beta_2$? To me, to correctly test $\beta_1=\beta_2$, we need to keep gre and gpa and meanwhile impose $\beta_{\text{gre}}=\beta_{\text{gpa}}$.

— Sibbs Gambling

@SibbsGambling Good catch! I updated my answer accordingly.

— COOLSerdash

Is this limited to continuous predictors only, or could I - for instance - also see whether two levels of a categorical variable are significantly different? Let's say, is the difference between rank3 and rank4 significant?

— Daniel

@Daniel Yes, this approach can also be used for levels of a categorical variable. The multcomp package makes it particularly easy. For example, try this: glht.mod <- glht(mylogit, linfct = c("rank3 - rank4= 0")). But a much easier way would be to make rank3 the reference level (using mydata$rank <- relevel(mydata$rank, ref="3")) and then just use the normal regression output. Each level of the factor is compared to the reference level. The p-value for rank4 would be the desired comparison.

— COOLSerdash

@Daniel The p-values from the model output (changed reference level) and glht are the same for me (about $0.591$). Regarding your second question: linfct = c("rank3 - rank4= 0") tests only one linear hypothesis whereas mcp(rank="Tukey") tests all 6 pairwise comparisons of rank. So the p-values have to be adjusted for multiple comparisons. This means that the p-values using Tukey's test are generally higher than the single comparison.

— COOLSerdash

You did not specify your variables, if they are binary or something else. I think you talk about binary variables. There also exist multinomial versions of the probit and logit model.

In general, you can use the complete trinity of test approaches, i.e.

Likelihood-Ratio-test

LM-Test

Wald-Test

Each test uses different test-statistics. The standard approach would be to take one of the three tests. All three can be used to do joint tests.

The LR test uses the difference between the log-likelihoods of a restricted and an unrestricted model. The restricted model is the model in which the restriction is imposed, for example with the specified coefficients set to zero or constrained to be equal. The unrestricted model is the "normal" model. The Wald test has the advantage that only the unrestricted model is estimated. It basically asks whether the restriction is nearly satisfied when it is evaluated at the unrestricted MLE. For the Lagrange multiplier test, only the restricted model has to be estimated. The restricted ML estimator is used to calculate the score of the unrestricted model. This score will usually not be zero, and this discrepancy is the basis of the LM test. In your context, the LM test can also be used to test for heteroscedasticity.
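For reference, the three statistics can be written side by side. This is a sketch in textbook notation that is not taken from the answer itself: $\ell$ is the log-likelihood, $\hat{\theta}$ the unrestricted MLE, $\tilde{\theta}$ the restricted MLE, $h(\theta)=0$ the $q$ restrictions with Jacobian $H(\theta)=\partial h/\partial \theta^{\top}$, $U$ the score, $I$ the information matrix and $\hat{V}$ the estimated covariance matrix of $\hat{\theta}$.

$LR = 2\left[\ell(\hat{\theta}) - \ell(\tilde{\theta})\right]$

$W = h(\hat{\theta})^{\top}\left[H(\hat{\theta})\,\hat{V}\,H(\hat{\theta})^{\top}\right]^{-1}h(\hat{\theta})$

$LM = U(\tilde{\theta})^{\top}\,I(\tilde{\theta})^{-1}\,U(\tilde{\theta})$

Under the null hypothesis all three are asymptotically $\chi^{2}_{q}$. For the equality of two coefficients, $h(\theta)=\theta_{1}-\theta_{2}$ and $q=1$.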

— Jen Bohold

The standard approaches are the Wald test, the likelihood ratio test and the score test. Asymptotically they should be the same. In my experience the likelihood ratio test tends to perform slightly better in simulations on finite samples, but the cases where this matters would be very extreme (small sample) scenarios, where I would take all of these tests as a rough approximation only. However, depending on your model (number of covariates, presence of interaction effects) and your data (multicollinearity, the marginal distribution of your dependent variable), the "wonderful kingdom of Asymptotia" can be well approximated by a surprisingly small number of observations.

Below is an example of such a simulation in Stata using the Wald, likelihood ratio and score tests in a sample of only 150 observations. Even in such a small sample the three tests produce fairly similar p-values, and the sampling distribution of the p-values when the null hypothesis is true does seem to follow a uniform distribution, as it should (or at least the deviations from the uniform distribution are no larger than one would expect from the randomness inherent in a Monte Carlo experiment).

clear all
set more off

// data preparation
sysuse nlsw88, clear

gen byte edcat = cond(grade <  12, 1,     ///
                 cond(grade == 12, 2, 3)) ///
                 if grade < .
label define edcat 1 "less than high school" ///
                   2 "high school"           ///
                   3 "more than high school"
label value edcat edcat
label variable edcat "education in categories"

// create cascading dummies, i.e.
// edcat2 compares high school with less than high school
// edcat3 compares more than high school with high school
gen byte edcat2 = (edcat >= 2) if edcat < .
gen byte edcat3 = (edcat >= 3) if edcat < .

keep union edcat2 edcat3 race south
bsample 150 if !missing(union, edcat2, edcat3, race, south)

// constraining edcat2 = edcat3 is equivalent to adding 
// a linear effect (in the log odds) of edcat
constraint define 1 edcat2 = edcat3

// estimate the constrained model
logit union edcat2 edcat3 i.race i.south, constraint(1)

// predict the probabilities
predict pr
gen byte ysim = .
gen w = .

program define sim, rclass
    // create a dependent variable such that the null hypothesis is true
    replace ysim = runiform() < pr

    // estimate the constrained model
    logit ysim edcat2 edcat3 i.race i.south, constraint(1)
    est store constr

    // score test
    tempname b0
    matrix `b0' = e(b)
    logit ysim edcat2 edcat3 i.race i.south, from(`b0') iter(0)
    matrix chi = e(gradient)*e(V)*e(gradient)'
    return scalar p_score = chi2tail(1,chi[1,1])

    // estimate unconstrained model
    logit ysim edcat2 edcat3 i.race i.south 
    est store full

    // Wald test
    test edcat2 = edcat3
    return scalar p_Wald = r(p)

    // likelihood ratio test
    lrtest full constr
    return scalar p_lr = r(p)
end

simulate p_score=r(p_score) p_Wald=r(p_Wald) p_lr=r(p_lr), reps(2000) : sim
simpplot p*, overall reps(20000) scheme(s2color) ylab(,angle(horizontal))

[Figure: simpplot output for the simulated p-values of the score, Wald and likelihood ratio tests]

— Maarten Buis

The score test is a different name for what @jen-bohold called a Lagrange multiplier (LM) test.

— Maarten Buis

Nice answer (+1). I especially like the effort of the simulation. I didn't know how to calculate the score test in Stata. Thanks.

— COOLSerdash