Well, it seems I've made a hash of this. Let me try to explain this again, in a different way and we'll see if it might help clear things up.
The traditional way to explain McNemar's test vs. the chi-squared test is to ask if the data are "paired" and to recommend McNemar's test if the data are paired and the chi-squared test if the data are "unpaired". I have found that this leads to a lot of confusion (this thread being an example!). In place of this, I have found that it is most helpful to focus on the question you are trying to ask, and to use the test that matches your question. To make this more concrete, let's look at a made-up scenario:
You walk around a statistics conference and for each statistician you meet, you record whether they are from the US or the UK. You also record whether they have high blood pressure or normal blood pressure.
Here are the data:
mat = as.table(rbind(c(195, 5),
c( 5, 195) ))
colnames(mat) = c("US", "UK")
rownames(mat) = c("Hi", "Normal")
names(dimnames(mat)) = c("BP", "Nationality")
mat
# Nationality
# BP US UK
# Hi 195 5
# Normal 5 195
At this point, it is important to figure out what question we want to ask of our data. There are three different questions we could ask here:
- We might want to know if the categorical variables BP and Nationality are associated or independent;
- We might wonder if high blood pressure is more common amongst US statisticians than it is amongst UK statisticians;
- Finally, we might wonder if the proportion of statisticians with high blood pressure is equal to the proportion of US statisticians that we talked to. This refers to the marginal proportions of the table. These are not printed by default in R, but we can get them thusly (notice that, in this case, they are exactly the same):
margin.table(mat, 1)/sum(mat)
# BP
# Hi Normal
# 0.5 0.5
margin.table(mat, 2)/sum(mat)
# Nationality
# US UK
# 0.5 0.5
As I said, the traditional approach, discussed in many textbooks, is to determine which test to use based on whether the data are "paired" or not. But this is very confusing: is this contingency table "paired"? If you compare the proportion with high blood pressure between US and UK statisticians, you are comparing two proportions (albeit of the same variable) measured on different sets of people. On the other hand, if you want to compare the proportion with high blood pressure to the proportion US, you are comparing two proportions (albeit of different variables) measured on the same set of people. These data are both "paired" and "unpaired" at the same time (albeit with respect to different aspects of the data). This leads to confusion. To try to avoid this confusion, I argue that you should think in terms of which question you are asking. Specifically, if you want to know:
- If the variables are independent: use the chi-squared test.
- If the proportion with high blood pressure differs by nationality: use the z-test for difference of proportions.
- If the marginal proportions are the same: use McNemar's test.
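To make that mapping concrete, here is how the three questions translate into three different R calls for the table above (in the prop.test() line I condition on nationality, comparing the 195/200 US statisticians with high blood pressure to the 5/200 UK ones):
chisq.test(mat)                            # 1. independence of BP and Nationality
prop.test(x=c(195, 5), n=c(200, 200))      # 2. P(high BP) for US vs. UK statisticians
mcnemar.test(mat)                          # 3. equality of the marginal proportions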
Someone might disagree with me here, arguing that because the contingency table is not "paired", McNemar's test cannot be used to test the equality of the marginal proportions and that the chi-squared test should be used instead. Since this is the point of contention, let's try both to see if the results make sense:
chisq.test(mat)
# Pearson's Chi-squared test with Yates' continuity correction
#
# data: mat
# X-squared = 357.21, df = 1, p-value < 2.2e-16
mcnemar.test(mat)
# McNemar's Chi-squared test
#
# data: mat
# McNemar's chi-squared = 0, df = 1, p-value = 1
The chi-squared test yields a p-value of approximately 0. That is, it says that the probability of getting data as far from equal marginal proportions as these, or further, if the marginal proportions actually were equal, is essentially 0. But the marginal proportions are exactly the same, 50% = 50%, as we saw above! The results of the chi-squared test just don't make any sense in light of the data. On the other hand, McNemar's test yields a p-value of 1. That is, it says you would have a 100% chance of finding marginal proportions at least this far from equality, if the true marginal proportions were equal. Since the observed marginal proportions cannot be closer to equal than they are, this result makes sense.
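To see why the chi-squared test is answering a different question, recall that it tests independence: it compares the observed counts to the counts expected if BP and Nationality were unrelated. We can extract those from the fitted test object (with all four margins equal to 200 out of 400, every expected count is 200*200/400 = 100):
chisq.test(mat)$expected
#         Nationality
# BP        US  UK
#   Hi     100 100
#   Normal 100 100
The observed counts (195 and 5) are nowhere near 100, so the near-zero p-value is the right answer to the independence question; it just was never an answer to the question about the marginal proportions.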
Let's try another example:
mat2 = as.table(rbind(c(195, 195),
c( 5, 5) ))
colnames(mat2) = c("US", "UK")
rownames(mat2) = c("Hi", "Normal")
names(dimnames(mat2)) = c("BP", "Nationality")
mat2
# Nationality
# BP US UK
# Hi 195 195
# Normal 5 5
margin.table(mat2, 1)/sum(mat2)
# BP
# Hi Normal
# 0.975 0.025
margin.table(mat2, 2)/sum(mat2)
# Nationality
# US UK
# 0.5 0.5
In this case, the marginal proportions are very different, 97.5%≫50%. Let's try the two tests again to see how their results compare to the observed large difference in marginal proportions:
chisq.test(mat2)
# Pearson's Chi-squared test
#
# data: mat2
# X-squared = 0, df = 1, p-value = 1
mcnemar.test(mat2)
# McNemar's Chi-squared test with continuity correction
#
# data: mat2
# McNemar's chi-squared = 178.605, df = 1, p-value < 2.2e-16
This time, the chi-squared test gives a p-value of 1, which, read as a statement about the margins, would mean that the marginal proportions are as equal as they can be. But we saw that the marginal proportions are very obviously not equal, so this result doesn't make any sense in light of our data. On the other hand, McNemar's test yields a p-value of approximately 0. In other words, it is extremely unlikely to get data with marginal proportions as far from equality as these, if they truly are equal in the population. Since our observed marginal proportions are far from equal, this result makes sense.
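Again it helps to look at what the chi-squared test is actually checking. For mat2, the expected counts under independence (row total times column total over grand total, e.g., 390*200/400 = 195) are identical to the observed counts, so as a test of independence p = 1 is exactly right; the test is simply blind to the margins:
chisq.test(mat2)$expected
#         Nationality
# BP        US  UK
#   Hi     195 195
#   Normal   5   5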
The fact that the chi-squared test yields results that make no sense given our data suggests there is something wrong with using the chi-squared test here. Of course, the fact that McNemar's test provided sensible results doesn't prove that it is valid; it may just have been a coincidence. But the chi-squared test is clearly wrong.
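One quick way to build some confidence, a simulation sketch of my own rather than a proof: generate paired data in which BP depends on nationality but the true marginal proportions are equal, and check how often McNemar's test falsely rejects at the 5% level.
set.seed(1)
reject = replicate(2000, {
  n  = 400
  us = rbinom(n, 1, .5)                      # nationality
  hi = rbinom(n, 1, ifelse(us==1, .8, .2))   # BP depends on nationality,
                                             # but P(Hi) = .5*.8 + .5*.2 = .5 = P(US)
  mcnemar.test(table(hi, us))$p.value < .05
})
mean(reject)  # should be near .05 (a bit below, given the continuity correction)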
Let's see if we can work through the argument for why McNemar's test might be the right one. I will use a third dataset:
mat3 = as.table(rbind(c(190, 15),
c( 60, 135) ))
colnames(mat3) = c("US", "UK")
rownames(mat3) = c("Hi", "Normal")
names(dimnames(mat3)) = c("BP", "Nationality")
mat3
# Nationality
# BP US UK
# Hi 190 15
# Normal 60 135
margin.table(mat3, 1)/sum(mat3)
# BP
# Hi Normal
# 0.5125 0.4875
margin.table(mat3, 2)/sum(mat3)
# Nationality
# US UK
# 0.625 0.375
This time we want to compare 51.25% to 62.5% and wonder if in the population the true marginal proportions might have been the same. Because we are comparing two proportions, the most intuitive option would be to use a z-test for the equality of two proportions. We can try that here:
prop.test(x=c(205, 250), n=c(400, 400))
# 2-sample test for equality of proportions with continuity correction
#
# data: c(205, 250) out of c(400, 400)
# X-squared = 9.8665, df = 1, p-value = 0.001683
# alternative hypothesis: two.sided
# 95 percent confidence interval:
# -0.18319286 -0.04180714
# sample estimates:
# prop 1 prop 2
# 0.5125 0.6250
(To use prop.test() to test the marginal proportions, I had to enter the numbers of 'successes' and the total number of 'trials' manually, but you can see from the last line of the output that the proportions are correct.) This suggests that it is unlikely to get marginal proportions this far from equality if they were actually equal, given the amount of data we have.
Is this test valid? There are two problems here: The test believes we have 800 observations, when we actually have only 400. And it does not take into account that these two proportions are not independent, in the sense that they were measured on the same people.
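The dependence is easy to see directly from the table: the 190 statisticians in the Hi/US cell are part of both marginal counts, so the two 'samples' that prop.test() treats as independent overlap substantially:
mat3["Hi", "US"]   # 190 -- these people are counted in both margins
sum(mat3["Hi", ])  # 205 = 190 + 15, the high-BP margin
sum(mat3[, "US"])  # 250 = 190 + 60, the US margin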
Let's see if we can take this apart and find another way. From the contingency table, we can see that the marginal proportions are:
% high BP: (190 + 15)/400        % US: (190 + 60)/400
What we see here is that the 190 US statisticians with high blood pressure show up in both marginal proportions. They are counted twice and contribute no information about the difference between the marginal proportions. Moreover, the 400 total shows up in both denominators as well. All of the unique and distinctive information is in the two off-diagonal cell counts (15 and 60). Whether the marginal proportions are the same or different depends only on them. Under the null, an observation that falls into one of those two cells is equally likely to land in either, so the count in one cell is distributed as a binomial with probability π = .5. That was McNemar's insight. In fact, McNemar's test is essentially just a binomial test of whether observations are equally likely to fall into those two cells:
binom.test(x=15, n=(15+60))
# Exact binomial test
#
# data: 15 and (15 + 60)
# number of successes = 15, number of trials = 75, p-value = 1.588e-07
# alternative hypothesis: true probability of success is not equal to 0.5
# 95 percent confidence interval:
# 0.1164821 0.3083261
# sample estimates:
# probability of success
# 0.2
In this version, only the informative observations are used and they are not counted twice. The p-value here is much smaller, 0.0000001588, which is often the case when the dependency in the data is taken into account. That is, this test is more powerful than the z-test of difference of proportions. We can further see that the above version is essentially the same as McNemar's test:
mcnemar.test(mat3, correct=FALSE)
# McNemar's Chi-squared test
#
# data: mat3
# McNemar's chi-squared = 27, df = 1, p-value = 2.035e-07
If it is confusing that the two p-values are not identical: McNemar's test, as typically presented (and as implemented in R), takes the squared difference of the two off-diagonal counts over their sum and compares it to the chi-squared distribution, which is an approximation rather than an exact test like the binomial above:
(15-60)^2/(15+60)
# [1] 27
1-pchisq(27, df=1)
# [1] 2.034555e-07
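As a cross-check (my own, using the standard normal approximation to the binomial): McNemar's statistic is just the square of the z statistic for testing whether 15 successes out of 75 trials is consistent with probability .5.
z = (15 - 75/2) / (sqrt(75)/2)
z^2         # [1] 27 -- McNemar's chi-squared statistic
2*pnorm(z)  # [1] 2.034555e-07 -- matches the chi-squared p-value above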
Thus, when you want to check whether the marginal proportions of a contingency table are equal, McNemar's test (or the exact binomial test computed manually) is correct. It uses only the relevant information, without counting any data twice. It does not just 'happen' to yield results that make sense in light of the data.
I continue to believe that trying to figure out whether a contingency table is "paired" is unhelpful. I suggest using the test that matches the question you are asking of the data.