Testing Symmetry on Contingency Tables from Paired Measurements: McNemar's Test
Paired categorical data from two classifiers
A typical example of paired categorical measurements arises when one wants to determine whether two classifiers yield similar predictions for the same set of observations. In this case, both predictions relate to the same observation and are therefore paired. Assume there are two class labels, 0 and 1.
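# Predictions of the two classifiers for the same 20 observations
y_hat1 <- c(rep(0, 10), rep(1, 5), rep(0, 5))
y_hat2 <- c(rep(0, 7), rep(1, 3), rep(1, 5), rep(1, 5))
# One row per observation, one column per classifier
df <- data.frame(Y_Hat1 = y_hat1, Y_Hat2 = y_hat2)
print(df)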
## Y_Hat1 Y_Hat2
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
## 7 0 0
## 8 0 1
## 9 0 1
## 10 0 1
## 11 1 1
## 12 1 1
## 13 1 1
## 14 1 1
## 15 1 1
## 16 0 1
## 17 0 1
## 18 0 1
## 19 0 1
## 20 0 1
Construction of contingency table
To construct the contingency table, we have to find the number of agreements and disagreements between the classifiers. There are four possibilities for this:
- Both classifiers output class 0 (0/0)
- Classifier 1 outputs class 0 and classifier 2 outputs class 1 (0/1)
- Classifier 1 outputs class 1 and classifier 2 outputs class 0 (1/0)
- Both classifiers output class 1 (1/1)
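# Cross-tabulate the predictions: rows are classifier 1, columns are classifier 2
tab <- xtabs(~ Y_Hat1 + Y_Hat2, data = df)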
print(tab)
## Y_Hat2
## Y_Hat1 0 1
## 0 7 8
## 1 0 5
The contingency table shows that the classifiers disagree on 8 of the 20 observations, and always in the same direction: whenever they disagree, the first classifier predicts class 0 and the second predicts class 1.
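The size of this deviation can also be read from the margins of the table. The following is a minimal sketch that rebuilds the table with `table()` (used here only to keep the snippet self-contained) and compares how often each classifier predicts class 1. The names `share1`, `share2`, `n01`, and `n10` are illustrative and not part of the original analysis.

```r
# Rebuild the predictions from above so that the snippet runs on its own
y_hat1 <- c(rep(0, 10), rep(1, 5), rep(0, 5))
y_hat2 <- c(rep(0, 7), rep(1, 13))
tab <- table(Y_Hat1 = y_hat1, Y_Hat2 = y_hat2)

# Contingency table with row and column sums
print(addmargins(tab))

# Share of observations that each classifier assigns to class 1
share1 <- mean(y_hat1 == 1)  # 5 of 20 observations
share2 <- mean(y_hat2 == 1)  # 13 of 20 observations

# Discordant cells: the only cells in which the classifiers disagree
n01 <- tab["0", "1"]  # classifier 1 says 0, classifier 2 says 1
n10 <- tab["1", "0"]  # classifier 1 says 1, classifier 2 says 0
```

Classifier 1 predicts class 1 for 25% of the observations and classifier 2 for 65%, and all 8 disagreements fall into the 0/1 cell.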
McNemar’s test
McNemar’s test takes marginal homogeneity as its null hypothesis, that is, it asks whether both classifiers assign each class with the same overall probability. As a consequence, it is concerned only with those dichotomous outcomes where there is a disagreement. For our classifier example, this means that the test considers only the frequencies in the cells where the classifiers don’t agree (0/1 and 1/0).
To formalize this, assume a contingency table of the following form:
| | Second classifier: 0 | Second classifier: 1 | Marginal |
|---|---|---|---|
| First classifier: 0 | a | b | a + b |
| First classifier: 1 | c | d | c + d |
| Marginal | a + c | b + d | |
Further, let $p_a$, $p_b$, $p_c$, and $p_d$ denote the probabilities of the individual cells. Marginal homogeneity means that $p_a + p_b = p_a + p_c$ and $p_c + p_d = p_b + p_d$. In both equations $p_a$ and $p_d$ cancel, so they don’t provide any information and the null hypothesis reduces to $p_b = p_c$, while the alternative is $p_b \neq p_c$.
The test statistic is
$$\chi^2 = \frac{(b - c)^2}{b + c}.$$
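As a worked example, the statistic can be computed by hand for the classifier table above, where b = 8 and c = 0. This is only a sketch of the uncorrected formula, with the result compared against a chi-squared distribution with 1 degree of freedom. The variable names are illustrative.

```r
# Discordant counts from the classifier example
n01 <- 8  # cell b: classifier 1 says 0, classifier 2 says 1
n10 <- 0  # cell c: classifier 1 says 1, classifier 2 says 0

# Uncorrected McNemar statistic: (b - c)^2 / (b + c) = 64 / 8
chi_sq <- (n01 - n10)^2 / (n01 + n10)

# Upper-tail probability of a chi-squared distribution with 1 degree of freedom
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

print(c(statistic = chi_sq, p.value = p_value))
```

The statistic equals 8 for this table, which corresponds to a p-value below 0.01.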
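Under the null hypothesis, the test statistic only approximately follows a $\chi^2$ distribution with 1 degree of freedom, and the approximation is poor when there are few discordant pairs. McNemar’s test should therefore only be applied if $b + c$ is sufficiently large (e.g. $b + c > 25$). Otherwise, an exact version of McNemar’s test should be considered.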
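One common way to carry out the exact version is a two-sided binomial test on the discordant pairs: under the null hypothesis, each disagreement is equally likely to fall into the 0/1 or the 1/0 cell, so b follows a binomial distribution with b + c trials and success probability 0.5. The sketch below uses base R's `binom.test` with the counts from the classifier example. It is an illustration of the idea, not necessarily how a dedicated package implements the exact test.

```r
# Discordant counts from the classifier example
n01 <- 8  # cell b
n10 <- 0  # cell c

# Exact test: is b consistent with Binomial(b + c, 0.5)?
exact <- binom.test(n01, n01 + n10, p = 0.5, alternative = "two.sided")

print(exact$p.value)
```

Here the exact p-value is 2 · 0.5^8 ≈ 0.0078, because only the two most extreme outcomes (all 8 disagreements in the same cell) are at least as unlikely as the observed one.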
Performing McNemar’s test in R
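McNemar’s test can be performed by passing the contingency table to mcnemar.test. By default, the function applies a continuity correction (correct = TRUE), so the reported statistic is $(|b - c| - 1)^2 / (b + c)$ rather than the uncorrected statistic given above: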
mc.result <- mcnemar.test(tab)
print(mc.result$p.value)
## [1] 0.01332833
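Here, the p-value is below 0.05, so the result is significant at the 5% level. We therefore reject the null hypothesis and conclude that the two classifiers make significantly different predictions. Note, however, that b + c = 8 in this example, which is well below the threshold suggested earlier, so the exact version of the test would be the more appropriate choice for these data.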
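To see how much the continuity correction matters for this small table, the test can be run with and without it. The following is a minimal sketch: the table is entered directly as a matrix (an assumption made to keep the snippet self-contained), and `correct` is the documented argument of `mcnemar.test` that switches the correction on or off.

```r
# Contingency table from the classifier example (matrix is filled column by column)
tab <- matrix(c(7, 0, 8, 5), nrow = 2,
              dimnames = list(Y_Hat1 = c("0", "1"), Y_Hat2 = c("0", "1")))

# Default: continuity correction, statistic (|b - c| - 1)^2 / (b + c)
corrected <- mcnemar.test(tab)

# Without correction: statistic (b - c)^2 / (b + c)
uncorrected <- mcnemar.test(tab, correct = FALSE)

print(c(corrected = unname(corrected$statistic),
        uncorrected = unname(uncorrected$statistic)))
print(c(corrected = corrected$p.value,
        uncorrected = uncorrected$p.value))
```

The corrected statistic is 49/8 = 6.125 and the uncorrected one is 64/8 = 8, so the correction makes the test more conservative. Both variants reject the null hypothesis at the 5% level for this table.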