confusionMatrix {caret} | R Documentation
Calculates a cross-tabulation of observed and predicted classes with associated statistics.
confusionMatrix(data, ...)

## Default S3 method:
confusionMatrix(data, reference, positive = NULL,
                dnn = c("Prediction", "Reference"),
                prevalence = NULL, ...)

## S3 method for class 'table':
confusionMatrix(data, positive = NULL, prevalence = NULL, ...)
data | a factor of predicted classes (for the default method) or an object of class table.
reference | a factor of classes to be used as the true results
positive | an optional character string for the factor level that corresponds to a "positive" result (if that makes sense for your data). If there are only two factor levels and positive is not given, the first level will be used as the "positive" result.
dnn | a character vector of dimnames for the table
prevalence | a numeric value or vector for the rate of the "positive" class of the data. When data has two levels, prevalence should be a single numeric value. Otherwise, it should be a vector of numeric values with one element for each class, and the vector should have names corresponding to the classes.
... | options to be passed to table. NOTE: do not include dnn here.
The function requires that data and reference have exactly the same factor levels.
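A minimal base-R sketch of one way to meet this requirement when the predictions happen to miss a class. The toy factors below are made up for illustration; the idea is simply to re-declare the predictions with the reference levels before cross-tabulating or calling confusionMatrix.

```r
# Made-up toy data: the predictions never contain "c", so factor()
# infers only the levels "a" and "b" for pred.
truth <- factor(c("a", "b", "c", "a"))
pred  <- factor(c("a", "b", "b", "a"))

print(levels(pred))   # "a" "b"
print(levels(truth))  # "a" "b" "c"

# Re-declare pred using the reference levels (same values, same order)
# so both factors carry exactly the same levels.
pred <- factor(pred, levels = levels(truth))
print(identical(levels(pred), levels(truth)))  # TRUE

# The cross-tabulation is now square, with an all-zero "c" prediction row.
print(table(pred, truth))
```

Setting the levels explicitly also fixes their order, which matters in two-class problems because the first level is used as the "positive" result when positive is not given.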
For two-class problems, the sensitivity, specificity, positive predictive value and negative predictive value are calculated using the positive argument. Also computed are the prevalence of the "event" (estimated from the data unless passed in as an argument), the detection rate (the proportion of all samples that are true events and are also predicted to be events) and the detection prevalence (the proportion of all samples predicted to be events).
Suppose a 2x2 table with notation:

                 Reference
    Predicted    Event   No Event
    Event          A        B
    No Event       C        D
The formulas used here are:
Sensitivity = A/(A+C)
Specificity = D/(B+D)
Prevalence = (A+C)/(A+B+C+D)
PPV = (Sensitivity * Prevalence)/((Sensitivity * Prevalence) + ((1 - Specificity) * (1 - Prevalence)))
NPV = (Specificity * (1 - Prevalence))/(((1 - Sensitivity) * Prevalence) + (Specificity * (1 - Prevalence)))
Detection Rate = A/(A+B+C+D)
Detection Prevalence = (A+B)/(A+B+C+D)
See the references for discussions of the first five formulas.
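The formulas above can be checked with a small base-R sketch. The helper two_class_stats() is hypothetical (it is not part of caret) and assumes, as in the notation table, that rows hold predictions and columns hold the reference; it reuses the 2 class data from the examples. When the prevalence is estimated from the data, the PPV and NPV formulas reduce algebraically to A/(A+B) and D/(C+D).

```r
# Hypothetical helper (not caret's code): the seven two-class statistics,
# for a 2x2 table with predictions in rows and the reference in columns.
two_class_stats <- function(tab, positive = rownames(tab)[1], prevalence = NULL) {
  negative <- setdiff(rownames(tab), positive)
  A <- tab[positive, positive]  # predicted event, reference event
  B <- tab[positive, negative]  # predicted event, reference no event
  C <- tab[negative, positive]  # predicted no event, reference event
  D <- tab[negative, negative]  # predicted no event, reference no event
  n <- A + B + C + D

  sens <- A / (A + C)
  spec <- D / (B + D)
  # Use a supplied prevalence if given, otherwise estimate it from the table.
  prev <- if (is.null(prevalence)) (A + C) / n else prevalence
  ppv <- (sens * prev) / ((sens * prev) + ((1 - spec) * (1 - prev)))
  npv <- (spec * (1 - prev)) / (((1 - sens) * prev) + (spec * (1 - prev)))

  c(Sensitivity = sens, Specificity = spec, PPV = ppv, NPV = npv,
    Prevalence = prev, DetectionRate = A / n,
    DetectionPrevalence = (A + B) / n)
}

# 2 class data from the examples; "abnormal" is the first level.
lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)), levels = rev(lvs))
pred <- factor(c(rep(lvs, times = c(54, 32)), rep(lvs, times = c(27, 231))),
               levels = rev(lvs))
xtab <- table(pred, truth)

stats_data <- two_class_stats(xtab)                     # prevalence from data
stats_25   <- two_class_stats(xtab, prevalence = 0.25)  # supplied prevalence
print(round(rbind(stats_data, stats_25), 4))
```

In this sketch, supplying prevalence = 0.25 changes only PPV, NPV and the reported prevalence; sensitivity, specificity and the two detection statistics depend on the table alone.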
For more than two classes, these results are calculated comparing each factor level to the remaining levels (i.e. a "one versus all" approach).
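A base-R sketch of the one-versus-all reduction: each class in turn is treated as the event and all other classes are pooled into "no event", which yields the A, B, C and D cells of the 2x2 notation above. The helper one_vs_all() and the 3x3 counts are made up for illustration and are not caret's implementation.

```r
# Hypothetical helper: one-versus-all sensitivity and specificity for a
# k x k table with predictions in rows and the reference in columns.
one_vs_all <- function(tab) {
  n <- sum(tab)
  res <- sapply(rownames(tab), function(cls) {
    A <- tab[cls, cls]            # predicted cls, reference cls
    B <- sum(tab[cls, ]) - A      # predicted cls, reference another class
    C <- sum(tab[, cls]) - A      # predicted another class, reference cls
    D <- n - A - B - C            # neither predicted nor observed as cls
    c(Sensitivity = A / (A + C), Specificity = D / (B + D))
  })
  t(res)  # one row per class
}

# Made-up counts for three classes "a", "b" and "c".
tab <- as.table(matrix(c(8, 1, 1,
                         2, 7, 0,
                         0, 2, 9),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(Prediction = c("a", "b", "c"),
                                       Reference  = c("a", "b", "c"))))
print(one_vs_all(tab))
```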
The overall accuracy and unweighted Kappa statistic are calculated. A 95 percent confidence interval for the accuracy is computed using binom.test, along with a one-sided test of whether the accuracy is better than the "no information rate," which is taken to be the largest class percentage in the data.
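A base-R sketch of these overall statistics, applied to the 2 class data from the examples. The helper overall_stats() and its output names are hypothetical. The sketch assumes that the "largest class percentage in the data" refers to the reference (column) totals, uses the exact interval that binom.test reports, and runs the one-sided comparison as a binomial test against the no information rate; caret's internals may differ.

```r
# Hypothetical helper (not caret's code): accuracy with a 95% interval,
# a one-sided test against the no information rate, and unweighted kappa.
# Assumes predictions in rows, reference in columns, same level order.
overall_stats <- function(tab) {
  n <- sum(tab)
  correct <- sum(diag(tab))
  accuracy <- correct / n

  # Exact binomial 95% confidence interval for the accuracy.
  ci <- binom.test(correct, n, conf.level = 0.95)$conf.int

  # No information rate: largest reference class proportion (assumption).
  nir <- max(colSums(tab)) / n
  # One-sided test of H0: accuracy <= NIR against H1: accuracy > NIR.
  p_value <- binom.test(correct, n, p = nir, alternative = "greater")$p.value

  # Unweighted kappa: agreement beyond that expected from the margins.
  expected <- sum(rowSums(tab) * colSums(tab)) / n^2
  kappa <- (accuracy - expected) / (1 - expected)

  c(Accuracy = accuracy, Lower = ci[1], Upper = ci[2],
    NoInformationRate = nir, PValue = p_value, Kappa = kappa)
}

# 2 class data from the examples.
lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)), levels = rev(lvs))
pred <- factor(c(rep(lvs, times = c(54, 32)), rep(lvs, times = c(27, 231))),
               levels = rev(lvs))
print(round(overall_stats(table(pred, truth)), 4))
```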
confusionMatrix returns a list with elements:
table | the results of table on data and reference
positive | the positive result level
overall | a numeric vector with overall accuracy and Kappa statistic values
byClass | the sensitivity, specificity, positive predictive value, negative predictive value, prevalence, detection rate and detection prevalence for each class. For two-class problems, these are calculated once using the positive argument.
Max Kuhn
Kuhn, M. (2008), "Building predictive models in R using the caret package," Journal of Statistical Software, (http://www.jstatsoft.org/v28/i05/).
Altman, D.G., Bland, J.M. (1994), "Diagnostic tests 1: sensitivity and specificity," British Medical Journal, vol 308, 1552.
Altman, D.G., Bland, J.M. (1994), "Diagnostic tests 2: predictive values," British Medical Journal, vol 309, 102.
See also: as.table.confusionMatrix, as.matrix.confusionMatrix, sensitivity, specificity, posPredValue, negPredValue, print.confusionMatrix, binom.test
###################
## 2 class example
library(caret)

lvs <- c("normal", "abnormal")
truth <- factor(rep(lvs, times = c(86, 258)),
                levels = rev(lvs))
pred <- factor(
  c(
    rep(lvs, times = c(54, 32)),
    rep(lvs, times = c(27, 231))),
  levels = rev(lvs))

xtab <- table(pred, truth)

confusionMatrix(xtab)
confusionMatrix(pred, truth)
confusionMatrix(xtab, prevalence = 0.25)

###################
## 3 class example
library(MASS)

fit <- lda(Species ~ ., data = iris)
model <- predict(fit)$class

irisTabs <- table(model, iris$Species)

confusionMatrix(irisTabs)
confusionMatrix(model, iris$Species)

newPrior <- c(.05, .8, .15)
names(newPrior) <- levels(iris$Species)

confusionMatrix(irisTabs, prevalence = newPrior)

## Need names for prevalence; not run:
## confusionMatrix(irisTabs, prevalence = c(.05, .8, .15))