mtsknn.eq {MTSKNN} R Documentation

A robust multivariate two-sample test based on k-nearest neighbors for unbalanced sample sizes

Description

The function tests whether two samples share the same underlying distribution, using a k-nearest-neighbors approach. The approach is robust when the two sample sizes are unbalanced.

Usage

mtsknn.eq(x, y, k, clevel = 0.05, getpval = TRUE, print = TRUE)

Arguments

x A matrix or data frame holding the first sample.
y A matrix or data frame holding the second sample.
k An integer: the number of nearest neighbors used in the test.
clevel The level at which the test is carried out (named the confidence level here; with the default of 0.05 it acts as a significance level). Default value is 0.05.
getpval Logical value. If TRUE, the p value of the test is calculated and reported; if FALSE, the p value is not calculated.
print Logical value. If TRUE, the test result is reported; if FALSE, the test result is not reported.

Details

The matrices or data frames x and y are the two samples to be tested. Each row holds the coordinates of one data point. The integer k is the number of nearest neighbors used in the testing procedure.
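
To make the nearest-neighbor idea concrete, the sketch below computes a plain coincidence proportion in the spirit of Schilling (1986): the fraction of (point, neighbor) pairs in the pooled sample whose members come from the same sample. This is only an illustration in base R. The helper name knn_same_sample_prop is hypothetical, and the code is not the procedure that mtsknn.eq applies to handle unbalanced samples.

```r
# Illustrative sketch only: a plain k-NN "same sample" proportion.
# knn_same_sample_prop is a hypothetical helper, not part of MTSKNN,
# and it does not include any adjustment for unbalanced samples.
knn_same_sample_prop <- function(x, y, k) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  stopifnot(ncol(x) == ncol(y), k >= 1, k < nrow(x) + nrow(y))

  pooled <- rbind(x, y)
  labels <- c(rep(1L, nrow(x)), rep(2L, nrow(y)))

  # Pairwise Euclidean distances; a point is never its own neighbor
  d <- as.matrix(dist(pooled))
  diag(d) <- Inf

  # For each point, count how many of its k nearest neighbors share its label
  same <- vapply(seq_len(nrow(pooled)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    sum(labels[nn] == labels[i])
  }, integer(1))

  sum(same) / (nrow(pooled) * k)
}

# Two well-separated samples: most neighbors come from the same sample
set.seed(1)  # arbitrary seed, for reproducibility only
x <- matrix(rnorm(40), 20, 2)
y <- matrix(rnorm(40, mean = 5), 20, 2)
knn_same_sample_prop(x, y, 3)
```

When the two samples come from the same distribution, a neighbor shares its point's label with probability (n1(n1-1) + n2(n2-1)) / (n(n-1)), where n = n1 + n2. That value approaches 1 as one sample dominates the other, which gives some intuition for why unbalanced designs need special care.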

Value

The test result at the given level: whether the null hypothesis that the two samples share the same distribution is rejected or not. The p value is also calculated and reported when getpval is TRUE.

Note

This test is appropriate for the unbalanced case in which the two sample sizes are at about the same level. Another robust test is mtsknn.neq.

Author(s)

Lisha Chen <lisha.chen@yale.edu>, Peng Dai <peng.dai@yale.edu> and Wei Dou <wei.dou@yale.edu>

References

Schilling, M. F. (1986). Multivariate two-sample tests based on nearest neighbors. J. Amer. Statist. Assoc., 81, 799-806.

Henze, N. (1988). A multivariate two-sample test based on the number of nearest neighbor type coincidences. Ann. Statist., 16, 772-783.

Chen, L. and Dou, W. (2009). Robust multivariate two-sample tests based on k nearest neighbors for unbalanced designs. Manuscript.

See Also

mtsknn, mtsknn.neq and mtsknn.discard

Examples


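## Example of two samples from the same distribution
## (independent t-distributed coordinates, df = 5):
n <- 100
x <- matrix(rt(2 * n, df = 5), n, 2)            # n points in 2 dimensions
y <- matrix(rt(2 * 10 * n, df = 5), 10 * n, 2)  # 10 * n points in 2 dimensions
mtsknn.eq(x, y, 3)

## Example of two samples from different distributions
## (t with df = 10 versus standard normal):
n <- 100
x <- matrix(rt(2 * n, df = 10), n, 2)           # n points in 2 dimensions
y <- matrix(rnorm(2 * 10 * n), 10 * n, 2)       # 10 * n points in 2 dimensions
mtsknn.eq(x, y, 3)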
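
The arguments listed above can also be passed by name, and the samples may be data frames rather than matrices. The following is a minimal sketch along those lines. The seed, the column names a and b, and the choice clevel = 0.01 are arbitrary assumptions for illustration, and the call is guarded so that it only runs when the MTSKNN package is installed.

```r
## Example with data frame input and named arguments
set.seed(123)  # arbitrary seed, for reproducibility only
n <- 100
x <- data.frame(a = rt(n, df = 5), b = rt(n, df = 5))
y <- data.frame(a = rt(10 * n, df = 5), b = rt(10 * n, df = 5))

# Run the test only if the package is available
if (requireNamespace("MTSKNN", quietly = TRUE)) {
  library(MTSKNN)
  mtsknn.eq(x, y, k = 3, clevel = 0.01, getpval = TRUE, print = TRUE)
}
```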


[Package MTSKNN version 0.0-5]