mtsknn.discard {MTSKNN}    R Documentation

A robust multivariate two-sample test based on k-nearest neighbors that handles unbalanced sample sizes by discarding extra data in the larger sample

Description

The function tests whether two samples share the same underlying distribution, using a k-nearest-neighbors approach. The approach is made robust to unbalanced sample sizes by discarding extra data points from the larger sample.
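The discarding step described above can be sketched in base R. This is only an illustration: `balance_by_discarding` is a hypothetical helper, not a function from MTSKNN, and it assumes the extra points are dropped at random, which this page does not specify.

```r
# Hypothetical helper, not part of MTSKNN: randomly trim the larger sample
# so that both samples end up with the same number of rows.
# Assumption: extra points are discarded uniformly at random.
balance_by_discarding <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  m <- min(nrow(x), nrow(y))          # size of the smaller sample
  list(
    x = x[sample.int(nrow(x), m), , drop = FALSE],
    y = y[sample.int(nrow(y), m), , drop = FALSE]
  )
}

set.seed(1)
x <- matrix(rnorm(40), 20, 2)         # 20 points in 2 dimensions
y <- matrix(rnorm(400), 200, 2)       # 200 points in 2 dimensions
balanced <- balance_by_discarding(x, y)
dim(balanced$x)                       # 20 2
dim(balanced$y)                       # 20 2
```

After trimming, a balanced k-nearest-neighbors test can be applied to the two equally sized samples.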

Usage

mtsknn.discard(x,y,k)

Arguments

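x A matrix or data frame holding the first sample, one data point per row.
y A matrix or data frame holding the second sample, one data point per row.
k An integer giving the number of nearest neighbors to use.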

Details

The matrices or data frames x and y are the two samples to be tested. Each row holds the coordinates of one data point. The integer k is the number of nearest neighbors used in the testing procedure.
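To illustrate how k enters a nearest-neighbor two-sample test, the following base R sketch computes the fraction of k-nearest-neighbor pairs in the pooled sample whose two points come from the same sample, in the spirit of the statistic in Schilling (1986). The helper `knn_same_sample_fraction` is hypothetical and is not the MTSKNN implementation; Euclidean distance is an assumption.

```r
# Hypothetical helper, not the MTSKNN implementation: fraction of
# k-nearest-neighbor pairs in the pooled sample that share a sample label.
knn_same_sample_fraction <- function(x, y, k) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  pooled <- rbind(x, y)
  label <- c(rep(1L, nrow(x)), rep(2L, nrow(y)))
  d <- as.matrix(dist(pooled))        # Euclidean distances (assumption)
  diag(d) <- Inf                      # a point is not its own neighbor
  same <- vapply(seq_len(nrow(pooled)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]   # indices of the k nearest neighbors
    sum(label[nn] == label[i])        # how many come from the same sample
  }, integer(1))
  sum(same) / (nrow(pooled) * k)
}

# Two well-separated samples: every neighbor comes from the same sample.
x <- matrix(c(0, 0.1, 0.2, 0.3, 0, 0.1, 0.2, 0.3), 4, 2)
y <- x + 100
frac <- knn_same_sample_fraction(x, y, 2)
frac                                  # 1
```

Values close to 1 suggest the samples are separated; when both samples come from the same distribution, neighbors mix and the fraction is smaller.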

Value

The test result contains the P value, the Z score and the test statistic.
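As a rough guide to how a Z score relates to a P value, the sketch below converts a Z score to an upper-tail P value under a standard normal reference. The helper `upper_tail_p` is hypothetical, and the one-sided, upper-tail convention is an assumption; this page does not state how mtsknn.discard computes its P value.

```r
# Hypothetical helper, not part of MTSKNN: upper-tail P value for a Z score
# under a standard normal reference distribution (assumed convention).
upper_tail_p <- function(z) pnorm(z, lower.tail = FALSE)

upper_tail_p(0)       # 0.5
upper_tail_p(1.96)    # about 0.025
```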

Note

This test is appropriate for the unbalanced case where the two sample sizes are at about the same level. Another robust test is mtsknn.neq.

Author(s)

Lisha Chen lisha.chen@yale.edu, Peng Dai peng.dai@yale.edu and Wei Dou wei.dou@yale.edu

References

Schilling, M. F. (1986). Multivariate two-sample tests based on nearest neighbors. J. Amer. Statist. Assoc., 81 799-806.

Henze, N. (1988). A multivariate two-sample test based on the number of nearest neighbor type coincidences. Ann. Statist., 16 772-783.

Chen, L. and Dou, W. (2009). Robust multivariate two-sample tests based on k nearest neighbors for unbalanced designs. Manuscript.

See Also

mtsknn, mtsknn.neq and mtsknn.eq

Examples


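## Example of two samples from the same distribution
## (independent t-distributed coordinates, df = 5):

library(MTSKNN)

n <- 100

## x: n points in 2 dimensions
x <- matrix(rt(2*n, df=5),n,2)

## y: 10*n points in 2 dimensions, so the design is unbalanced
y <- matrix(rt(2*10*n, df=5),(10*n),2)

## test with k = 3 nearest neighbors
mtsknn.discard(x,y,3)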

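## Example of two samples from different distributions:

n <- 100

## x: n points with independent t-distributed coordinates (df = 10)
x <- matrix(rt(2*n, df=10),n,2)

## y: 10*n points with independent standard normal coordinates
y <- matrix(rnorm(2*10*n),(10*n),2)

## test with k = 3 nearest neighbors
mtsknn.discard(x,y,3)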


[Package MTSKNN version 0.0-5 Index]