SSI {MiscPsycho} | R Documentation |
The similar student index (SSI) uses a K nearest neighbor algorithm to generate a set of conditional norms for the outcome variable. The conditional norm for student i is constructed from the K students in the data most like student i, who are used as the comparison set.
SSI(...)
## Default S3 method:
SSI(mf, y, k, ...)
## S3 method for class 'formula':
SSI(formula, data, id, k, na.action, subset, ...)
formula |
a formula of the form lhs ~ rhs, where lhs
is a numeric variable giving the data values and rhs contains the numeric
conditioning variables used to identify the nearest neighbors. |
data |
an optional data frame, list or environment (or object
coercible by as.data.frame to a data frame) containing
the variables in the model. If not found in data, the
variables are taken from environment(formula),
typically the environment from which SSI is called. |
na.action |
a function which indicates what should happen when the data
contain NAs. Defaults to getOption("na.action"). |
subset |
an optional vector specifying a subset of observations to be used. |
id |
the individual (student) id identifying records in the data |
k |
the number of nearest neighbors to choose. k cannot be larger than the total number of pairwise comparisons in the data. |
mf |
a model frame with the variables used for conditioning. Only implemented for the default method. |
y |
the numeric outcome variable. Only implemented for the default method. |
... |
Not implemented |
The K nearest neighbor method is implemented with the Euclidean distance metric. Because the k nearest neighbors are identified for each record in the data, the process can be relatively slow, executing in O(n^2 log n) time.
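The procedure described above can be illustrated with a brute-force sketch in base R. This is not the code used by SSI. The function name knn_conditional_norm is hypothetical, and the choices to exclude record i from its own comparison set, to standardize with the neighbors' mean and standard deviation, and to define the percentile as the share of neighbors scoring at or below record i are all assumptions made for illustration. Sorting one row of the distance matrix per record is what gives a cost of the order quoted above.

```r
# Hypothetical helper, not part of MiscPsycho: brute-force conditional norms
# from the k nearest neighbors under Euclidean distance.
knn_conditional_norm <- function(X, y, k) {
  X <- as.matrix(X)
  n <- nrow(X)
  stopifnot(length(y) == n, k >= 2, k < n)

  # Full n x n matrix of pairwise Euclidean distances
  D <- as.matrix(dist(X, method = "euclidean"))

  z <- numeric(n)
  pct <- numeric(n)
  for (i in seq_len(n)) {
    # Rank all records by distance from record i, drop record i itself
    # (an assumption), and keep the k closest as the comparison set
    nbrs <- setdiff(order(D[i, ]), i)[seq_len(k)]
    ref <- y[nbrs]

    # Conditional z-score: standardize y[i] against its comparison set
    z[i] <- (y[i] - mean(ref)) / sd(ref)

    # Conditional percentile: percent of neighbors at or below y[i]
    pct[i] <- 100 * mean(ref <= y[i])
  }
  data.frame(Zscore = z, percentile = pct)
}

# Usage on simulated scores
set.seed(1234)
tmp <- data.frame(mathScore = rnorm(100),
                  readScore = rnorm(100),
                  scienceScore = rnorm(100))
norms <- knn_conditional_norm(tmp[, c("readScore", "scienceScore")],
                              tmp$mathScore, k = 20)
head(norms)
```

The column names Zscore and percentile are chosen only to mirror the components listed for the returned object; the values SSI itself reports may be computed differently.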
A list with class "SSI"
containing the following components:
Zscore |
the conditional z-score for each record in the data |
percentile |
the conditional percentile for each record in the data |
ID |
the individual's record id |
Iterations |
the number of Newton-Raphson iterations used |
model.frame |
the data frame used for estimating the conditional norms. This data frame can differ from the
original data depending on the use of na.action. |
Harold Doran
## Generate sample data
## construct a norm for the math score based on the k = 20
## other individuals in the data most like student i.
## readScore and scienceScore are used as the conditioning variables
## to compute the euclidean norm.
set.seed(1234)
tmp <- data.frame(ID = 1:100, mathScore = rnorm(100), readScore = rnorm(100), scienceScore = rnorm(100))
(result <- SSI(mathScore ~ readScore + scienceScore, tmp, k = 20, id = ID, na.action = na.omit))
summary(result)
str(result)
head(result$model.frame)