createDataPartition {caret} | R Documentation
A series of test/training partitions is created using createDataPartition, while createResample creates one or more bootstrap samples. createFolds splits the data into k groups.
createDataPartition(y, times = 1, p = 0.5, list = TRUE, groups = min(5, length(y)))
createResample(y, times = 10, list = TRUE)
createFolds(y, k = 10, list = TRUE, returnTrain = FALSE)
y | a vector of outcomes
times | the number of partitions to create
p | the proportion of the data (between 0 and 1) that goes to training
list | logical - should the results be in a list (TRUE) or a matrix with the number of rows equal to floor(p * length(y)) and times columns
groups | for numeric y, the number of breaks in the quantiles (see below)
k | an integer for the number of folds
returnTrain | a logical. When true, the values returned are the sample positions corresponding to the data used during training. This argument only works in conjunction with list = TRUE.
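As a minimal sketch of how the list and returnTrain arguments change what createFolds returns, assuming the caret package is installed (the variable names and the seed are illustrative only):

```r
library(caret)

set.seed(7)
y <- rnorm(20)

# Default: a list with one element per fold, holding the held-out positions
test_folds <- createFolds(y, k = 5)

# returnTrain = TRUE: each element holds the positions used for training
# (this is a separate random draw, so it does not mirror test_folds)
train_folds <- createFolds(y, k = 5, returnTrain = TRUE)

# list = FALSE: an integer vector giving the fold number of each position
fold_id <- createFolds(y, k = 5, list = FALSE)
```

With list = TRUE the held-out folds together cover every position of y, which is what makes them usable for k-fold cross-validation.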
For bootstrap samples, simple random sampling is used.
For other data splitting, the random sampling is done within the levels of y when y is a factor, in an attempt to balance the class distributions within the splits. For numeric y, the sample is split into sections based on quantiles (the number of sections is set by groups) and sampling is done within these subgroups. Also, for very small class sizes (<= 3), the classes may not show up in both the training and test data.
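The quantile-based sampling described above for numeric y can be sketched in base R. This is an illustration of the idea only, not caret's actual implementation; stratified_sample is a hypothetical helper, and drawing ceiling(p * n) positions per bin is an assumption made for this sketch.

```r
# Hypothetical helper, not part of caret: sample a proportion p of the
# positions of a numeric vector y within quantile-based bins.
stratified_sample <- function(y, p = 0.5, groups = min(5, length(y))) {
  # Cut y into bins at its empirical quantiles (duplicate breaks are dropped)
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  # Within each bin, draw ceiling(p * n) positions without replacement
  picked <- lapply(split(seq_along(y), bins), function(idx) {
    idx[sample.int(length(idx), ceiling(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
x <- rgamma(50, 3, 0.5)
in_train <- stratified_sample(x, p = 0.5)
```

Because every bin contributes its share of positions, the selected values and the remaining values x[-in_train] both span the full range of x.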
A list or matrix of row position integers corresponding to the training data.
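A common use of the returned row positions is to subset a data frame into training and test sets. The following is a minimal sketch, assuming the caret package is installed; the use of the built-in iris data, p = 0.8, and the variable names are choices made for this illustration.

```r
library(caret)

set.seed(42)
# list = FALSE returns a one-column matrix of row positions
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)

training <- iris[in_train, ]   # rows selected for training
testing  <- iris[-in_train, ]  # all remaining rows
```

Since Species is a factor, the sampling is done within each species, so the three classes keep the same balance in training as in the full data.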
Max Kuhn
library(caret)
data(oil)
# two stratified partitions of the oil types
createDataPartition(oilType, 2)

x <- rgamma(50, 3, .5)
inA <- createDataPartition(x, list = FALSE)

# compare the densities of the selected and held-out values
plot(density(x[inA]))
rug(x[inA])
points(density(x[-inA]), type = "l", col = 4)
rug(x[-inA], col = 4)

createResample(oilType, 2)

createFolds(oilType, 10)
createFolds(oilType, 5, FALSE)

createFolds(rnorm(21))