Control Forest Hyper Parameters {party} | R Documentation |
Various parameters that control aspects of the cforest fit.
cforest_control(teststat = "max", testtype = "Teststatistic", mincriterion = qnorm(0.9), savesplitstats = FALSE, ntree = 500, mtry = 5, replace = TRUE, fraction = 0.632, ...)
teststat |
a character specifying the type of the test statistic to be applied. |
testtype |
a character specifying how to compute the distribution of the test statistic. |
mincriterion |
the value of the test statistic or 1 - p-value that must be exceeded in order to implement a split. |
mtry |
number of input variables randomly sampled as candidates
at each node for random forest like algorithms. Setting
mtry = 0 means that no random selection takes place
and bagging is performed; note that the usage above shows mtry = 5 as the default. |
savesplitstats |
a logical determining whether the process of standardized two-sample statistics for the split point estimate is saved for each primary split. |
ntree |
number of trees to grow in a forest. |
replace |
a logical indicating whether sampling of observations is done with or without replacement. |
fraction |
fraction of the number of observations to draw without
replacement (only relevant if replace = FALSE). |
... |
additional arguments to be passed to ctree_control. |
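The two sampling schemes selected by replace and fraction can be sketched in base R as follows. This is only an illustration of the idea, not party's internal code: the sample size n, the variable names, and the use of floor() for rounding are assumptions.

```r
set.seed(1)        # only to make this sketch reproducible
n <- 100           # hypothetical number of observations in the learning sample
fraction <- 0.632

# replace = TRUE: bootstrap sample of size n, duplicates are possible
idx_boot <- sample.int(n, size = n, replace = TRUE)

# replace = FALSE: subsample of a fraction of the observations, no duplicates
# (rounding down with floor() is an assumption of this sketch)
idx_sub <- sample.int(n, size = floor(fraction * n), replace = FALSE)

length(idx_sub)         # 63
anyDuplicated(idx_sub)  # 0, i.e. no observation drawn twice
```

In the subsampling case fraction determines the size of each tree's learning sample, whereas a bootstrap sample always has n draws.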
The arguments teststat, testtype and mincriterion determine how the global null hypothesis of independence between all input variables and the response is tested (see ctree). The argument nresample is the number of Monte-Carlo replications to be used when testtype = "MonteCarlo".
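Because mincriterion is read either on the scale of the test statistic or on the scale of 1 - p-value, the same number means different things under different testtype settings. A small base R sketch of the two scales follows; the threshold 0.95 and the p-value 0.03 are hypothetical values chosen for illustration.

```r
# Default threshold, read on the test-statistic scale
# (testtype = "Teststatistic")
crit_stat <- qnorm(0.9)
print(round(crit_stat, 4))   # 1.2816

# Hypothetical threshold on the 1 - p-value scale
crit_pval <- 0.95

# Hypothetical p-value of the independence test at some node
p_obs <- 0.03

# A split is implemented only if the criterion is exceeded
split_allowed <- (1 - p_obs) > crit_pval
print(split_allowed)         # TRUE
```

A value such as qnorm(0.9) is therefore only sensible on the test-statistic scale, since 1 - p-value can never exceed 1.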
A split is established when the sum of the weights in both daughter nodes is larger than minsplit; this avoids pathological splits at the borders. When stump = TRUE, a tree with at most two terminal nodes is computed.
The argument mtry > 0 means that a random forest like "variable selection", i.e., a random selection of mtry input variables, is performed in each node.
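The per-node variable selection can be pictured with a few lines of base R. The helper candidates_at_node and the input names are hypothetical and only mimic the idea; they are not functions or data from party.

```r
set.seed(2)  # only to make this sketch reproducible
inputs <- c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8")  # hypothetical inputs

# Hypothetical helper: which inputs are considered for splitting at one node
candidates_at_node <- function(inputs, mtry) {
  if (mtry > 0 && mtry < length(inputs)) {
    sample(inputs, mtry)   # random subset of mtry input variables
  } else {
    inputs                 # no random selection: all inputs are candidates
  }
}

cand_rf  <- candidates_at_node(inputs, mtry = 5)  # random forest like selection
cand_bag <- candidates_at_node(inputs, mtry = 0)  # bagging: every input is used
length(cand_rf)   # 5
length(cand_bag)  # 8
```

A fresh subset is drawn at every node, so different nodes of the same tree generally see different candidate variables.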
It might be informative to look at scatterplots of input variables against the standardized two-sample split statistics; these are available when savesplitstats = TRUE. Each node is then associated with a vector whose length is determined by the number of observations in the learning sample, and thus much more memory is required.
An object of class ForestControl-class.
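A minimal usage sketch, assuming the party package is installed: a control object is built with explicit settings and passed to cforest through its controls argument. The iris data, ntree = 50 and mtry = 2 are arbitrary choices for a quick illustration, not recommendations.

```r
if (requireNamespace("party", quietly = TRUE)) {
  # Subsampling without replacement, two candidate inputs per node
  ctrl <- party::cforest_control(teststat = "max",
                                 testtype = "Teststatistic",
                                 mincriterion = qnorm(0.9),
                                 ntree = 50, mtry = 2,
                                 replace = FALSE, fraction = 0.632)

  # Fit a small forest using these settings
  fit <- party::cforest(Species ~ ., data = iris, controls = ctrl)

  # Inspect the class of the control object
  print(class(ctrl))
}
```

The requireNamespace() guard only keeps the sketch from failing where party is not available; in an interactive session library(party) would be the usual way to load the package.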