dummy {dummies} | R Documentation |
This package flexibly and efficiently creates dummy variables for a variety of structures.
dummy(x, data = NULL, sep = "", drop = TRUE, fun = as.integer, verbose = FALSE)
dummy.data.frame(data, names = NULL, omit.constants = TRUE, dummy.classes = getOption("dummy.classes"), all = TRUE, ...)
x |
a single variable or variable _name_ |
data |
an object such as a data.frame or matrix that has colnames |
drop |
Whether to drop (i.e. omit) dummy variables for unused levels.
When x or data[,x] is a factor, this parameter controls whether dummy
variables are created for only the used levels. By default (TRUE), dummies
are created only for the used levels.
|
sep |
For the names of the created dummy variables, sep is the character used between the variable name and the value. |
fun |
Function used to coerce values in the resulting matrix or frame. |
verbose |
logical. Whether to print (via cat) the number of dummy variables created. Default: FALSE |
names |
The names of the columns to expand to dummy variables. Takes
precedence over the dummy.classes parameter.
|
dummy.classes |
( For dummy.data.frame only ) A vector of class names for
which dummy variables are created -or- "ALL" to create dummy variables
for all columns regardless of type.
By default, dummy variables are produced for the factor and character
classes; this can be modified globally by options('dummy.classes').
|
omit.constants |
Whether to omit dummy variables that are constants, i.e. contain only one
value. Overridden by drop==FALSE .
|
all |
( For dummy.data.frame only ) Whether to return columns
that are not dummy classes. The default is TRUE, which returns all
columns. Columns of non-dummy classes are left untouched.
|
... |
arguments passed to dummy |
dummy takes a single variable OR the name of a single variable
and a data frame. It coerces the variable to a factor and
returns a matrix of dummy variables using model.matrix.
If the data has rownames, these are retained.
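The mechanism described above can be sketched in base R. This is an illustration only, not the package's source: the data frame df and its column g are made-up names, and the sketch assumes nothing beyond factor and model.matrix.

```r
# Illustrative data; 'df' and 'g' are hypothetical names.
df <- data.frame(g = c("a", "b", "a"), row.names = c("r1", "r2", "r3"))
df$g <- factor(df$g)                      # coerce the variable to a factor

# '- 1' removes the intercept so every level gets its own indicator column.
m <- model.matrix(~ g - 1, data = df)

colnames(m)   # variable name followed by the level, with no separator
rownames(m)   # row names of the data are carried over
```

Each row of m contains exactly one 1, in the column for that row's level.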
Optionally, the parameter drop indicates that dummy variables
will be created for only the expressed levels of factors. Setting it
to FALSE will produce dummy variables for all levels of all factors.
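The difference between used and unused levels can be seen with a base R sketch. It uses droplevels and model.matrix to mimic the two settings; it is an approximation of the behaviour described here, not the package's implementation, and the variable names are illustrative.

```r
# A factor with three declared levels, only two of which occur.
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

# Analogue of drop = FALSE: one column per declared level.
all_levels <- model.matrix(~ f - 1)

# Analogue of drop = TRUE: unused levels are removed first.
fu <- droplevels(f)
used_only <- model.matrix(~ fu - 1)

ncol(all_levels)   # columns for a, b and the unused level c
ncol(used_only)    # columns for a and b only
```

The column for the unused level contains only zeros, which is why omitting it loses no information.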
If there is only one level for the variable and verbose == TRUE, a
warning is issued before creating the dummy variable. Every element of this
dummy variable will have the same value.
A separator, sep, can be specified to go between
the variable name and the value when constructing the new variable
names. The default is to use no separator.
The type of values returned can be affected using the fun
argument. fun is called on each of the resultant dummy
variables. The only useful functions that the author has employed
are as.integer (the default) and as.logical.
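As a rough base R analogue of sep and fun, the sketch below builds the column names with paste and coerces each column with as.logical. The names l, m and m_lgl are illustrative, and the package may construct its names and apply fun differently.

```r
l <- factor(c("a", "b", "a"))
m <- model.matrix(~ l - 1)

# Name columns as <variable><sep><value>, here with sep = ":".
colnames(m) <- paste("l", levels(l), sep = ":")

# Coerce every dummy column, here to logical instead of integer.
m_lgl <- apply(m, 2, as.logical)
m_lgl
```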
dummy.data.frame takes a data.frame or matrix and returns
a data.frame in which all specified columns are expanded as
dummy variables. Specific columns can be named with the names
argument or specified on a class basis by the dummy.classes
argument. Specified names take precedence over classes. The default
is to expand dummy variables for character and factor classes, and
can be controlled globally by options('dummy.classes').
If the argument all is FALSE, the resulting data.frame
will contain only the new dummy variables. By default, all columns
of the object are returned in the order of the original frame.
Dummy variables are expanded in place.
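In-place expansion by class can be sketched in base R as follows. The helper expand_dummies is hypothetical and written only for this illustration; it does not reproduce the names, all, or omit.constants arguments of dummy.data.frame.

```r
# Hypothetical helper: expand columns of the given classes, keep the rest.
expand_dummies <- function(df, classes = c("factor", "character")) {
  pieces <- lapply(names(df), function(nm) {
    col <- df[[nm]]
    if (!inherits(col, classes)) return(df[nm])    # leave other columns as-is
    f <- factor(col)                               # keeps only the used levels
    m <- model.matrix(~ f - 1)
    colnames(m) <- paste(nm, levels(f), sep = "")  # <column name><level>
    as.data.frame(m)
  })
  do.call(cbind, pieces)                           # original column order
}

out <- expand_dummies(iris)
names(out)
```

With iris, the four numeric columns are kept and Species is replaced, in its original position, by one indicator column per species.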
omit.constants
indicates whether to omit dummy variables that
assume only a single value. This is the default. If drop==FALSE
,
constant variables are retained regardless of the setting.
dummy returns a matrix with the number of rows equal
to that of the given variable. By default, the matrix contains
integers, but the exact type can be affected by the fun argument.
Rownames are retained if the supplied variable has associated row names.
dummy.data.frame returns a data.frame in which variables are
expanded to dummy variables if they are one of the dummy classes.
The columns are returned in the same order as the input, with dummy
variable columns replacing the original column.
Christopher Brown
http://wiki.r-project.org/rwiki/doku.php?id=tips:data-manip:create_indicator
http://tolstoy.newcastle.edu.au/R/help/00b/1199.html
http://tolstoy.newcastle.edu.au/R/help/03a/6409.html
http://tolstoy.newcastle.edu.au/R/help/01c/0580.html
Many other discussions on R-Help. Too many to list.
model.frame, model.matrix, factor
letters <- c( "a", "a", "b", "c", "d", "e", "f", "g", "h", "b", "b" )
dummy( as.character(letters) )
dummy( letters[1:6] )

l <- as.factor(letters)[ c(1:3,1:6,4:6) ]
dummy(l)
dummy(l, drop=FALSE)
dummy(l, sep=":")
dummy(l, sep="::", fun=as.logical)

# TESTING NAS
l <- c( NA, l, NA)
dummy(l)
dummy(l,sep=":")

dummy(iris$Species)
dummy(iris$Species[ c(1:3,51:53,101:103) ] )
dummy(iris$Species[ c(1:3,51:53,101:103) ], sep=":" )
dummy(iris$Species[ c(1:3,51:53) ], sep=":", drop=FALSE )

# TESTING TRAP FOR ONE LEVEL
dummy( as.factor(letters)[c(1,1,1,1)] )
dummy( as.factor(letters)[c(1,1,2,2)] )
dummy( as.factor(letters)[c(1,1,1,1)] , drop = FALSE )

dummy.data.frame(iris)
dummy.data.frame(iris, all=FALSE)
dummy.data.frame(iris, dummy.class="numeric" )
dummy.data.frame(iris, dummy.class="ALL" )