whatis {YaleToolkit} | R Documentation |
Summarize the characteristics of variables (columns) in a data frame.
whatis(x, var.name.truncate = 20, type.truncate = 14)
x |
a data frame |
var.name.truncate |
maximum length (in characters) to which variable names are truncated. The default is 20; a value below 12 is narrower than the column label in the resulting data frame, so it discards information without saving any width. |
type.truncate |
maximum length (in characters) for truncation of variable type; 14 is the full width, but 4 works well if space is at a premium. |
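The figures 14 and 4 can be checked against the type labels this page lists under the return value. The sketch below uses only base R and assumes that truncation keeps the leading characters of each label; the source does not state how whatis() truncates internally, so treat `substr()` here as an illustration rather than the package's method. The names `type_labels` and `short_labels` are made up for this example.

```r
# Type labels named on this help page (the page says "include",
# so the list may not be exhaustive).
type_labels <- c("pure factor", "mixed factor", "ordered factor",
                 "character", "numeric")

# Width of the longest label, "ordered factor".
max(nchar(type_labels))

# Assumed truncation rule: keep the first 4 characters of each label.
short_labels <- substr(type_labels, 1, 4)
short_labels

# TRUE when the shortened labels can still be told apart.
anyDuplicated(short_labels) == 0
```

Under that assumption, the five labels stay distinct at 4 characters, which is consistent with the advice that 4 works well when space is tight.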
The function whatis() provides a basic examination of some characteristics of each variable (column) in a data frame.
A list of characteristics describing the variables in the data frame, x. Each component of the list has length(x) values, one for each variable in the data frame x.
variable.name |
from the names(x) attribute, possibly truncated to var.name.truncate characters in length. |
type |
the possibilities include "pure factor", "mixed factor", "ordered factor", "character", and "numeric". whatis() considers the possibility that a factor or a vector could contain character and/or numeric values. If a variable is a factor whose levels include both character and numeric values, it is called a mixed factor. If the levels of a factor are purely character or purely numeric (but not both), it is a pure factor. Non-factors must then be either character or numeric. |
missing |
the number of NAs in the variable. |
distinct.values |
the number of distinct values in the variable, equal to length(table(variable)). |
precision |
the number of decimal places of precision. |
min |
the minimum value (if numeric) or the first value (alphabetically) as appropriate. |
max |
the maximum value (if numeric) or the last value (alphabetically) as appropriate. |
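Several of the components described above can be approximated in base R. The following is a minimal sketch, not the YaleToolkit implementation: `sketch_summary()`, `looks_numeric()`, and `classify()` are hypothetical names, and the rule for telling a mixed factor from a pure factor (whether each level parses as a number) is an assumption based on the description of type. It omits precision, min, and max.

```r
# Hypothetical helper, NOT the YaleToolkit code: a base-R approximation
# of the variable.name, type, missing, and distinct.values components.
sketch_summary <- function(x) {
  stopifnot(is.data.frame(x))

  # TRUE for each string that parses as a number (assumed test).
  looks_numeric <- function(v) {
    !is.na(suppressWarnings(as.numeric(v)))
  }

  classify <- function(v) {
    if (is.ordered(v)) return("ordered factor")
    if (is.factor(v)) {
      num <- looks_numeric(levels(v))
      # Some levels numeric-looking and some not: mixed factor.
      if (any(num) && !all(num)) return("mixed factor")
      return("pure factor")
    }
    if (is.numeric(v)) return("numeric")
    # Everything else is reported as character in this sketch.
    "character"
  }

  data.frame(
    variable.name   = names(x),
    type            = vapply(x, classify, character(1), USE.NAMES = FALSE),
    missing         = vapply(x, function(v) sum(is.na(v)), integer(1),
                             USE.NAMES = FALSE),
    distinct.values = vapply(x, function(v) length(table(v)), integer(1),
                             USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Small demonstration with one column of each kind.
demo <- data.frame(
  a = c(1.5, NA, 3),
  b = factor(c("Cat", "Dog", "Cat")),
  c = factor(c("Apple", "8", "Apple")),
  d = c("Blue", "Red", "Blue"),
  stringsAsFactors = FALSE
)
sketch_summary(demo)
```

Because `table()` drops `NA` by default, the distinct-value count in this sketch excludes missing values, and for a factor it counts every level, including unused ones.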
John W. Emerson, Walton Green
Special thanks to John Hartigan and the students of 'Statistical Case Studies' of 2004 for their help troubleshooting and developing the function whatis().
See also str.
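library(YaleToolkit)

# stringsAsFactors = TRUE keeps b and c as factors; since R 4.0.0 the
# default is FALSE, which would make every text column character.
mydf <- data.frame(a = rnorm(100),
                   b = sample(c("Cat", "Dog"), 100, replace = TRUE),
                   c = sample(c("Apple", "Orange", "8"), 100, replace = TRUE),
                   d = sample(c("Blue", "Red"), 100, replace = TRUE),
                   stringsAsFactors = TRUE)
mydf$d <- as.character(mydf$d)  # make d a plain character column
whatis(mydf)

data(iris)
whatis(iris)

data(NewHavenResidential)
whatis(NewHavenResidential)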