AGG {hacks} | R Documentation |
Splits the data into subsets, computes summary statistics for each,
and returns the result in a convenient form. This function is based on
the aggregate function in R, but uses a single dimension for the
grouping variable instead of multiple dimensions.
AGG(x, by, FUN, ...)
x | a data frame. |
by | a list of grouping elements, each as long as the variables in x. |
FUN | a scalar function to compute the summary statistics, which can be applied to all data subsets. |
... | further arguments passed to or used by methods. |
Unlike aggregate, the AGG function expects x to be a data frame.
If x is not a data frame, it is coerced to one.
Next, combinations of the components of by are pasted together to
form a single vector of labels. Then, each of the columns in x is
split into subsets, and FUN is applied to each such subset with
further arguments in ... passed to it.
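The paste, split, and apply steps above can be sketched roughly as follows. This is a minimal illustration, not the AGG source: the name agg_sketch, the "." separator, and the shape of the returned data frame are assumptions, and the handling of missing values and of the grouping columns in the result is left out.

```r
# Hypothetical helper for illustration only; not the package's AGG code.
agg_sketch <- function(x, by, FUN, ...) {
  x <- as.data.frame(x)
  # Collapse all grouping elements into a single label per row.
  # The "." separator is an arbitrary choice made for this sketch.
  labels <- do.call(paste, c(by, sep = "."))
  # Split each column by label and apply FUN to every subset.
  stats <- lapply(x, function(col) sapply(split(col, labels), FUN, ...))
  as.data.frame(stats)
}

res <- agg_sketch(data.frame(v = 1:10),
                  list(rep(c("a", "b"), each = 5)),
                  sum)
```

Because only labels that actually occur are created, no empty subsets arise in this sketch.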
This is done because the approach used in aggregate becomes extremely
slow when the number of unique combinations in by is small but the
number of possible combinations is large. Empty subsets are removed,
and the result is reformatted into a data frame containing the
variables in by and x. Those arising from by contain the unique
combinations of grouping values used for determining the subsets, and
the ones arising from x contain the corresponding summary statistics
for the subset of the respective variables in x.
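To give a sense of the gap between possible and observed combinations, the following sketch counts both for the CO2 grouping columns used in the examples. The variable names n_possible and n_observed are illustrative only.

```r
# Grouping columns from the built-in CO2 data set: Plant, Type, Treatment.
by <- CO2[, 1:3]

# Number of cells in the full cross-classification of all grouping levels.
n_possible <- prod(sapply(by, function(g) nlevels(factor(g))))

# Number of combinations that actually occur in the data.
n_observed <- nrow(unique(by))

c(possible = n_possible, observed = n_observed)
```

Here each plant belongs to exactly one type and one treatment, so only a quarter of the possible cells are populated.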
Rows with missing values in any of the by variables will be omitted
from the result.
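The omission of rows with missing grouping values can be seen with base aggregate, which documents the same behaviour. This sketch uses aggregate as a stand-in so that it runs without the hacks package; that AGG gives the same rows is taken from the statement above, not verified here.

```r
x <- data.frame(v = 1:4)
g <- c("a", NA, "b", "a")   # the second row has a missing group

# The row with the NA group is dropped before summarising.
res <- aggregate(x, list(g = g), sum)
res
```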
A data frame with columns corresponding to the grouping variables in
by followed by aggregated columns from x. If by has names, the
non-empty names are used to label the columns in the result, with
unnamed grouping variables being named Group.i for by[[i]].
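The naming rule can be illustrated with base aggregate, which follows the same convention. Again this is a stand-in sketch that assumes AGG labels its columns the same way, as described above.

```r
x <- data.frame(v = 1:4)
g <- c("a", "a", "b", "b")
h <- c("p", "q", "p", "q")

# The first grouping element is unnamed, the second is named "h".
res <- aggregate(x, list(g, h = h), sum)
names(res)
```

The unnamed first element is labelled Group.1, while the named element keeps its name.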
Vicky Yang
aggregate, apply, lapply, tapply.
x <- 1:10
by <- c(rep("a", 2), rep("b", 4), rep("c", 4))
AGG(x, list(by), sum)

by <- CO2[, 1:3]
x <- CO2[, 4:5]
aggregate(x, by, sum)
AGG(x, by, sum)

AGG(state.x77, list(Region = state.region), mean)