AGG {hacks}    R Documentation

Compute summary statistics of data with many subsets

Description

Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form. This function is based on the aggregate function in R, but combines the grouping variables into a single grouping dimension instead of cross-classifying them over multiple dimensions.

Usage

AGG(x, by, FUN, ...)

Arguments

x a data frame.
by a list of grouping elements, each as long as the variables in x.
FUN a function that returns a scalar summary statistic and can be applied to all data subsets.
... further arguments passed to or used by methods.

Details

Unlike aggregate, the AGG function operates only on data frames: if x is not a data frame, it is coerced to one. Next, the components of by are pasted together row by row to form a single vector of labels, one label per combination of grouping values. Then each column of x is split into subsets according to these labels, and FUN is applied to each subset, with any further arguments in ... passed to it.
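The steps described above can be sketched in base R. This is a minimal illustration, not the package's actual source: the function name agg_sketch, the separator used when pasting labels, and the ordering of the result rows by first appearance are all assumptions made for the example.

```r
# Hypothetical sketch of the steps described above; NOT the source of AGG.
agg_sketch <- function(x, by, FUN, ...) {
  x <- as.data.frame(x)                        # coerce x to a data frame

  # Name unnamed grouping variables Group.i (assumed to mirror aggregate)
  nm <- names(by)
  if (is.null(nm)) nm <- rep("", length(by))
  nm[nm == ""] <- paste0("Group.", which(nm == ""))
  names(by) <- nm
  by <- as.data.frame(by, stringsAsFactors = FALSE)

  # Omit rows with a missing value in any grouping variable
  keep <- complete.cases(by)
  x <- x[keep, , drop = FALSE]
  by <- by[keep, , drop = FALSE]

  # Paste the grouping values into one label per row
  # (the "\r" separator is an arbitrary choice for this sketch)
  label <- do.call(paste, c(unname(as.list(by)), sep = "\r"))
  first <- !duplicated(label)                  # first row of each observed group
  groups <- by[first, , drop = FALSE]
  f <- factor(label, levels = label[first])    # only observed combinations

  # Split each column of x by label and apply FUN to every subset
  summaries <- lapply(x, function(col) {
    unlist(lapply(split(col, f), FUN, ...), use.names = FALSE)
  })

  out <- cbind(groups, as.data.frame(summaries))
  rownames(out) <- NULL
  out
}

# Usage: group sums are 3, 18 and 34 for groups a, b and c
res <- agg_sketch(1:10, list(c(rep("a", 2), rep("b", 4), rep("c", 4))), sum)
res
```

Because the labels are built only from rows that exist in the data, the sketch never creates subsets for combinations that do not occur.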

This approach is used because the procedure in aggregate becomes extremely slow when the number of possible combinations of the values in by is large but the number of combinations that actually occur is small. Empty subsets are removed, and the result is reformatted into a data frame containing the variables in by and x. The columns arising from by contain the unique combinations of grouping values used to determine the subsets, and the columns arising from x contain the corresponding summary statistics for each subset of the respective variable in x. Rows with missing values in any of the by variables are omitted from the result.
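The gap between possible and observed combinations can be checked directly. The following sketch uses the built-in CO2 data with the same grouping columns as in the Examples section; the variable names possible and observed are introduced here for illustration only.

```r
# Grouping variables: Plant (12 levels), Type (2 levels), Treatment (2 levels)
by <- CO2[, 1:3]

# Number of cells in the full cross-classification of the grouping variables
possible <- prod(sapply(by, nlevels))

# Number of combinations that actually occur in the data
observed <- nrow(unique(by))

c(possible = possible, observed = observed)   # 48 possible, 12 observed
```

Each plant belongs to exactly one type and one treatment, so three quarters of the cross-classified cells are empty even in this small data set.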

Value

A data frame with columns corresponding to the grouping variables in by, followed by the aggregated columns from x. If by has names, the non-empty names are used to label the columns in the result; unnamed grouping variables are named Group.i for by[[i]].
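The naming convention can be seen with aggregate itself, which AGG is described as following. The example below calls only aggregate, so it shows the convention rather than confirming the behaviour of AGG; the object names unnamed and named are illustrative.

```r
x <- 1:4
g <- c("a", "a", "b", "b")

# Unnamed grouping variable: the grouping column is labelled Group.1
unnamed <- aggregate(x, list(g), sum)
names(unnamed)

# Named grouping variable: the supplied name is used instead
named <- aggregate(x, list(g = g), sum)
names(named)
```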

Author(s)

Vicky Yang

See Also

aggregate, apply, lapply, tapply.

Examples

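# Sum a vector within three groups
x <- 1:10
by <- c(rep("a", 2), rep("b", 4), rep("c", 4))
AGG(x, list(by), sum)

# Compare with aggregate on the CO2 data, grouped by its first three columns
by <- CO2[, 1:3]
x <- CO2[, 4:5]
aggregate(x, by, sum)
AGG(x, by, sum)

# Mean of each column of state.x77 by region
AGG(state.x77, list(Region = state.region), mean)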

[Package hacks version 0.1-5 Index]