Say we have a data frame ("data") with observations of plant heights measured over time. The column names are: "individualId", "height", "species", "variety". We could count the number of observations in each variety of each species as:
aggregate( x = data[['height']], by = data[c('species','variety')], FUN = length )
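The use of the 'length' function was non-obvious to me: it is applied to the vector of heights in each group, so it returns the number of observations in that group. Note the use of single brackets in specifying the columns of "data" in the "by" argument, and the lack of a comma in that expression. This is done because 'by' must be a list, and single-bracket indexing without a comma always returns a data frame, which is a list. Specifying data[,c('species','variety')] does not always work: with two or more columns it still returns a data frame, but with a single column, e.g. data[,'species'], R drops the result to a plain vector and aggregate fails because 'by' is no longer a list.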
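To make the list-versus-vector point concrete, here is a minimal sketch. The data frame is invented purely for illustration (the species, varieties and heights are not real measurements), and the names counts, single_is_list, comma_is_list and err are only there to hold the results.

```r
# Toy data, made up for illustration only
data <- data.frame(
  individualId = 1:8,
  species = c("A", "A", "A", "A", "B", "B", "B", "B"),
  variety = c("x", "x", "y", "y", "x", "x", "y", "y"),
  height  = c(10, 12, 9, 13, 20, 22, 19, 25)
)

# Count observations per species/variety combination;
# the result column is named "x" because an unnamed vector was passed as x
counts <- aggregate(x = data[["height"]],
                    by = data[c("species", "variety")],
                    FUN = length)

# Single brackets without a comma keep a one-column data frame (a list)
single_is_list <- is.list(data["species"])

# The comma form drops a single column to a plain vector
comma_is_list <- is.list(data[, "species"])

# Passing that vector as 'by' makes aggregate() stop with an error;
# tryCatch captures the message instead of halting the script
err <- tryCatch(
  aggregate(x = data[["height"]], by = data[, "species"], FUN = length),
  error = function(e) conditionMessage(e)
)
```

If you do want the comma form with a single column, data[, 'species', drop = FALSE] keeps the data frame structure, though the comma-free version is shorter.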
We can of course calculate group means or variances in the same fashion, by swapping 'length' for 'mean' or 'var'. All of this becomes relevant when running statistical models.
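As a rough sketch of the mean and variance versions, again on an invented data frame (the values are made up so that every group has two observations; 'var' returns NA for a group with only one):

```r
# Toy data, made up for illustration only
data <- data.frame(
  individualId = 1:8,
  species = c("A", "A", "A", "A", "B", "B", "B", "B"),
  variety = c("x", "x", "y", "y", "x", "x", "y", "y"),
  height  = c(10, 12, 9, 13, 20, 22, 19, 25)
)

# Mean height per species/variety combination
means <- aggregate(x = data[["height"]],
                   by = data[c("species", "variety")],
                   FUN = mean)

# Variance of height per species/variety combination
vars <- aggregate(x = data[["height"]],
                  by = data[c("species", "variety")],
                  FUN = var)

# The formula interface gives the same group means and
# names the result column "height" instead of "x"
means_formula <- aggregate(height ~ species + variety, data = data, FUN = mean)
```

The formula version is arguably easier to read and sidesteps the list issue entirely, since the grouping columns are named in the formula rather than passed through 'by'.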