Monday, November 23, 2009

R Idioms: Counting observations

In a data frame holding records of observations, with columns for grouping variables and/or measurements, one often needs to count the number of observations in each group.  I know of no dedicated function in R for this, but 'aggregate' serves well enough.

Say we have a data frame ("data") with observations of plant heights measured over time.  The column names include: "individualId", "species", "variety", "height".  We could count the number of observations in each species/variety combination as:

aggregate( x = data[['height']], 
  by = data[c('species','variety')], 
  FUN = length )

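The use of the 'length' function was non-obvious to me: 'aggregate' passes each group's values to FUN, so the length of that vector is the group's count.  Note the use of single brackets in specifying the columns of "data" in the "by" argument, and the lack of a comma in that expression.  This is done because 'by' must be a list, and a data frame is a list of columns.  Specifying data[,c('species','variety')] does not always work: with two or more columns it still returns a data frame, but with a single column it drops to a plain vector, which 'aggregate' rejects.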
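To make this concrete, here is a small sketch that can be pasted into an R session.  The data frame is a made-up stand-in for "data" (the values are invented purely for illustration), and "counts" is just a name I picked for the result.

```r
# Hypothetical toy data; the values are invented for illustration only.
data <- data.frame(
  individualId = 1:6,
  species = c('A', 'A', 'A', 'B', 'B', 'B'),
  variety = c('x', 'x', 'y', 'x', 'y', 'y'),
  height  = c(10.2, 11.5, 9.8, 14.1, 13.3, 12.9)
)

# Count observations per species/variety combination.
# The count ends up in a column named "x".
counts <- aggregate( x = data[['height']],
  by = data[c('species','variety')],
  FUN = length )
print(counts)

# Single brackets without a comma keep the list (data frame) structure,
# even for one column; the comma form drops a single column to a vector.
one_col_list   <- is.list(data['species'])    # TRUE
one_col_vector <- is.list(data[, 'species'])  # FALSE
```

The last two lines are why the no-comma form is the safer habit: it keeps working when only one grouping column is used.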

We can of course calculate group means or variances in the same fashion, all of which becomes relevant when running statistical models.
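As a rough sketch of that, swapping 'length' for 'mean' or 'var' is all it takes.  The toy data frame below is again invented for illustration, and "means" and "vars" are names I chose for the results.

```r
# Hypothetical toy data; the values are invented for illustration only.
data <- data.frame(
  individualId = 1:6,
  species = c('A', 'A', 'A', 'B', 'B', 'B'),
  variety = c('x', 'x', 'y', 'x', 'y', 'y'),
  height  = c(10.2, 11.5, 9.8, 14.1, 13.3, 12.9)
)

# Mean height per species/variety combination.
means <- aggregate( x = data[['height']],
  by = data[c('species','variety')],
  FUN = mean )

# Variance of height per combination.  Groups with a single
# observation get NA, since the sample variance is undefined there.
vars <- aggregate( x = data[['height']],
  by = data[c('species','variety')],
  FUN = var )

print(means)
print(vars)
```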
