plyr {plyr}    R Documentation
The plyr package is a set of clean and consistent tools that implement the split-apply-combine pattern in R. This is an extremely common pattern in data analysis: you solve a complex problem by breaking it down into small pieces, doing something to each piece and then combining the results back together again.
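To make the pattern concrete, here is a minimal sketch of split-apply-combine written by hand in base R, without plyr. The `scores` data frame and its column names are hypothetical and exist only for illustration.

```r
# Hypothetical toy data, not taken from the plyr documentation.
scores <- data.frame(
  team  = c("a", "a", "b", "b"),
  score = c(1, 3, 10, 20)
)

# Split: one data frame per team.
pieces <- split(scores, scores$team)

# Apply: compute a one-row summary for each piece.
piece_means <- lapply(pieces, function(p) {
  data.frame(team = p$team[1], mean_score = mean(p$score))
})

# Combine: stack the per-piece summaries back into one data frame.
combined <- do.call(rbind, piece_means)
```

Each of the three steps is a separate call here; the plyr functions wrap all three into a single call.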
The plyr functions are named according to what sort of data structure they split up and what sort of data structure they return:
a: array
l: list
d: data.frame
m: multiple inputs
r: repeat multiple times
_: nothing
So ddply takes a data frame as input and returns a data frame as output, and l_ply takes a list as input and returns nothing as output.
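The naming scheme can be sketched with a few calls. This is an illustrative example, assuming the plyr package is installed; the `scores` data frame and the anonymous functions are hypothetical.

```r
library(plyr)

# Hypothetical toy data for illustration.
scores <- data.frame(
  team  = c("a", "a", "b", "b"),
  score = c(1, 3, 10, 20)
)

# ddply: data frame in, data frame out.
# Split by `team`, summarise each piece, combine into one data frame.
team_means <- ddply(scores, "team", summarise, mean_score = mean(score))

# ldply: list in, data frame out.
squares <- ldply(list(1, 2, 3), function(x) data.frame(x = x, sq = x^2))

# l_ply: list in, nothing out. The function is called for its side effect.
l_ply(list(1, 2, 3), function(x) message("value: ", x))
```

The first letter of each name describes the input and the second the output, so the same reading applies to the other combinations such as dlply or llply.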
By design, no plyr function will preserve row names: in general it is too hard to know what should be done with them for many of the operations supported by plyr. If you want to preserve row names, use name_rows to convert them into an explicit column in your data frame, perform the plyr operations, and then use name_rows again to convert the column back into row names.
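The round trip described above might look like the following sketch, assuming the plyr package is installed. The data frame `df` and its row names are hypothetical.

```r
library(plyr)

# Hypothetical data frame with meaningful row names.
df <- data.frame(a = c(3, 1, 2), row.names = c("x", "y", "z"))

# A plain plyr operation discards the row names.
sorted_plain <- arrange(df, a)

# Wrap the operation in two name_rows() calls to carry the row names through:
# the inner call moves them into a column, the outer call moves them back.
sorted_named <- name_rows(arrange(name_rows(df), a))

rownames(sorted_plain)
rownames(sorted_named)
```

In `sorted_named` the original names travel with their rows, while `sorted_plain` falls back to default row names.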
Plyr also provides a set of helper functions for common data analysis problems:
arrange: re-order the rows of a data frame by specifying the columns to order by.
mutate: add new columns or modify existing columns, like transform, but new columns can refer to other columns that you just created.
summarise: like mutate, but creates a new data frame, not preserving any columns in the old data frame.
join: an adaptation of merge which is more similar to SQL, and has a much faster implementation if you only want to find the first match.
match_df: a version of join that, instead of returning the two tables combined together, only returns the rows in the first table that match the second.
colwise: make any function work column-wise on a data frame.
rename: easily rename columns in a data frame.
round_any: round a number to any degree of precision.
count: quickly count unique combinations and return the result as a data frame.
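The helpers listed above can be tried together on a small example. This is a sketch under the assumption that plyr is installed; the `orders` and `stock` tables, their columns, and the 20% tax rate are hypothetical.

```r
library(plyr)

# Hypothetical toy tables for illustration.
orders <- data.frame(
  id    = c(1, 2, 3, 4),
  item  = c("pen", "ink", "pen", "pad"),
  price = c(1.25, 4.10, 1.25, 2.60)
)
stock <- data.frame(item = c("pen", "pad"), shelf = c("A", "B"))

# arrange: order by descending price, then by id.
by_price <- arrange(orders, desc(price), id)

# mutate: `total` refers to `tax`, which is created in the same call.
with_tax <- mutate(orders, tax = price * 0.2, total = price + tax)

# summarise: only the newly computed columns are kept.
totals <- summarise(orders, n = length(id), total = sum(price))

# join: left join on `item`; unmatched rows get NA in `shelf`.
joined <- join(orders, stock, by = "item", type = "left")

# match_df: keep only the rows of `orders` whose item appears in `stock`.
in_stock <- match_df(orders, stock, on = "item")

# colwise: turn max() into a function that works on every column.
col_max <- colwise(max)(orders[c("id", "price")])

# rename: named character vector of the form c(old = "new").
renamed <- rename(orders, c("price" = "unit_price"))

# round_any: round to the nearest multiple of 10, or down with floor.
nearest_ten <- round_any(137, 10)
floor_ten   <- round_any(137, 10, floor)

# count: frequency of each unique item, returned as a data frame.
item_counts <- count(orders, "item")
```

Each call returns a new object and leaves `orders` unchanged, so the helpers can be chained or applied inside functions such as ddply.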