Can anyone give a suggestion regarding when to use map() (and all the map_*() functions) and when to use summarise_at() / mutate_at()?
E.g. if we are only modifying columns that are plain vectors, then we do not need to think about map()?
If we have a df with a column that has a list in it, then we need to use map()?
Does the map() function always need to be used with the nest() function?
Could anyone suggest some learning videos regarding this? And also, how do you put lists in a df, model multiple lists at the same time, and then store the model results in another column?
Thank you so much!
The biggest difference between {dplyr} and {purrr} is that {dplyr} is designed to work on data.frames only, while {purrr} is designed to work on any kind of list. Since data.frames are lists, you can also use {purrr} to iterate over a data.frame.
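To make the "data.frames are lists" point concrete, here is a minimal sketch using the built-in iris dataset. The variable name col_classes is only illustrative.

```r
library(purrr)

# A data.frame is a list whose elements are its columns
is.list(iris)

# So any {purrr} function can iterate over the columns directly
col_classes <- map_chr(iris, class)
col_classes
```

Each element that map_chr() visits here is one column of iris, which is why the result has one entry per column.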
summarise_at and map_at do not behave exactly the same: summarise_at just returns the summary you're looking for, while map_at returns the whole data.frame as a list, with the modification applied where you asked for it.

map_at always returns a list, mutate_at always a data.frame.

So to sum up on your first question: if you are thinking about doing a "column-wise" operation on a non-nested df and want a data.frame as a result, you should go for {dplyr}.
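The difference in return types can be checked with a small sketch like the one below. The choice of columns and of the "multiply by 10" transformation is an assumption made for illustration, not something from the thread. Note also that in recent {dplyr} releases the _at variants are superseded by across(), although they still work.

```r
library(dplyr)
library(purrr)

# summarise_at(): a one-row data.frame holding only the requested summaries
iris_means <- iris %>%
  summarise_at(vars(starts_with("Sepal")), mean)

# map_at(): every column comes back, as a plain list;
# only the selected elements are transformed
iris_list <- iris %>%
  map_at(c("Sepal.Length", "Sepal.Width"), ~ .x * 10)

# mutate_at(): the same transformation, but the result stays a data.frame
iris_df <- iris %>%
  mutate_at(vars(starts_with("Sepal")), ~ .x * 10)

class(iris_means)
class(iris_list)
class(iris_df)
```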
Regarding nested columns, you have to combine group_by(), nest() from {tidyr}, mutate() and map(). What you're doing here is creating a smaller version of your data.frame that contains a column which is a list of data.frames. Then, you use map() to iterate over the elements inside this new column. Take our beloved iris as an example, grouped by Species and then nested.
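The grouping and nesting step can be sketched like this. The object name nested is an assumption chosen for illustration.

```r
library(dplyr)
library(tidyr)

# One row per Species; the remaining columns are packed into
# a list column called `data`
nested <- iris %>%
  group_by(Species) %>%
  nest()

nested

# Each element of the list column is itself a small data.frame
nested$data[[1]]
```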
Here, the new object is a data.frame with the column data being a list of smaller data.frames, one per Species (the factor we specified in group_by()). Then, we can iterate on this column by simply calling map() on it.

But the idea is to keep everything inside a data.frame, so we can use mutate() to create a column that will keep this new list of lm results.

You can then run several mutate() calls to get, for example, the r.squared.

But a more efficient way is to use compose() from {purrr} to build a function that does it all at once, instead of repeating the mutate().
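The steps described above can be sketched as follows. This is a reconstruction under assumptions, not the original code: the column names model, model_summary and r_squared, and the helpers fit_lm and get_r_squared, are made up for illustration, and the model is a simple Sepal.Length ~ Sepal.Width regression per Species.

```r
library(dplyr)
library(tidyr)
library(purrr)

nested <- iris %>%
  group_by(Species) %>%
  nest()

# 1. Iterate directly on the list column: this returns a bare list of lm fits
models <- map(nested$data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x))

# 2. Same iteration inside mutate(), so each fit stays next to its data
with_models <- nested %>%
  mutate(model = map(data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x)))

# 3. One mutate() step per intermediate result
step_by_step <- with_models %>%
  mutate(
    model_summary = map(model, summary),
    r_squared = map_dbl(model_summary, "r.squared")
  )

# 4. compose() builds a single function out of the three steps.
#    By default the functions are applied from right to left:
#    fit_lm first, then summary, then the r.squared extraction.
fit_lm <- function(df) lm(Sepal.Length ~ Sepal.Width, data = df)
get_r_squared <- compose(function(s) s$r.squared, summary, fit_lm)

composed <- nested %>%
  mutate(r_squared = map_dbl(data, get_r_squared))
```

Steps 3 and 4 produce the same r_squared values; the composed version simply avoids storing the intermediate model and summary columns.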
If you know you'll always be using Sepal.Length ~ Sepal.Width, you can even prefill lm() with partial().
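A minimal sketch of the partial() idea, assuming a recent {purrr} version. The name lm_sepal and the model column are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)

# lm() with the formula already filled in; only the data is left to supply
lm_sepal <- partial(lm, formula = Sepal.Length ~ Sepal.Width)

with_models <- iris %>%
  group_by(Species) %>%
  nest() %>%
  mutate(model = map(data, ~ lm_sepal(data = .x)))
```

The prefilled function can then be reused anywhere the same formula is needed, for instance as the first step of a compose() chain.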
Regarding the resources, I've written a series of blog posts on {purrr} that you can check: https://colinfay.me/tags/#purrr