I originally had a very wide data set (4 rows with 158 columns) which I used reshape::melt() on to create a long data set (624 rows x 3 columns).
Now, however, I have a data set like this:
demo <- data.frame(region = as.factor(c("North", "South", "East", "West")),
                   criteria = as.factor(c("Writing_1_a", "Writing_2_a", "Writing_3_a", "Writing_4_a",
                                          "Writing_1_b", "Writing_2_b", "Writing_3_b", "Writing_4_b")),
                   counts = as.integer(c(18, 27, 99, 42, 36, 144, 99, 9)))
Which produces a table similar to the one below:
region criteria counts
North Writing_1_a 18
South Writing_2_a 27
East Writing_3_a 99
West Writing_4_a 42
North Writing_1_b 36
South Writing_2_b 144
East Writing_3_b 99
West Writing_4_b 9
Now what I want to create is something like this:
goal <- data.frame(region = as.factor(c("North", "South", "East", "West")),
                   criteria = as.factor(c("Writing_1", "Writing_2", "Writing_3", "Writing_4")),
                   counts = as.integer(c(54, 171, 198, 51)))
Meaning that when I collapse the levels of the criteria column, the counts are summed:
region criteria counts
North Writing_1 54
South Writing_2 171
East Writing_3 198
West Writing_4 51
I have tried using forcats::fct_collapse and forcats::recode(), but to no avail - I'm positive I'm just not doing it right. Thank you in advance for any assistance you can provide.
You can think about what exactly you're trying to do to change factor levels: fct_collapse would manually collapse several levels into one level, and fct_recode would manually change the labels of individual levels. What you're trying to do is change all the labels based on applying some function, in which case fct_relabel is appropriate.
You can write out an anonymous function when you call fct_relabel, or just pass it the name of a function and that function's argument(s). In this case, you can use stringr::str_remove to find and remove a regex pattern, and a regex such as _[a-z]$ to remove any underscore and then lowercase letter that appear at the end of a string. That way it should scale well with your real data, but you can adjust it if not.
Verifying that this new variable has only the levels you want:
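A minimal sketch of both steps, assuming the demo data frame from the question and the dplyr, forcats, and stringr packages. The names demo2 and crit_short are only illustrative, not anything from your real data:

```r
library(dplyr)
library(forcats)
library(stringr)

# Same data as in the question; region is recycled to match the 8 criteria rows
demo <- data.frame(
  region = as.factor(c("North", "South", "East", "West")),
  criteria = as.factor(c("Writing_1_a", "Writing_2_a", "Writing_3_a", "Writing_4_a",
                         "Writing_1_b", "Writing_2_b", "Writing_3_b", "Writing_4_b")),
  counts = as.integer(c(18, 27, 99, 42, 36, 144, 99, 9))
)

# fct_relabel() applies str_remove(levels, "_[a-z]$") to the factor levels;
# levels that end up with the same label are merged into one.
# crit_short is a hypothetical column name chosen for this example.
demo2 <- demo %>%
  mutate(crit_short = fct_relabel(criteria, str_remove, "_[a-z]$"))

# Check that only the collapsed levels remain
levels(demo2$crit_short)
#> [1] "Writing_1" "Writing_2" "Writing_3" "Writing_4"
```

Passing str_remove by name with the pattern as an extra argument is equivalent to writing an anonymous function such as function(x) str_remove(x, "_[a-z]$").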
And then summarizing based on that new factor:
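One way to do this, sketched with dplyr's group_by() and summarise() and the same hypothetical crit_short column. The block rebuilds the data so it runs on its own, and the name result is just for illustration:

```r
library(dplyr)
library(forcats)
library(stringr)

# Same data as in the question
demo <- data.frame(
  region = as.factor(c("North", "South", "East", "West")),
  criteria = as.factor(c("Writing_1_a", "Writing_2_a", "Writing_3_a", "Writing_4_a",
                         "Writing_1_b", "Writing_2_b", "Writing_3_b", "Writing_4_b")),
  counts = as.integer(c(18, 27, 99, 42, 36, 144, 99, 9))
)

result <- demo %>%
  # Strip the trailing "_a" / "_b" from the factor levels
  mutate(crit_short = fct_relabel(criteria, str_remove, "_[a-z]$")) %>%
  # One group per region and collapsed criterion
  group_by(region, crit_short) %>%
  # Add up the counts within each group
  summarise(counts = sum(counts)) %>%
  ungroup()

result
```

The rows come out ordered by the factor levels of region (alphabetical here) rather than in the order shown in the question, but the sums should match the goal table: 54, 171, 198, and 51 for North, South, East, and West.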
Created on 2018-11-04 by the reprex package (v0.2.1)