I have two data frames. dfOne is made like this:
X Y Z T J
3 4 5 6 1
1 2 3 4 1
5 1 2 5 1
and dfTwo is made like this:
C.1 C.2
X Z
Y T
I want to obtain a new data frame containing only the rows where the X, Y, Z and T values are all, simultaneously, greater than specific thresholds.
Example. I need simultaneously (in the same row):
X, Y > 2
Z, T > 4
I need to use the second data frame to reach my objective. I expect something like:
dfTwo$C.1>2
so the result would be a new dataframe with this structure:
X Y Z T J
3 4 5 6 1
How could I do it?
We can use the `purrr` package. Here is the input data.
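A minimal sketch of that input, using the names `dat` and `vals` from the text. The threshold values are an assumption taken from the question (2 for X and Y, 4 for Z and T); `J` has no threshold, so it gets `NA`.

```r
# Data frame from the question (dfOne), called `dat` in this answer
dat <- data.frame(
  X = c(3, 1, 5),
  Y = c(4, 2, 1),
  Z = c(5, 3, 2),
  T = c(6, 4, 5),
  J = c(1, 1, 1)
)

# One threshold per column, in the same order as the columns of `dat`.
# Assumed values: X, Y > 2 and Z, T > 4; NA means "no threshold".
vals <- c(X = 2, Y = 2, Z = 4, T = 4, J = NA)
```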
Here is the implementation.
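A sketch of the filtering step, built around the `map2_dfc` call quoted in this answer. The data and threshold values are restated so the block runs on its own, and the thresholds are assumed from the question. In recent purrr releases `map2_dfc` is marked as superseded, though it should still be available; it relies on dplyr being installed.

```r
library(purrr)  # also provides the %>% pipe

# Input data and assumed thresholds (NA = no threshold for that column)
dat <- data.frame(X = c(3, 1, 5), Y = c(4, 2, 1), Z = c(5, 3, 2),
                  T = c(6, 4, 5), J = c(1, 1, 1))
vals <- c(X = 2, Y = 2, Z = 4, T = 4, J = NA)

# Walk over columns and thresholds in parallel: keep a value when it
# exceeds its threshold (or when there is no threshold), else set NA.
# Then drop every row that contains at least one NA.
result <- map2_dfc(dat, vals, ~ifelse(.x > .y | is.na(.y), .x, NA)) %>%
  na.omit()

result
```

With the data above, only the first row (3, 4, 5, 6, 1) passes all four thresholds, which matches the result asked for in the question.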
`map2_dfc` loops through each column in `dat` and each value in `vals` one by one, applying a defined function. `~ifelse(.x > .y | is.na(.y), .x, NA)` means that if the number in a column is larger than the corresponding value in `vals`, or the value in `vals` is `NA`, the output is the original value from the column. Otherwise, the value is replaced with `NA`. The output of `map2_dfc(dat, vals, ~ifelse(.x > .y | is.na(.y), .x, NA))` is a data frame with `NA` values in some rows, indicating that the condition is not met. Finally, `na.omit` removes those rows.
Update
Here I demonstrate how to convert the `dfTwo` data frame to the `vals` vector in my example.
First, let's create the `dfTwo` data frame.
To complete the task, I load the `dplyr` and `tidyr` packages.
Now I begin the transformation of `dfTwo`. The first step is to use the `stack` function to convert the format.
The second step is to add the threshold information. One way to do this is to create a look-up table showing the association between `Group` and `Value`.
Then we can use the `left_join` function to combine the two data frames.
Now for the third step. Notice that there is a column called `J` which does not need any threshold, so we need to add this information to `dfTwo3`. We can use the `complete` function from `tidyr`. This step completes the data frame by adding the `Col` values that are in `dat` but not in `dfTwo3`, with `NA` as their `Value`.
The fourth step is to arrange `dfTwo4` in the right order. We can achieve this by turning `Col` into a factor and assigning its levels based on the order of the column names in `dat`.
We are almost there. Now we can create `vals` from `dfTwo5`.
Now we are ready to use the `purrr` package to filter the data.
The above is a breakdown of the steps. We can combine all these steps into a single pipeline for simplicity.
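The steps above can be sketched as one pipeline. The column names `Col`, `Group` and `Value` come from the description; the table name `look_up` and the threshold values (2 for `C.1`, 4 for `C.2`) are assumptions for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

dat <- data.frame(X = c(3, 1, 5), Y = c(4, 2, 1), Z = c(5, 3, 2),
                  T = c(6, 4, 5), J = c(1, 1, 1))
dfTwo <- data.frame(C.1 = c("X", "Y"), C.2 = c("Z", "T"),
                    stringsAsFactors = FALSE)

# Assumed look-up table: which threshold applies to which dfTwo column
look_up <- data.frame(Group = c("C.1", "C.2"), Value = c(2, 4),
                      stringsAsFactors = FALSE)

vals <- dfTwo %>%
  stack() %>%                                   # step 1: long format
  setNames(c("Col", "Group")) %>%
  mutate(Group = as.character(Group)) %>%
  left_join(look_up, by = "Group") %>%          # step 2: attach thresholds
  complete(Col = colnames(dat)) %>%             # step 3: add J with NA
  mutate(Col = factor(Col, levels = colnames(dat))) %>%
  arrange(Col) %>%                              # step 4: order as in dat
  pull(Value)

# Filter: keep rows where every thresholded column exceeds its threshold
result <- map2_dfc(dat, vals, ~ifelse(.x > .y | is.na(.y), .x, NA)) %>%
  na.omit()

result
```

Keeping the thresholds in `look_up` means that changing a cut-off only requires editing that table, not the filtering code.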