I have two data frames. dfOne is made like this:
X Y Z T J
3 4 5 6 1
1 2 3 4 1
5 1 2 5 1
and dfTwo is made like this:
C.1 C.2
X Z
Y T
I want to obtain a new data frame containing only the rows where X, Y, Z and T are simultaneously greater than specific thresholds.
For example, I need, simultaneously (in the same row):
X, Y > 2
Z, T > 4
I need to use the second data frame to reach my objective. I expect something like:
dfTwo$C.1 > 2
so the result would be a new data frame with this structure:
X Y Z T J
3 4 5 6 1
How could I do it?
We can use the purrr package. Here is the input data.
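A minimal sketch of that input, assuming dat holds the dfOne values and vals holds one threshold per column of dat in the same order, with NA for J (which has no threshold):

```r
# dat mirrors dfOne from the question
dat <- data.frame(
  X = c(3, 1, 5),
  Y = c(4, 2, 1),
  Z = c(5, 3, 2),
  T = c(6, 4, 5),
  J = c(1, 1, 1)
)

# One threshold per column of dat, in column order; NA means "no threshold" (J)
vals <- c(2, 2, 4, 4, NA)
```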
Here is the implementation
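A sketch of that implementation, assuming dat is the dfOne data and vals is the vector of per-column thresholds (NA for J); both are recreated so the block runs on its own. dplyr is loaded as well because map2_dfc() relies on it to bind the columns together:

```r
library(dplyr)  # map2_dfc() binds its results with dplyr::bind_cols()
library(purrr)

dat <- data.frame(
  X = c(3, 1, 5), Y = c(4, 2, 1), Z = c(5, 3, 2),
  T = c(6, 4, 5), J = c(1, 1, 1)
)
vals <- c(2, 2, 4, 4, NA)

# Keep each value that exceeds its column's threshold (or whose threshold is NA);
# replace everything else with NA
marked <- map2_dfc(dat, vals, ~ifelse(.x > .y | is.na(.y), .x, NA))

# Drop every row containing at least one NA, i.e. every row failing some threshold
res <- na.omit(marked)
res
```

With the question's data, res should contain only the row X = 3, Y = 4, Z = 5, T = 6, J = 1.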
map2_dfc loops through each column in dat and each value in vals, one by one, applying a defined function. ~ifelse(.x > .y | is.na(.y), .x, NA) means that if the number in a column is larger than the corresponding value in vals, or that value in vals is NA, the output is the original value from the column. Otherwise, the value is replaced with NA. The output of map2_dfc(dat, vals, ~ifelse(.x > .y | is.na(.y), .x, NA)) is a data frame with NA values in the rows where the condition is not met. Finally, na.omit removes those rows.
Update
Here I demonstrate how to convert the dfTwo data frame to the vals vector in my example.
First, let's create the dfTwo data frame.
To complete the task, I load the dplyr and tidyr packages.
Now I begin the transformation of dfTwo. The first step is to use the stack function to convert the format.
The second step is to add the threshold information. One way to do this is to create a look-up table showing the association between Group and Value.
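A sketch of the steps so far (creating dfTwo, loading dplyr and tidyr, stacking, and building the look-up table). The intermediate name dfTwo2, the table name lookup, and the renaming of stack()'s values/ind columns to Col/Group are assumptions chosen to match the column names used in the text:

```r
library(dplyr)
library(tidyr)  # used from the third step onwards

# dfTwo as in the question: each column groups some of dfOne's column names
dfTwo <- data.frame(
  C.1 = c("X", "Y"),
  C.2 = c("Z", "T"),
  stringsAsFactors = FALSE
)

# Step 1: stack() turns the two columns into one long data frame with
# columns "values" and "ind"; renaming them to Col/Group is an assumption
dfTwo2 <- stack(dfTwo) %>%
  rename(Col = values, Group = ind) %>%
  mutate(Group = as.character(Group))  # ind is a factor; match lookup's type

# Step 2: hypothetical look-up table linking each group to its threshold
lookup <- data.frame(
  Group = c("C.1", "C.2"),
  Value = c(2, 4),
  stringsAsFactors = FALSE
)
```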
And then we can use the left_join function to combine the data frames.
Now for the third step. Notice that there is a column called J which does not need any threshold, so we need to add this information to dfTwo3. We can use the complete function from tidyr: it completes the data frame by adding the Col values that are in dat but not in dfTwo3, with NA as their Value.
The fourth step is to arrange dfTwo4 in the right order. We can achieve this by turning Col into a factor and assigning the levels based on the order of the column names in dat.
We are almost there. Now we can create vals from dfTwo5.
Now we are ready to use the purrr package to filter the data.
The above is a breakdown of the steps. We can combine all these steps into the following code for simplicity.
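A sketch of the combined pipeline, under the same assumptions (a hypothetical lookup table with columns Group and Value, and stack()'s output renamed to Col and Group). The intermediate objects dfTwo3, dfTwo4 and dfTwo5 are folded into one %>% chain that ends in vals, which is then used to filter dat:

```r
library(dplyr)
library(tidyr)
library(purrr)

dat <- data.frame(
  X = c(3, 1, 5), Y = c(4, 2, 1), Z = c(5, 3, 2),
  T = c(6, 4, 5), J = c(1, 1, 1)
)
dfTwo <- data.frame(C.1 = c("X", "Y"), C.2 = c("Z", "T"), stringsAsFactors = FALSE)

# Hypothetical look-up table: the threshold for each group in dfTwo
lookup <- data.frame(Group = c("C.1", "C.2"), Value = c(2, 4), stringsAsFactors = FALSE)

vals <- stack(dfTwo) %>%
  rename(Col = values, Group = ind) %>%
  mutate(Group = as.character(Group)) %>%
  left_join(lookup, by = "Group") %>%                # attach the thresholds
  complete(Col = names(dat)) %>%                     # add J with Value = NA
  mutate(Col = factor(Col, levels = names(dat))) %>%
  arrange(Col) %>%                                   # same order as dat's columns
  pull(Value)

# Keep only the rows where every thresholded column exceeds its threshold
res <- map2_dfc(dat, vals, ~ifelse(.x > .y | is.na(.y), .x, NA)) %>%
  na.omit()
res
```

With this layout, changing a threshold only means editing lookup; vals is rebuilt from dfTwo and the column order of dat.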