Replace inequalities in a dataframe with different types of elements in R

I have a dataframe with several columns, many of which contain values written as inequalities. I would like an R script that identifies these inequalities and replaces them with actual numbers. More specifically, suppose we have "<2" and want to replace it with half of that value ("<2" -> 1.0). Is there a generic way to do this, so that I do not have to find and replace every inequality in the dataframe manually?

A simple example might be the following:

Col1, Col2,  Col3, Col4
3.4,  RHO_1, <5,   NA
2,    RHO_2, 5,    1.3

And I want to get something like this:

Col1, Col2,  Col3, Col4
3.4,  RHO_1, 2.5,  NA
2,    RHO_2, 5,    1.3

When all elements are numeric (e.g. with numbers in place of RHO_1, RHO_2 and NA), the following command works:

df <-  lapply(df, function(x) sapply(sub("<", "0.5*", x, fixed = TRUE),
                                function(y) eval(parse(text = y))))

However, this command fails when NA values or strings (e.g. RHO_1) are present. I tried to locate the numeric elements, after converting all non-numeric entries to NA, with the following command:

value_ind <- which(!is.na(as.matrix(df)), arr.ind = TRUE, useNames = TRUE)

but I could not make use of this information. Note that the actual dataframe df has many rows and columns.

Best answer

I have managed to fix the issue. I took a subset of the original dataframe (here named dataBase2) that excludes the character columns (e.g. the one containing RHO_1). The reduced dataframe is named dataBase6. I then converted other symbols (e.g. "-", "_") to NA and applied the function. Below is the code I used on the actual dataset:

# Names of the columns to remove (they contain character data)
out <- c("Code-Medsal", "Number", "Code_National", "Projection", "date", "Notes")
dataBase6 <- dataBase2[, !(colnames(dataBase2) %in% out)]
# Replace special symbols with NA
dataBase6[dataBase6 == "-"] <- NA
# Turn "<x" into "0.55*x" and evaluate it; plain numbers and NA evaluate to themselves.
# Note: 0.55 is the factor used on the actual dataset; use "0.5*" for the half value
# described in the question.
dataBase6[] <- lapply(dataBase6, function(x) sapply(sub("<", "0.55*", x, fixed = TRUE),
                                  function(y) eval(parse(text = y))))
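
As a possible alternative that keeps the character columns in place and avoids eval(parse()), here is a minimal sketch. The helper name halve_censored is hypothetical (it is not part of the code above), and it assumes that inequalities are always written as "<" followed by a plain non-negative number; anything else (RHO_1, "-", NA) is left untouched.

```r
# Hypothetical helper: replace "<number" entries with half of the number.
halve_censored <- function(x) {
  # Numeric (and logical) columns contain no "<" strings, so return them unchanged
  if (!is.character(x) && !is.factor(x)) return(x)
  x <- trimws(as.character(x))
  # TRUE only for entries such as "<5" or "< 0.25"; NA and labels like "RHO_1" are FALSE
  is_censored <- !is.na(x) & grepl("^<[[:space:]]*[0-9]*\\.?[0-9]+$", x)
  # Strip the "<" and halve the remaining number
  halved <- 0.5 * as.numeric(sub("^<[[:space:]]*", "", x[is_censored]))
  x[is_censored] <- as.character(halved)
  # Becomes numeric only if every non-missing entry is a number;
  # otherwise the column stays character
  type.convert(x, as.is = TRUE)
}

# Example data taken from the question
df <- data.frame(
  Col1 = c(3.4, 2),
  Col2 = c("RHO_1", "RHO_2"),
  Col3 = c("<5", "5"),
  Col4 = c(NA, 1.3),
  stringsAsFactors = FALSE
)

# df[] keeps the data.frame structure while replacing every column
df[] <- lapply(df, halve_censored)
```

With this approach there is no need to list the character columns by name; a column that still contains symbols such as "-" simply remains character until those symbols are set to NA.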