How to replace the "N" in the same row if any of the columns is empty in R


How can I remove the character "N" from the column "GID" and put it in the empty column of the same row, whenever any column in that row is empty?

library(tabulizer)  # assumed source of extract_tables()

DataFile <- extract_tables("new.pdf", pages = 87,
                           method = "stream", output = "data.frame", guess = TRUE)
DataFrame <- as.data.frame(DataFile)

# drop the rows where Group is "No." or "A#"
df2 <- subset(DataFrame, Group != "No." & Group != "A#")

output:

GID    ColA    ColB 
1       2       2
2       3       4
3       5       4
4       6       5
5       6       5
NG1     8 
MG2     8       1
MG3     8       1
NG4     8 

Expected output:

GID    ColA    ColB 
1       2       2
2       3       4
3       5       4
4       6       5
5       6       5
G1      8       N
MG2     8       1
MG3     8       1
G4      8       N

DATA:

df1 <-  structure(list(GID = c("1", "2", "3", "4", "5", "NG1", "MG2", 
"MG3", "NG4"), ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L), 
    ColB = c("2", "4", "4", "5", "5", "", "1", "1", "")), row.names = c(NA, 
-9L), class = "data.frame")

There are 2 best solutions below


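This way you can replace empty strings with "N" (or any other character of your choice) without naming the columns. Note that str_remove("N") strips the first "N" from every GID, not only from rows with an empty column, and that across(everything(), ...) turns ColA into character; neither matters for the sample data.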

library(tidyverse)

df1 <- structure(list(GID = c("1", "2", "3", "4", "5", "NG1", "MG2", "MG3", "NG4"), ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L), ColB = c("2", "4", "4", "5", "5", "", "1", "1", "")), row.names = c(NA, -9L), class = "data.frame")

df1 %>% 
  mutate(across(everything(), ~str_replace(., "^$", "N")),
         GID = GID %>% str_remove("N"))
#>   GID ColA ColB
#> 1   1    2    2
#> 2   2    3    4
#> 3   3    5    4
#> 4   4    6    5
#> 5   5    6    5
#> 6  G1    8    N
#> 7 MG2    8    1
#> 8 MG3    8    1
#> 9  G4    8    N

Created on 2021-02-05 by the reprex package (v0.3.0)
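
As a small illustration of the two patterns used above (not part of the original answer), here is a base-R sketch. It uses sub() as a stand-in for str_replace() and str_remove(), on the assumption that only the first match per string matters. The vectors x and ids are made-up examples.

```r
# "^$" matches only a completely empty string, so non-empty values are untouched.
x <- c("2", "", "1")
filled <- sub("^$", "N", x)

# An unanchored "N" removes the first "N" wherever it occurs in the string;
# anchoring with "^N" removes it only when it is the leading character.
ids <- c("NG1", "MG2", "MNG3")
first_n   <- sub("N", "", ids)
leading_n <- sub("^N", "", ids)

print(filled)
print(first_n)
print(leading_n)
```

If some IDs can contain an "N" that is not a prefix, the anchored pattern is probably the safer choice.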


In base R, you could try the following.

First, identify the rows where ColB is an empty character value, and store in a logical vector:

emp_rows <- df1$ColB == ""

Then, remove "N" in GID in those rows:

df1$GID[emp_rows] <- gsub("N", "", df1$GID[emp_rows])

And store "N" in ColB in the same rows:

df1$ColB[emp_rows] <- "N"
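
Tables extracted from a PDF sometimes contain NA or whitespace-only cells rather than "". The following is a minimal sketch of the same three steps under that assumption; the helper is_blank() and the small data frame df are hypothetical and only for illustration, and the sketch removes just a leading "N".

```r
# Hypothetical sample with three kinds of "empty": "", NA and whitespace only.
df <- data.frame(GID  = c("NG1", "MG2", "NG4", "NG5"),
                 ColB = c("", "1", NA, "  "),
                 stringsAsFactors = FALSE)

# Hypothetical helper: TRUE for NA, "" and whitespace-only strings.
is_blank <- function(x) is.na(x) | trimws(x) == ""

emp_rows <- is_blank(df$ColB)

# Remove a leading "N" from GID in those rows, then store "N" in ColB.
df$GID[emp_rows]  <- sub("^N", "", df$GID[emp_rows])
df$ColB[emp_rows] <- "N"

print(df)
```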

To generalize for any column that is blank, you can do the following. Based on the logic in the comment, first check if GID starts with "N". If it does, remove the "N", and then check all columns for blank values, and if blank, substitute with "N".

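You can write a function to do this and then use apply (or another method) to go through your data frame row by row. Note that apply passes each row as a character vector, so every column in the result is character.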

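my_fun <- function(vec) {
  # vec is one row of the data frame, as a named character vector
  if (startsWith(vec[["GID"]], "N")) {
    # drop only the leading "N" from GID
    vec[["GID"]] <- sub("^N", "", vec[["GID"]])
    # fill every blank value in this row with "N"
    vec <- replace(vec, vec == "", "N")
  }
  return(vec)
}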

data.frame(t(apply(df1, 1, my_fun)))

Output

  GID ColA ColB
1   1    2    2
2   2    3    4
3   3    5    4
4   4    6    5
5   5    6    5
6  G1    8    N
7 MG2    8    1
8 MG3    8    1
9  G4    8    N
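
If you would rather avoid apply (which converts every column to character), the same idea can be sketched in a vectorised way. This is only one possible variant, not the method above: it assumes that only character columns can be blank, and it changes a row only when GID starts with "N" and at least one other column is blank. The names other_cols, has_blank and target are made up for the example.

```r
df1 <- data.frame(
  GID  = c("1", "2", "3", "4", "5", "NG1", "MG2", "MG3", "NG4"),
  ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L),
  ColB = c("2", "4", "4", "5", "5", "", "1", "1", ""),
  stringsAsFactors = FALSE
)

# Character columns other than GID are the ones that may hold "".
chr_cols   <- names(df1)[sapply(df1, is.character)]
other_cols <- setdiff(chr_cols, "GID")

# Rows with at least one blank value, and a GID that starts with "N".
has_blank <- rowSums(df1[other_cols] == "") > 0
target    <- has_blank & startsWith(df1$GID, "N")

# Drop the leading "N" from GID and fill the blanks in those rows with "N".
df1$GID[target] <- sub("^N", "", df1$GID[target])
for (col in other_cols) {
  blank <- target & df1[[col]] == ""
  df1[[col]][blank] <- "N"
}

print(df1)
```

With this variant ColA stays an integer column, whereas the apply version returns it as character.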