How to remove empty spaces and omit the rows that have those empty spaces in R



I tried df <- remove_empty(df, which = c("rows", "cols"), cutoff = 1). It did not remove anything, even though there are empty cells in the data. I am not sure what I did wrong or what I need to change to remove those rows entirely.


The result was unchanged: the blank cells are still there and the rows containing them have not been omitted.

jay.sf On BEST ANSWER

Consider this example data.frame, which has empty strings (and some NA's).

> dat
    V1   V2 V3 V4 V5
1    A    A  1  6 11
2    B    B  2  7 12
3            3  8 13
4            4  9 14
5 <NA> <NA>  5 10 15

First, replace the empty strings with NA and assign the result back to dat[] (the empty brackets keep dat a data.frame instead of turning it into a list),

> dat[] <- lapply(dat, \(x) replace(x, x  %in% "", NA))
> dat
    V1   V2 V3 V4 V5
1    A    A  1  6 11
2    B    B  2  7 12
3 <NA> <NA>  3  8 13
4 <NA> <NA>  4  9 14
5 <NA> <NA>  5 10 15
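A short sketch of why the replacement step compares with %in% rather than ==. The vector x below is made up for illustration and is not part of the answer's data; the point is only that == propagates NA while %in% always returns TRUE or FALSE.

```r
# Hypothetical character vector with an empty string and a missing value
x <- c("A", "", NA)

# `==` propagates NA: the third element of the comparison is itself NA
eq <- x == ""      # FALSE, TRUE, NA

# `%in%` never returns NA, so it is a clean logical index
isin <- x %in% ""  # FALSE, TRUE, FALSE

# Only the empty string is replaced; the existing NA is left as it is
replace(x, isin, NA)
```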

then subset the rows with complete.cases.

> dat <- dat[complete.cases(dat), ]
> dat
  V1 V2 V3 V4 V5
1  A  A  1  6 11
2  B  B  2  7 12

In one step:

> dat |> lapply(\(x) replace(x, x  %in% "", NA)) |> data.frame() |> {\(.) .[complete.cases(.), ]}()
  V1 V2 V3 V4 V5
1  A  A  1  6 11
2  B  B  2  7 12
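If the blanks in the real data are spaces rather than truly empty strings, the comparison against "" will miss them. Below is a minimal sketch of the same two steps wrapped in a function that also trims whitespace first. The name drop_blank_rows and the example data d are hypothetical, and the function assumes the blank cells sit in character columns (not factors).

```r
# Hypothetical helper (not from the answer): treat "" and whitespace-only
# strings as missing, then keep only the rows without any missing value.
drop_blank_rows <- function(df) {
  df[] <- lapply(df, function(x) {
    # Only character columns are checked for blank strings
    if (is.character(x)) x[trimws(x) %in% ""] <- NA
    x
  })
  df[complete.cases(df), , drop = FALSE]
}

# Made-up data: an empty string, a spaces-only string, and an NA
d <- data.frame(V1 = c("A", "", "  ", NA, "E"), V2 = 1:5,
                stringsAsFactors = FALSE)

res <- drop_blank_rows(d)
res
```

Only the first and last rows of d survive, since the other three have an empty, blank, or missing value in V1.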

Data:

dat <- structure(list(V1 = c("A", "B", "", "", NA), V2 = c("A", "B", 
"", "", NA), V3 = c("1", "2", "3", "4", "5"), V4 = c("6", "7", 
"8", "9", "10"), V5 = c("11", "12", "13", "14", "15")), class = "data.frame", row.names = c(NA, 
-5L))

Chris Ruehlemann On

A dplyr solution is this:

library(dplyr)

dat %>%
   # convert "" to `NA`:
   mutate(across(everything(), ~na_if(., ""))) %>%
   # remove any rows with `NA`:
   filter(!if_any(everything(), is.na))
  V1 V2 V3 V4 V5
1  A  A  1  6 11
2  B  B  2  7 12
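In the toy data every column is character, so applying na_if() to everything() works. With mixed column types it may be safer to restrict the conversion to character columns. As far as I know, recent dplyr versions are strict about comparing a numeric column with "", so treat that as an assumption to check against your installed version. The data d below is made up for illustration.

```r
library(dplyr)

# Made-up data with a non-character column (id)
d <- data.frame(id = 1:3, name = c("A", "", NA), stringsAsFactors = FALSE)

out <- d %>%
  # convert "" to NA, but only in character columns
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  # remove any rows that contain an NA
  filter(!if_any(everything(), is.na))

out
```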

Thanks to jay.sf for the toy data:

dat <- structure(list(V1 = c("A", "B", "", "", NA), V2 = c("A", "B", 
"", "", NA), V3 = c("1", "2", "3", "4", "5"), V4 = c("6", "7", 
"8", "9", "10"), V5 = c("11", "12", "13", "14", "15")), class = "data.frame", row.names = c(NA, 
-5L))