I have the following list of data frames, and each data frame has 3 variables (a, b and c):

my.list <- list(d1, d2, d3, d4)

Inside each data frame, column "a" contains duplicated strings, and I want to delete the rows with duplicated values.

The current code I am using:

my.listnew <- lapply(my.list, function(x) unique(x["a"]))

The problem with this code is that the other two columns, "b" and "c", are dropped. I want to keep them while the rows with duplicated values in "a" are deleted.


There are 3 best solutions below


Use duplicated() to drop the rows whose value in column a has already appeared. This keeps the first occurrence of each value along with all other columns.

my.listnew <- lapply(my.list, function(x) x[!duplicated(x$a), ])
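
To make the behaviour concrete, here is a minimal sketch on made-up data. The names toy.list, keep.first and keep.last and the values in the data frames are assumptions for illustration only; they are not from the question. It also shows the fromLast argument of duplicated(), which keeps the last occurrence instead of the first.

```r
# Toy data (hypothetical, not from the question): column a has repeats.
d1 <- data.frame(a = c("x", "y", "x", "z"), b = 1:4, c = c(0.1, 0.2, 0.3, 0.4))
d2 <- data.frame(a = c("p", "p", "q"), b = 5:7, c = c(0.5, 0.6, 0.7))
toy.list <- list(d1 = d1, d2 = d2)

# Keep the first row for each distinct value of a; columns b and c are retained.
keep.first <- lapply(toy.list, function(x) x[!duplicated(x$a), ])

# fromLast = TRUE keeps the last row for each distinct value of a instead.
keep.last <- lapply(toy.list, function(x) x[!duplicated(x$a, fromLast = TRUE), ])

keep.first$d1
```

The comma inside x[!duplicated(x$a), ] is what matters here: it indexes rows and leaves the column position empty, so every column is returned. The original attempt, unique(x["a"]), selects a one-column data frame first, which is why b and c disappeared.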

Just for reference, here is the tidyverse way of doing it:

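set.seed(1)
# Example data: three data frames with repeated values in column a
my.list <- list(d1 = data.frame(a = sample(letters[1:3], 5, replace = TRUE),
                                b = rnorm(5),
                                c = rnorm(5)),
                d2 = data.frame(a = sample(letters[1:3], 5, replace = TRUE),
                                b = rnorm(5),
                                c = rnorm(5)),
                d3 = data.frame(a = sample(letters[1:3], 5, replace = TRUE),
                                b = rnorm(5),
                                c = rnorm(5)))
library(tidyverse)
# Keep the first row for each value of a in every data frame
map(my.list, ~ .x %>% filter(!duplicated(a)))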
#> $d1
#>   a         b          c
#> 1 a 1.5952808  0.5757814
#> 2 c 0.3295078 -0.3053884
#> 3 b 0.4874291  0.3898432
#> 
#> $d2
#>   a          b         c
#> 1 b  0.2522234 0.3773956
#> 2 a -0.8919211 0.1333364
#> 
#> $d3
#>   a          b          c
#> 1 a -0.2357066  1.1519118
#> 2 c -0.4333103 -0.4295131
#> 3 b -0.6494716  1.2383041

Created on 2021-05-13 by the reprex package (v2.0.0)

If you also want to combine the data frames in the output, you can use map_dfr instead of map in the code above.
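
A rough sketch of the map_dfr variant, assuming purrr and dplyr are installed. The toy data, the name combined and the .id = "source" argument are additions for illustration and not part of the original answer. .id adds a column recording which list element each row came from. Recent purrr releases mark map_dfr as superseded, so treat this as one possible spelling rather than the only one.

```r
library(dplyr)
library(purrr)

# Toy data (hypothetical): a named list of data frames with repeats in a.
toy.list <- list(
  d1 = data.frame(a = c("x", "y", "x"), b = 1:3, c = c(0.1, 0.2, 0.3)),
  d2 = data.frame(a = c("p", "p", "q"), b = 4:6, c = c(0.4, 0.5, 0.6))
)

# De-duplicate each data frame on a, then row-bind the results.
# .id = "source" stores the list names (d1, d2) in a new column.
combined <- map_dfr(toy.list, ~ filter(.x, !duplicated(a)), .id = "source")

combined
```

Note that duplicates are removed within each data frame before binding, so the same value of a can still appear in rows that came from different list elements.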


We can use subset without an anonymous function; lapply forwards the extra argument to subset for each data frame:

out <- lapply(my.list, subset, subset = !duplicated(a))
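
As a quick check, the sketch below compares the subset form with plain row indexing on made-up data. The names toy.list, via.subset and via.index are hypothetical, and the comparison is only a sanity check on this toy input.

```r
# Toy data (hypothetical): two data frames with repeats in column a.
toy.list <- list(
  d1 = data.frame(a = c("x", "y", "x"), b = 1:3, c = c(0.1, 0.2, 0.3)),
  d2 = data.frame(a = c("p", "p", "q"), b = 4:6, c = c(0.4, 0.5, 0.6))
)

# subset() evaluates !duplicated(a) inside each data frame,
# so the bare column name a can be used without x$a.
via.subset <- lapply(toy.list, subset, subset = !duplicated(a))

# The same filter written with explicit row indexing.
via.index <- lapply(toy.list, function(x) x[!duplicated(x$a), ])

all.equal(via.subset, via.index)
```

The subset form is shorter, but it relies on non-standard evaluation of the column name, so the explicit indexing version may be easier to reason about inside larger functions.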

Or use data.table, whose unique method accepts a by argument:

library(data.table)
out <- lapply(my.list, function(dat) unique(as.data.table(dat), by = 'a'))