Unnesting lists inside a column using unnest_longer()


I have this code:

library(jsonlite)
library(tidyverse)
datasets.raw <- fromJSON(query, flatten = TRUE, simplifyDataFrame = TRUE)
datasets.raw$fields.species
datasets.df <- do.call(rbind, datasets.raw) #convert to dataframe

Alternatively, the data can be reproduced like this:

datasets.df <- structure(
  list(
    id = c("221", "PXD011681", "PXD013748", "PXD017277", "PXD013449", "PXD017613"),
    source = c("221", "pride", "pride", "pride", "pride", "pride"),
    fields.species = list(
      221L, "Homo Sapiens (human)",
      c("Passer Hispaniolensis", "Passer Domesticus Domesticus"),
      "Bifidobacterium Longum Subsp. Longum",
      "Homo Sapiens (human)", "Homo Sapiens (human)"
    )
  ),
  row.names = c("hitCount", "entries.1", "entries.2", "entries.3", "entries.4", "entries.5"),
  class = "data.frame"
)

I would like to unpack the lists inside list columns, such as datasets.df$fields.species, so that each list element becomes its own row. As far as I know, unnest_longer() should be ideal for this. I've tried:

unpack <- datasets.df %>% unnest_longer(fields.species)

but this gives me an error:

! Can't combine `..1$fields.species` <integer> and `..3$fields.species` <character>.

Any idea why this does not work?

There are 2 best solutions below

Nir Graham On BEST ANSWER
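# The hitCount row holds an integer while the other rows hold character vectors,
# so coerce every list element to character before unnesting.
unpack <- datasets.df %>%
  rowwise() %>%
  mutate(fields.species = list(as.character(fields.species))) %>%
  ungroup() %>%
  unnest_longer(fields.species)
unpack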
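
To see why the coercion is needed, it can help to inspect the type of each list element. Below is a minimal sketch, assuming a trimmed three-row copy of the question's data (the names df, types, and out are illustrative, not from the original post) and using purrr::map() in place of rowwise():

```r
library(dplyr)
library(tidyr)
library(purrr)

# Trimmed, illustrative copy of the question's data (first three rows only)
df <- tibble(
  id = c("221", "PXD011681", "PXD013748"),
  fields.species = list(
    221L,
    "Homo Sapiens (human)",
    c("Passer Hispaniolensis", "Passer Domesticus Domesticus")
  )
)

# The list column mixes integer and character elements,
# which is what unnest_longer() refuses to combine
types <- map_chr(df$fields.species, typeof)
types
# "integer" "character" "character"

# Coerce every element to character first, then unnest
out <- df %>%
  mutate(fields.species = map(fields.species, as.character)) %>%
  unnest_longer(fields.species)
out
```

With every element coerced to character, the two-species entry expands into two rows and the integer from the hitCount row becomes the string "221".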
r2evans On

I think you can go straight to datasets.raw$entries (the other top-level element is hitCount, a single integer indicating the number of rows in the rest) and simplify the process:

library(dplyr)
library(tidyr)
bind_rows(datasets.raw$entries) %>%
  unnest(fields.species)
# # A tibble: 248 × 3
#    id        source fields.species                      
#    <chr>     <chr>  <chr>                               
#  1 PXD011681 pride  Homo Sapiens (human)                
#  2 PXD013748 pride  Passer Hispaniolensis               
#  3 PXD013748 pride  Passer Domesticus Domesticus        
#  4 PXD017277 pride  Bifidobacterium Longum Subsp. Longum
#  5 PXD013449 pride  Homo Sapiens (human)                
#  6 PXD017613 pride  Homo Sapiens (human)                
#  7 PXD023233 pride  Unclassified Azonexus               
#  8 PXD026942 pride  Homo Sapiens (human)                
#  9 PXD022030 pride  Prokaryotic Environmental Samples   
# 10 PXD031919 pride  Homo Sapiens (human)                
# # ℹ 238 more rows
# # ℹ Use `print(n = ...)` to see more rows
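
If only the dput() data is available, a similar result can probably be had by dropping the hitCount row before unnesting, since that row is the only one whose fields.species is not character. This is a sketch under assumptions: it uses a trimmed three-row copy of the data, and the row column is a hypothetical stand-in for the data frame's row names.

```r
library(dplyr)
library(tidyr)

# Trimmed, illustrative copy; `row` stands in for the original row names
df <- tibble(
  row = c("hitCount", "entries.1", "entries.2"),
  id = c("221", "PXD011681", "PXD013748"),
  source = c("221", "pride", "pride"),
  fields.species = list(
    221L,
    "Homo Sapiens (human)",
    c("Passer Hispaniolensis", "Passer Domesticus Domesticus")
  )
)

# Drop the summary row, leaving only character elements, then unnest
entries <- df %>%
  filter(row != "hitCount") %>%
  unnest(fields.species)
entries
```

Unlike coercing to character, this discards the hitCount row entirely, which matches the answer's approach of reading only the entries element.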