Creating a Venn diagram from two columns in a tibble


I have a tibble with a column containing sample names ("sample") and a column containing gene names ("gene"). Each sample contains multiple genes, and each row holds a single gene, so each sample spans many rows.

I want to create a list that can be used with the ggvenn package. So far, I have only managed to create a list where each sample appears in multiple rows and each row contains only one gene. I would like one row per sample, with all of that sample's gene names combined, and then use the Venn diagram to show how many genes overlap between the samples.

Can anyone help? It would be greatly appreciated! Best regards, Rasmus

Here is a solution that creates a wide-format data frame, with one logical column per sample, which ggvenn also accepts as input.

library(ggvenn)
library(tidyverse)

# Generate data
set.seed(1)
data <- tibble(sample = sample(c("A", "B", "C"), 1000, replace = TRUE),
               gene = as.character(sample(1:500, 1000, replace = TRUE)))

# Keep only the unique (sample, gene) rows, then pivot to one column per
# sample that stores whether each gene is present (TRUE) or absent (FALSE)
sets <- data %>%
  unique() %>%
  mutate(value = TRUE) %>%
  pivot_wider(id_cols = gene,
              names_from = sample,
              values_from = value,
              values_fill = FALSE)
head(sets)
#> # A tibble: 6 × 4
#>   gene  A     C     B    
#>   <chr> <lgl> <lgl> <lgl>
#> 1 165   TRUE  TRUE  FALSE
#> 2 416   TRUE  TRUE  TRUE 
#> 3 488   TRUE  FALSE TRUE 
#> 4 138   FALSE TRUE  TRUE 
#> 5 45    TRUE  TRUE  FALSE
#> 6 151   FALSE TRUE  FALSE

# Plot
ggvenn(sets, c("A", "B", "C"))

Created on 2022-09-21 by the reprex package (v2.0.1)
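Since the question asks for a list, it may be worth noting that ggvenn also accepts a named list of vectors, one element per set. The sketch below is an alternative to the pivot above, not part of the original answer: it regenerates the same kind of simulated data in base R, uses `split()` to collect each sample's genes into one vector (the "one row per sample" structure from the question), and uses `Reduce(intersect, ...)` to count the genes shared by all three samples. The names `gene_sets` and `shared_all` are illustrative, and the plotting step only runs if ggvenn is installed.

```r
# Simulated data in the same long format as above: one gene per row.
set.seed(1)
data <- data.frame(
  sample = sample(c("A", "B", "C"), 1000, replace = TRUE),
  gene   = as.character(sample(1:500, 1000, replace = TRUE)),
  stringsAsFactors = FALSE
)

# Named list with one element per sample, holding that sample's unique genes.
gene_sets <- lapply(split(data$gene, data$sample), unique)
str(gene_sets)

# Genes present in all three samples (the centre region of the diagram).
shared_all <- Reduce(intersect, gene_sets)
length(shared_all)

# ggvenn takes the named list directly; skip plotting if it is not installed.
if (requireNamespace("ggvenn", quietly = TRUE)) {
  print(ggvenn::ggvenn(gene_sets, c("A", "B", "C")))
}
```

`Reduce(intersect, gene_sets)` only gives the three-way overlap; the diagram itself reports the size of every region. The list form and the wide logical table hold the same information, so the choice mostly depends on which shape the rest of the analysis needs.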