How can I use R to find things that repeat between sets?


Say I have three datasets, each is a list of differentially expressed genes. How can I use R to find the genes that repeat in all three sets?

An example of the datasets (the real sets contain hundreds of genes each):

Dataset 1: KRAS MAPK1 CYCS ABCD ABCG1 TMEM51

Dataset 2: CYCS GAGE12J TMEM51 ABCG1 MAPK1

Dataset 3: KRAS ABCG1 TMEM51 ALB RGS13 CYCS

The output I expect for this sample is ABCG1, CYCS, and TMEM51, because those are the only genes that show up in all three sets.
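For the sample above, the intended operation can be sketched in base R as a repeated set intersection. This is only an illustration on the toy data; the vector names `set1`, `set2`, and `set3` are placeholders introduced here, not names from the real files.

```r
# Toy gene sets copied from the example above (placeholder variable names).
set1 <- c("KRAS", "MAPK1", "CYCS", "ABCD", "ABCG1", "TMEM51")
set2 <- c("CYCS", "GAGE12J", "TMEM51", "ABCG1", "MAPK1")
set3 <- c("KRAS", "ABCG1", "TMEM51", "ALB", "RGS13", "CYCS")

# Reduce() applies intersect() pairwise: (set1 n set2) n set3.
common <- Reduce(intersect, list(set1, set2, set3))

# Sorted result: ABCG1, CYCS, TMEM51
print(sort(common))
```

The same pattern extends to any number of sets, as long as every element of the list is a non-empty character vector.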

I tried using the dplyr package: `

# Function to extract gene symbols from CSV file
extract_genes <- function(file_path) {
  df <- read.csv(file_path, header = TRUE)  # Read CSV file
  genes <- df$GeneSymbol  # Extract gene symbols column
  return(genes)
}

# File paths for your datasets
file_paths <- c(" Significance 1.csv", 
            "Significance 2.csv", 
            "Significance 3.csv", 
            "Significance 4.csv")

# List to store gene symbols from each dataset
gene_lists <- list()

# Extract gene symbols from each dataset
for (file_path in file_paths) {
  gene_lists[[file_path]] <- extract_genes(file_path)
}

# Find common genes across all datasets
common_genes <- Reduce(intersect, gene_lists)

# Print common genes
print(common_genes)`

I got this response: NULL

However, I know that some genes are present in all of the datasets, so this result must be wrong.
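One possible cause, offered as a guess because the CSV files are not shown: `df$GeneSymbol` evaluates to `NULL` when the data frame has no column with exactly that name. Assigning `NULL` to a list element with `[[<-` adds nothing, so `gene_lists` would stay empty, and `Reduce()` on an empty list returns `NULL`. The sketch below reproduces that behavior with a made-up in-memory CSV. The header `Gene Symbol`, the `logFC` column, and the name `demo.csv` are all hypothetical, chosen only because `read.csv` rewrites a header containing a space to `Gene.Symbol`.

```r
# Hypothetical CSV content; the real files may use a different header.
df <- read.csv(text = "Gene Symbol,logFC\nKRAS,1.2\nCYCS,-0.8", header = TRUE)

# read.csv cleans column names, so the space becomes a dot.
print(names(df))

# No column is called exactly "GeneSymbol", so `$` yields NULL.
missing_column <- is.null(df$GeneSymbol)

# Assigning NULL to a list element does not create the element.
gene_lists <- list()
gene_lists[["demo.csv"]] <- df$GeneSymbol
n_stored <- length(gene_lists)

# Reducing an empty list without an initial value gives NULL.
result <- Reduce(intersect, gene_lists)

print(missing_column)   # TRUE
print(n_stored)         # 0
print(is.null(result))  # TRUE
```

Printing `names(df)` for each real file, and checking `length(gene_lists)` after the loop, would confirm or rule out this explanation before looking elsewhere.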
