Say I have three datasets, each a list of differentially expressed genes. How can I use R to find the genes that appear in all three sets?
An example of the datasets (the real sets would each contain hundreds of genes):
Dataset 1: KRAS MAPK1 CYCS ABCD ABCG1 TMEM51
Dataset 2: CYCS GAGE12J TMEM51 ABCG1 MAPK1
Dataset 3: KRAS ABCG1 TMEM51 ALB RGS13 CYCS
The output I would expect for this sample is ABCG1, CYCS, and TMEM51, because those are the only genes that show up in all three sets.
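To make the example reproducible, here is a minimal sketch with the sample genes typed in by hand. The vector names `dataset1`, `dataset2`, and `dataset3` are placeholders for illustration, not the real data. On this toy input, `Reduce(intersect, ...)` does return the three expected genes, which suggests the approach itself is sound when every element of the list is a character vector.

```r
# Toy versions of the three gene lists from the example above
dataset1 <- c("KRAS", "MAPK1", "CYCS", "ABCD", "ABCG1", "TMEM51")
dataset2 <- c("CYCS", "GAGE12J", "TMEM51", "ABCG1", "MAPK1")
dataset3 <- c("KRAS", "ABCG1", "TMEM51", "ALB", "RGS13", "CYCS")

# Intersect the lists pairwise, left to right:
# intersect(intersect(dataset1, dataset2), dataset3)
common_genes <- Reduce(intersect, list(dataset1, dataset2, dataset3))

# Sorted for a stable display order: ABCG1, CYCS, TMEM51
print(sort(common_genes))
```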
I tried using the dplyr package:

```r
# Function to extract gene symbols from CSV file
extract_genes <- function(file_path) {
  df <- read.csv(file_path, header = TRUE)  # Read CSV file
  genes <- df$GeneSymbol                    # Extract gene symbols column
  return(genes)
}

# File paths for your datasets
file_paths <- c(" Significance 1.csv",
                "Significance 2.csv",
                "Significance 3.csv",
                "Significance 4.csv")

# List to store gene symbols from each dataset
gene_lists <- list()

# Extract gene symbols from each dataset
for (file_path in file_paths) {
  gene_lists[[file_path]] <- extract_genes(file_path)
}

# Find common genes across all datasets
common_genes <- Reduce(intersect, gene_lists)

# Print common genes
print(common_genes)
```
I got this response: NULL
However, I know that some genes are present in all of the datasets, so this result must be wrong.
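One thing that may be worth checking: `df$GeneSymbol` returns `NULL` without any error when a data frame has no column with exactly that name. With the default `check.names = TRUE`, `read.csv` also rewrites a header such as `Gene Symbol` to `Gene.Symbol`. A `NULL` in the list would leave nothing to intersect. The sketch below illustrates this with made-up data frames standing in for the CSV files. The helper `has_gene_column` is a hypothetical name, and whether this is the actual cause here is an assumption.

```r
# Made-up stand-ins for two of the CSV files (not the real data)
df_ok  <- data.frame(GeneSymbol  = c("CYCS", "TMEM51"), stringsAsFactors = FALSE)
df_bad <- data.frame(Gene.Symbol = c("CYCS", "ABCG1"),  stringsAsFactors = FALSE)

# `$` returns NULL, with no error, when the column name does not match
print(is.null(df_bad$GeneSymbol))

# Hypothetical helper: TRUE if the expected column is present in the data frame
has_gene_column <- function(df, column = "GeneSymbol") {
  column %in% names(df)
}

# Check every data frame before intersecting
checks <- vapply(list(ok = df_ok, bad = df_bad), has_gene_column, logical(1))
print(checks)

# A NULL entry in the list leaves an empty intersection
result <- Reduce(intersect, list(df_ok$GeneSymbol, df_bad$GeneSymbol))
print(length(result))
```

Running `names(read.csv(file_path))` on each file would show the column names R actually sees.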