I have downloaded an extensive dataset from NIH GEO and am attempting to convert the Ensembl IDs in the first column to MGI symbols.
The table, which I've named SOD, is shown below.
SOD Data - Total rows = 15,396
I used the following code:
setwd("C:/R/Project")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("biomaRt", version = "3.8")
library(BiocManager)
library(biomaRt)
SOD<-read.csv("Static Organoid Data.csv")
names_only<-data.frame(SOD[,1])
mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
Gene_list <- getBM(attributes = c("ensembl_gene_id", "mgi_symbol"),
                   values = names_only,
                   mart = mart)
View(Gene_list)
This outputs a list of Ensembl IDs and MGI symbols with over 55,000 rows.
I have tried adding filter = "ensembl_gene_id" to the getBM function, but the output has 0 rows and 0 columns.
What am I doing wrong here?
Your Ensembl IDs are versioned, meaning that they end in a .# suffix (a dot followed by a version number), whereas the Ensembl IDs in BioMart don't. To fix this, you need to remove the .# at the end of the names as follows:
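A minimal sketch of one way to do this, not necessarily the only one. The example IDs and their version numbers are made up for illustration, and strip_version and lookup_mgi are hypothetical helper names, not part of biomaRt. The query is wrapped in a function because it needs network access. It also passes filters = "ensembl_gene_id" so that only the supplied IDs are returned rather than every gene in the dataset.

```r
# Made-up versioned IDs standing in for the first column of SOD
versioned_ids <- c("ENSMUSG00000000001.4",
                   "ENSMUSG00000000028.15",
                   "ENSMUSG00000000031.16")

# Hypothetical helper: drop a trailing ".<digits>" version suffix.
# IDs that carry no suffix pass through unchanged.
strip_version <- function(ids) sub("\\.[0-9]+$", "", as.character(ids))

clean_ids <- strip_version(versioned_ids)
print(clean_ids)

# Hypothetical helper: look up MGI symbols for unversioned IDs.
# Needs biomaRt and a network connection, so it is only defined here.
lookup_mgi <- function(ids) {
  mart <- biomaRt::useMart(biomart = "ensembl",
                           dataset = "mmusculus_gene_ensembl")
  biomaRt::getBM(attributes = c("ensembl_gene_id", "mgi_symbol"),
                 filters = "ensembl_gene_id",
                 values = ids,
                 mart = mart)
}
```

With your data, this would be called as Gene_list <- lookup_mgi(strip_version(SOD[, 1])). Passing a plain character vector as values, rather than the one-column data frame names_only, keeps the input in the simplest form getBM accepts.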