I have downloaded an extensive dataset from NIH GEO and am trying to convert the Ensembl gene IDs in the first column to MGI symbols.

My table, which I've named SOD, is shown below:

[Image: SOD data, total rows = 15,396]

I used the following code:

```r
setwd("C:/R/Project")
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("biomaRt", version = "3.8")
library(BiocManager)
library(biomaRt)
SOD<-read.csv("Static Organoid Data.csv")
names_only<-data.frame(SOD[,1])
mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
Gene_list <- getBM(attributes = c("ensembl_gene_id", "mgi_symbol"),
                   values     = names_only, 
                   mart       = mart)
View(Gene_list)
```

This returns a table of Ensembl gene IDs and MGI symbols with over 55,000 rows, far more than the 15,396 rows in SOD.

I have tried adding `filter = "ensembl_gene_id"` to the `getBM()` call, but then the output has 0 rows and 0 columns.

What am I doing wrong here?

Best answer

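Your Ensembl IDs are versioned: each one ends in a `.#` version suffix (for example `ENSMUSG00000000001.4` instead of `ENSMUSG00000000001`), whereas the `ensembl_gene_id` values in BioMart are unversioned, so filtering on your IDs matches nothing. (Without a `filters` argument, `getBM()` does not restrict the query by `values` at all and returns every gene in the dataset, which is why your first attempt gave over 55,000 rows.) To fix this, strip the `.#` from the end of each ID, pass the IDs as a character vector rather than a data frame, and set `filters`: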

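```r
library(biomaRt)

# Drop the trailing ".<version>" so the IDs match BioMart's unversioned ensembl_gene_id.
# Note: "\\.*" would only delete the dot and leave the version digits attached.
names_only <- sub("\\.[0-9]+$", "", as.character(SOD[, 1]))
mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
Gene_list <- getBM(attributes = c("ensembl_gene_id", "mgi_symbol"),
                   filters    = "ensembl_gene_id",  # attribute that `values` is matched against
                   values     = names_only,
                   mart       = mart)
```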
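To put the symbols back next to your data, you can look each unversioned ID up in the `getBM()` result with base R's `match()`. The sketch below runs offline: `sod_ids` and the small `Gene_list` data frame are hypothetical stand-ins for `SOD[, 1]` and the real BioMart output (the version suffixes are made up), and `strip_version` is an illustrative helper, not part of biomaRt. It also shows why the pattern `"\\.*"` fails: it matches zero or more dots, so it removes the dot but glues the version digits onto the ID.

```r
# Hypothetical stand-in for SOD[, 1]: versioned Ensembl gene IDs (versions made up)
sod_ids <- c("ENSMUSG00000000001.4", "ENSMUSG00000000028.15", "ENSMUSG00000000031.16")

# Hypothetical helper: strip a trailing ".<digits>"; IDs without one are unchanged
strip_version <- function(ids) sub("\\.[0-9]+$", "", as.character(ids))

# For comparison: "\\.*" only deletes the dot, leaving e.g. "ENSMUSG000000000014"
broken <- gsub("\\.*", "", sod_ids)

unversioned <- strip_version(sod_ids)

# Toy stand-in for the getBM() result; the third ID is deliberately missing
Gene_list <- data.frame(
  ensembl_gene_id  = c("ENSMUSG00000000001", "ENSMUSG00000000028"),
  mgi_symbol       = c("Gnai3", "Cdc45"),
  stringsAsFactors = FALSE
)

# match() gives the row of the first hit in Gene_list, or NA when there is none
mgi_symbol <- Gene_list$mgi_symbol[match(unversioned, Gene_list$ensembl_gene_id)]

result <- data.frame(ensembl_gene_id = unversioned, mgi_symbol = mgi_symbol,
                     stringsAsFactors = FALSE)
print(result)
```

With real data you would assign the looked-up vector as a new column, e.g. `SOD$mgi_symbol <- ...`, which keeps SOD's row order and its 15,396 rows. Because `match()` takes only the first hit, an Ensembl ID that maps to several MGI symbols keeps just one of them; use `merge()` instead if you want every pairing.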