Entrez gene IDs from gene list using biomaRt

I am trying to convert a list of gene names to entrez gene IDs.

For now I have this:

library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
mapping <- getBM(attributes = c('ensembl_gene_id', 'ensembl_transcript_id',
                                'entrezgene', 'hgnc_symbol'),
                 mart = ensembl)

This creates a table with the Entrez gene IDs and names. However, how can I keep only the IDs for the genes in my list?

This is an example of the gene names list: Gene names

It is just an Excel file with a couple of hundred gene names in total.

Hopefully someone could help me!

Best answer

Data

Create a vector of gene names:

mygenes <- c("TNF", "IL6", "IL1B", "IL10", "CRP", "TGFB1", "CXCL8")
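The gene list in the question lives in an Excel file, so in practice the vector would be read in rather than typed. A minimal sketch, assuming the sheet has been exported to CSV with a single column named gene (both the file and the column name are hypothetical; a temporary file stands in for the real one here):

```r
# Stand-in for the exported gene list; file and column name "gene" are hypothetical
gene_file <- tempfile(fileext = ".csv")
write.csv(data.frame(gene = c("TNF", "IL6", " IL1B", "TNF")),
          gene_file, row.names = FALSE)

# Read the column, strip stray whitespace, drop duplicates and empty entries
genes_raw <- read.csv(gene_file, stringsAsFactors = FALSE)$gene
genes_from_file <- unique(trimws(genes_raw))
genes_from_file <- genes_from_file[!is.na(genes_from_file) & nzchar(genes_from_file)]
```

The resulting vector can be passed as the values argument of getBM() in place of mygenes. Reading the .xlsx file directly, for example with the readxl package, should work just as well.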

Retrieve information from the BioMart:

library(biomaRt)

hsmart <- useMart(dataset = "hsapiens_gene_ensembl", biomart = "ensembl")

hsmart

# Object of class 'Mart':
#   Using the ENSEMBL_MART_ENSEMBL BioMart database
#   Using the hsapiens_gene_ensembl dataset

Map gene names to Ensembl gene IDs, transcript IDs, and Entrez IDs

To do this, you don't need to pull the whole database into a table of corresponding IDs. Passing filters = "hgnc_symbol" to your getBM() call subsets the database to the gene names you supply in the values argument:

mapping <- getBM(
  # newer Ensembl releases name the Entrez attribute 'entrezgene_id' instead of 'entrezgene'
  attributes = c('ensembl_gene_id', 'ensembl_transcript_id', 'entrezgene', 'hgnc_symbol'),
  filters = 'hgnc_symbol',
  values = mygenes,
  mart = hsmart
)

This gives 43 records for these genes:

library(dplyr)  # provides %>% and arrange()

mapping %>%
  arrange(hgnc_symbol, ensembl_gene_id, ensembl_transcript_id, entrezgene)

#   ensembl_gene_id ensembl_transcript_id entrezgene hgnc_symbol
#1  ENSG00000132693       ENST00000255030       1401         CRP
#2  ENSG00000132693       ENST00000368110       1401         CRP
#3  ENSG00000132693       ENST00000368111       1401         CRP
#4  ENSG00000132693       ENST00000368112       1401         CRP
#5  ENSG00000132693       ENST00000437342       1401         CRP
#
#   ............................................................
#
#39 ENSG00000228321       ENST00000412275       7124         TNF
#40 ENSG00000228849       ENST00000420425       7124         TNF
#41 ENSG00000228978       ENST00000445232       7124         TNF
#42 ENSG00000230108       ENST00000443707       7124         TNF
#43 ENSG00000232810       ENST00000449264       7124         TNF
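
Because each gene has several transcripts (and TNF maps to several Ensembl gene IDs), the same Entrez ID is repeated across rows. If only one Entrez ID per gene name is needed, the table can be collapsed, and it is worth checking which names returned nothing. A minimal sketch in base R, using a few rows copied from the output above as a stand-in for the full mapping (toy_mapping and toy_genes are illustrative names, and "NOTAGENE" is a made-up symbol to show the unmatched case):

```r
# A few rows copied from the output above, standing in for the real mapping
toy_mapping <- data.frame(
  ensembl_gene_id       = c("ENSG00000132693", "ENSG00000132693",
                            "ENSG00000228321", "ENSG00000232810"),
  ensembl_transcript_id = c("ENST00000255030", "ENST00000368110",
                            "ENST00000412275", "ENST00000449264"),
  entrezgene            = c(1401, 1401, 7124, 7124),
  hgnc_symbol           = c("CRP", "CRP", "TNF", "TNF"),
  stringsAsFactors      = FALSE
)
toy_genes <- c("CRP", "TNF", "NOTAGENE")  # "NOTAGENE" is a made-up symbol

# One row per gene symbol / Entrez ID pair
gene_level <- unique(toy_mapping[, c("hgnc_symbol", "entrezgene")])

# Gene names from the list that got no match in the mapping
not_found <- setdiff(toy_genes, toy_mapping$hgnc_symbol)
```

With the real data, the same two lines would be applied to mapping and mygenes. Names that end up in not_found are often aliases or outdated symbols and may need to be looked up by hand.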