Difficulty in downloading TCGA data


I am trying to download the TCGA data but I am getting this error:

Error in summarizeMaf(maf = maf, anno = clinicalData, chatty = verbose): Tumor_Sample_Barcode column not found in provided clinical data. Rename column containing sample names to Tumor_Sample_Barcode if necessary.

This is my code:

library("TCGAbiolinks")
library("tidyverse")
library(maftools)
query <- GDCquery(
  project = "TCGA-LIHC",
  data.category = "Clinical",
  file.type = "xml",
  legacy = FALSE
)
GDCdownload(query, directory = ".")

clinical <- GDCprepare_clinic(query, clinical.info = "patient",directory = ".")
#getting the survival time of event data
survival_data <- as_tibble(clinical[,c("days_to_last_followup","days_to_death","vital_status","bcr_patient_barcode","patient_id")]) 
survival_data <- filter(survival_data,!is.na(days_to_last_followup)|!is.na(days_to_death))  #not both NA
survival_data <- filter(survival_data, (is.na(days_to_last_followup) | days_to_last_followup > 0) & (is.na(days_to_death) | days_to_death > 0))  # ensuring positive values; parentheses make the intended grouping explicit, since & binds tighter than |
survival_data <- survival_data[!duplicated(survival_data$patient_id),]  #ensuring no duplicates


dim(survival_data) #should be 371


maf <- GDCquery_Maf("LIHC", pipelines = "muse")
#maf <- GDCquery_Maf("LIHC", pipelines = "somaticsniper")

#clin <- GDCquery_clinic("TCGA-LIHC","clinical")
#print(clin )



laml = read.maf(
  maf,
  clinicalData = clinical,
  removeDuplicatedVariants = TRUE,
  useAll = TRUE,
  gisticAllLesionsFile = NULL,
  gisticAmpGenesFile = NULL,
  gisticDelGenesFile = NULL,
  gisticScoresFile = NULL,
  cnLevel = "all",
  cnTable = NULL,
  isTCGA = TRUE,
  vc_nonSyn = NULL,
  verbose = TRUE
)

There are 1 best solutions below

IRTFM On

You should have (a) loaded maftools with library(maftools) and (b) included the output that was printed before the error message:

-Validating
-Silent variants: 18306 
-Summarizing
--Possible FLAGS among top ten genes:
  TTN
  MUC16
  OBSCN
  FLG
-Processing clinical data
Available fields in provided annotations..
 [1] "bcr_patient_barcode"                              "additional_studies"                              
 [3] "tissue_source_site"                               "patient_id" 
# snipped remaining 78 column names      

Notice that the first column is not named "Tumor_Sample_Barcode", so you need to follow the error message's directions and rename the appropriate column, which appears to be the first one, bcr_patient_barcode.
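
A minimal sketch of that rename, assuming bcr_patient_barcode is the column holding the sample names. The toy data frame and its barcodes are hypothetical stand-ins for the much wider data frame returned by GDCprepare_clinic(); with the real data you would run only the last line.

```r
# Toy stand-in for the clinical data frame (hypothetical barcodes)
clinical <- data.frame(
  bcr_patient_barcode = c("TCGA-XX-0001", "TCGA-XX-0002"),
  patient_id = c("0001", "0002"),
  stringsAsFactors = FALSE
)

# Rename by name rather than by position, so it still works if the column order changes
names(clinical)[names(clinical) == "bcr_patient_barcode"] <- "Tumor_Sample_Barcode"
```

Since read.maf() is called with isTCGA = TRUE, it uses only the first 12 characters of each Tumor_Sample_Barcode in the MAF, which is the patient-level barcode format that bcr_patient_barcode is expected to hold.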

After doing so I get:

-Validating
-Silent variants: 18306 
-Summarizing
--Possible FLAGS among top ten genes:
  TTN
  MUC16
  OBSCN
  FLG
-Processing clinical data
-Finished in 1.911s elapsed (2.470s cpu)
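
As an optional sanity check after the rename, you could confirm that every patient in the MAF has a matching clinical row. The following is only a sketch on hypothetical toy barcodes; with real data, maf_barcodes would come from the Tumor_Sample_Barcode column of the MAF. It assumes the clinical barcodes are 12-character patient barcodes.

```r
# Hypothetical full-length barcodes as they might appear in a MAF file
maf_barcodes <- c(
  "TCGA-XX-0001-01A-11D-A12Z-10",
  "TCGA-XX-0002-01A-11D-A12Z-10",
  "TCGA-XX-0003-01A-11D-A12Z-10"
)

# Toy clinical table, already renamed to the column name maftools expects
clinical <- data.frame(
  Tumor_Sample_Barcode = c("TCGA-XX-0001", "TCGA-XX-0002"),
  stringsAsFactors = FALSE
)

# Truncate to the 12-character patient barcode, mirroring what isTCGA = TRUE does
maf_patients <- unique(substr(maf_barcodes, 1, 12))

# Patients present in the MAF but absent from the clinical table
missing_clinical <- setdiff(maf_patients, clinical$Tumor_Sample_Barcode)
print(missing_clinical)
```

An empty result means every sample in the MAF can be annotated; any barcodes listed have no clinical record.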