How to convert FASTQ to uBAM with the Picard Docker image on Google Cloud


I have been trying to convert my FASTQ files on Google Cloud to uBAM files, but with no success so far. Here is the code I used:

dsub \
--project projectID \
--zones "us-central1-*" \
--logging gs://bucket/logging \
--image broadinstitute/picard \
--command 'java -Xmx8G -jar picard.jar FastqToSam FASTQ=gs://bucket/S_1.fq FASTQ2=gs://bucket/S_2.fq OUTPUT=gs://bucket/S_fastqtosam.bam READ_GROUP_NAME=Cancers SAMPLE_NAME=TS LIBRARY_NAME=Solexa-272222 PLATFORM_UNIT=CL100056 PLATFORM=illumina SEQUENCING_CENTER=BI RUN_DATE=2017-08-20T00:00:00-0400' \
--wait

I can see that the image was pulled and ran successfully, but then I got an error message saying the command is incorrect and asking me to check PicardCommandLine -h.

Does anyone have experience converting FASTQ to uBAM on Google Cloud? Please help. Much appreciated, thank you.

The broadinstitute/picard Docker image already runs java as its entrypoint (see [1] for details), so you need to remove java -Xmx8G -jar picard.jar from the command.

I'm also not sure whether passing Cloud Storage (gs://) paths directly to Picard would work. You will likely need to hand the files to dsub with its --input and --output arguments, so that dsub copies them into and out of the job. See https://github.com/DataBiosphere/dsub/blob/master/docs/input_output.md for details.
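A sketch combining both suggestions, untested and written under assumptions. It takes the answer's word that the image's entrypoint supplies java and picard.jar, so --command starts directly with the Picard tool name. The wrapper function submit_fastq_to_ubam and the variable names FASTQ1, FASTQ2 and UBAM are hypothetical choices; projectID and gs://bucket/... are the question's placeholders. dsub copies each --input file into the job and exposes its local path in the named environment variable, then uploads whatever the command writes to the --output variable's path. Expanding the ${...} references needs a shell inside the container, so if your job turns out to run the command through bash rather than through the image's entrypoint, you would have to keep a java -jar prefix pointing at the jar's actual location in the image.

```shell
#!/bin/sh
# Hypothetical wrapper around the question's dsub call (untested sketch).
# Assumes, per the answer, that the image's entrypoint runs java with
# picard.jar, so --command begins with the Picard tool name.
submit_fastq_to_ubam() {
  # $1, $2: paired FASTQ files in Cloud Storage; $3: destination uBAM.
  # The single-quoted --command keeps ${FASTQ1}, ${FASTQ2} and ${UBAM}
  # unexpanded locally; dsub defines them inside the job as the local
  # paths of the downloaded inputs and of the output it will upload.
  dsub \
    --project projectID \
    --zones "us-central1-*" \
    --logging gs://bucket/logging \
    --image broadinstitute/picard \
    --input FASTQ1="$1" \
    --input FASTQ2="$2" \
    --output UBAM="$3" \
    --command 'FastqToSam FASTQ=${FASTQ1} FASTQ2=${FASTQ2} OUTPUT=${UBAM} READ_GROUP_NAME=Cancers SAMPLE_NAME=TS LIBRARY_NAME=Solexa-272222 PLATFORM_UNIT=CL100056 PLATFORM=illumina SEQUENCING_CENTER=BI RUN_DATE=2017-08-20T00:00:00-0400' \
    --wait
}
```

Usage with the question's files: submit_fastq_to_ubam gs://bucket/S_1.fq gs://bucket/S_2.fq gs://bucket/S_fastqtosam.bam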

[1] The entrypoint is defined as "/usr/picard/docker_helper.sh", a script that runs java directly. Each Docker image has its own entrypoint, so it's important to make sure you use the right command for each image.