I have been trying to convert my FASTQ files on Google Cloud to uBAM files, but with no success so far. Here is the code I used:
dsub \
--project projectID \
--zones "us-central1-*" \
--logging gs://bucket/logging \
--image broadinstitute/picard \
--command 'java -Xmx8G -jar picard.jar FastqToSam FASTQ=gs://bucket/S_1.fq FASTQ2=gs://bucket/S_2.fq OUTPUT=gs://bucket/S_fastqtosam.bam READ_GROUP_NAME=Cancers SAMPLE_NAME=TS LIBRARY_NAME=Solexa-272222 PLATFORM_UNIT=CL100056 PLATFORM=illumina SEQUENCING_CENTER=BI RUN_DATE=2017-08-20T00:00:00-0400' \
--wait
I can see that the image was pulled and ran successfully, but then I got an error message saying the command is incorrect and asking me to check PicardCommandLine -h.
Does anyone have experience converting FASTQ to uBAM on Google Cloud? Any help would be much appreciated. Thank you.
The broadinstitute/picard Docker image already runs java as its Entrypoint (see [1] for more details), so you need to remove java -Xmx8G -jar picard.jar from the command.
I'm also not sure whether passing Cloud Storage paths directly to Picard would work; you will likely need to specify additional arguments (--input and --output) when running dsub. See https://github.com/DataBiosphere/dsub/blob/master/docs/input_output.md for more details.
[1] The Entrypoint is defined as "/usr/picard/docker_helper.sh", a script that runs java directly. Each Docker image has its own Entrypoint, so it's important to make sure you are using the right command for each image.
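Putting both suggestions together, a minimal sketch of the revised dsub call might look like the following. It assumes, as described above, that the image's Entrypoint supplies the java invocation, and that dsub's --input and --output flags copy the files to and from the worker and expose their local paths as environment variables inside the command. The variable names FASTQ_1, FASTQ_2 and UBAM are arbitrary placeholders; the project, zone, bucket paths and read-group values are the ones from the question. Note that the -Xmx8G heap setting from the original command is not carried over here.

```shell
# Sketch only: assumes the broadinstitute/picard Entrypoint runs java/picard.jar itself,
# so the command starts directly with the Picard tool name (FastqToSam).
# FASTQ_1, FASTQ_2 and UBAM are hypothetical names; dsub sets them to local file paths.
dsub \
  --project projectID \
  --zones "us-central1-*" \
  --logging gs://bucket/logging \
  --image broadinstitute/picard \
  --input FASTQ_1=gs://bucket/S_1.fq \
  --input FASTQ_2=gs://bucket/S_2.fq \
  --output UBAM=gs://bucket/S_fastqtosam.bam \
  --command 'FastqToSam FASTQ="${FASTQ_1}" FASTQ2="${FASTQ_2}" OUTPUT="${UBAM}" READ_GROUP_NAME=Cancers SAMPLE_NAME=TS LIBRARY_NAME=Solexa-272222 PLATFORM_UNIT=CL100056 PLATFORM=illumina SEQUENCING_CENTER=BI RUN_DATE=2017-08-20T00:00:00-0400' \
  --wait
```

The single quotes around --command keep ${FASTQ_1}, ${FASTQ_2} and ${UBAM} from being expanded on your local machine, so they are resolved inside the container, where dsub has defined them. When the job finishes, dsub uploads whatever was written to the UBAM path back to the gs:// destination.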