I am trying to run GenotypeGVCFs on many VCF files. The command line requires every single VCF file to be listed individually:
java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs \
-R my.fasta \
-V bob.vcf \
-V smith.vcf \
-V kelly.vcf \
-o {output.out}
How can I do this in Snakemake? This is my code, but I do not know how to create a wildcard for -V.
workdir: "/path/to/workdir/"

SAMPLES = ["bob", "smith", "kelly"]
print(SAMPLES)

rule all:
    input:
        "all_genotyped.vcf"

rule genotype_GVCFs:
    input:
        lambda w: "-V" + expand("{sample}.vcf", sample=SAMPLES)
    params:
        ref="my.fasta"
    output:
        out="all_genotyped.vcf"
    shell:
        """
        java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R {params.ref} {input} -o {output.out}
        """

You are putting the cart before the horse. Wildcards exist to generalize rules: you define a pattern for a rule, and the wildcards stand for its generic parts. In your example there are no patterns: everything is determined by the value of SAMPLES. This is not the recommended way to use Snakemake; the pipeline should be driven by the filesystem, that is, by which files are present on your disk.

By the way, your code will not work: input must define a list of filenames, while in your example you are (incorrectly) trying to build strings like "-V filename". Adding the string "-V" to the list returned by expand is also a type error in Python.

So, you have the output: "all_genotyped.vcf". You have the input: ["bob.vcf", "smith.vcf", "kelly.vcf"]. You don't even need a lambda here, as the input doesn't depend on any wildcard; the rule can name the input files and the output file directly.

Actually, you don't even need the input section. If you know for sure that the files from the SAMPLES list exist, you may skip it.

The values for -V can be defined in params, built from the same list of filenames. This should solve your issue, but I would advise you to rethink your solution. The use of the SAMPLES list smells. Alternatively: do you really need Snakemake if you have all dependencies defined already?
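
The rule described above, with the input files listed directly and the -V flags built in params, could look roughly like the sketch below. It assumes that bob.vcf, smith.vcf and kelly.vcf already exist in the working directory. The names vcfs and variants are illustrative and not from the original post. The input section is kept here so that Snakemake still tracks the VCF files as dependencies.

```snakemake
SAMPLES = ["bob", "smith", "kelly"]

rule all:
    input:
        "all_genotyped.vcf"

rule genotype_GVCFs:
    input:
        # Plain list of filenames; no lambda needed because no wildcard is involved.
        vcfs=expand("{sample}.vcf", sample=SAMPLES)
    params:
        ref="my.fasta",
        # Build "-V bob.vcf -V smith.vcf -V kelly.vcf" from the input files.
        variants=lambda wildcards, input: " ".join(f"-V {vcf}" for vcf in input.vcfs)
    output:
        out="all_genotyped.vcf"
    shell:
        """
        java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R {params.ref} {params.variants} -o {output.out}
        """
```

Keeping the flag construction in params leaves input as a clean list of paths, which is what Snakemake expects when it checks that the files exist.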