I am trying to run a GenotypeGVCFs
on many vcf files. The command line wants every single vcf
files be listed as:
java-jar GenomeAnalysisTK.jar -T GenotypeGVCFs \
-R my.fasta \
-V bob.vcf \
-V smith.vcf \
-V kelly.vcf \
-o {output.out}
How to do this in snakemake? This is my code, but I do not know how to create a wildcards for -V.
workdir: "/path/to/workdir/"
SAMPLES=["bob","smith","kelly]
print (SAMPLES)
rule all:
input:
"all_genotyped.vcf"
rule genotype_GVCFs:
input:
lambda w: "-V" + expand("{sample}.vcf", sample=SAMPLES)
params:
ref="my.fasta"
output:
out="all_genotyped.vcf"
shell:
"""
java-jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R {params.ref} {input} -o {output.out}
"""
You are putting the cart before the horse. Wildcards are needed for rule generalization: you can define a pattern for a rule where wildcards are used to define generic parts. In your example there are no patterns: everything is defined by the value of
SAMPLES
. This is not a recommended way to use Snakemake; the pipeline should be defined by the filesystem: which files are present on your disk.By the way, your code will not work, as the
input
shall define the list of filenames, while in your example you are (incorrectly) trying to define the strings like"-V filename"
.So, you have the output:
"all_genotyped.vcf"
. You have the input:["bob.vcf", "smith.vcf", "kelly.vcf"]
. You don't even need to use a lambda here, as the input doesn't depend on any wildcard. So, you have:Actually you don't even need
input
section. If you know for sure that the files fromSAMPLES
list exist, you may skip it.The values for
-V
can be defined in params:This should solve your issue, but I would advise you to rethink your solution. The use of
SAMPLE
list smells. Alternatively: do you really need Snakemake if you have all dependencies defined already?