Snakemake on cluster: OutputException and submit one job for each wildcard item


I am trying to run Snakemake on LSF with the LSF profile, but only one job is submitted even though my rule uses a wildcard.

Submitted job 1 with external jobid '660343 logs/cluster/try_expand/unique/jobid1_4530cab3-d29c-485d-8d46-871fb7042e50.out'.

Below is a minimal example, run with:

snakemake --profile lsf -s try.smk 2> `date +"%Y%m%d_%H%M"`_snakemake_try.log --latency-wait 20

The Snakefile (try.smk):

CHROMOSOMES = [ 20, 21, 22]

rule targets:
    input: 
         expand("try/chr{chromosome}.GATK_calls.indels.PASS.common_var_2.bcf", chromosome=CHROMOSOMES)
    log:
        "try_logs/targets.log"

rule try_expand:
    threads: 6
    output:
        expand("try/chr{chromosome}.GATK_calls.indels.PASS.common_var_2.bcf", chromosome=CHROMOSOMES) 
    shell:"""
        touch {output}
    """

The log file of the above command is here. I suspect the single submission is also the reason for the MissingOutputException I get when running larger tasks, where the first wildcard item takes a long time to complete.

Waiting at most 20 seconds for missing files.
MissingOutputException in line 22 of extraction.smk:
Missing files after 20 seconds:
chr21.GATK_calls.indels.PASS.common_var.bcf
chr22.GATK_calls.indels.PASS.common_var.bcf

How can I avoid the MissingOutputException and submit each wildcard item as a separate job? Thanks!

Best answer

You're confusing a wildcard with a placeholder of the expand function. Your rule try_expand lists all three files in its output, so it runs only once and is expected to produce all of your targets, which is why only one job is submitted. In that output, {chromosome} is not a wildcard but a placeholder that expand fills in from its chromosome=CHROMOSOMES argument.
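To make the difference concrete, here is a minimal sketch in plain Python of what the expand call in the original rule evaluates to. The function expand_sketch is a hypothetical stand-in written for illustration, not Snakemake's own implementation; it only assumes that each placeholder is substituted with every value of its list.

```python
from itertools import product


def expand_sketch(pattern, **values):
    """Hypothetical stand-in for expand: fill every combination of placeholder values."""
    keys = list(values)
    combos = product(*(values[key] for key in keys))
    return [pattern.format(**dict(zip(keys, combo))) for combo in combos]


CHROMOSOMES = [20, 21, 22]

outputs = expand_sketch(
    "try/chr{chromosome}.GATK_calls.indels.PASS.common_var_2.bcf",
    chromosome=CHROMOSOMES,
)

# The rule's output is now three fixed paths with no braces left,
# so there is nothing to match per chromosome and one job owns all three files.
for path in outputs:
    print(path)
```

Because no {chromosome} remains after the substitution, the rule has no wildcard at all, only a fixed list of three output files.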

What you probably want is:

CHROMOSOMES = [ 20, 21, 22]

rule targets:
    input: 
         expand("try/chr{chromosome}.GATK_calls.indels.PASS.common_var_2.bcf", chromosome=CHROMOSOMES)
    log:
        "try_logs/targets.log"

rule try_expand:
    threads: 6
    output:
        "try/chr{chromosome}.GATK_calls.indels.PASS.common_var_2.bcf" 
    shell:
        """
        touch {output}
        """

Note that if you need to keep a wildcard inside an expand call, you have to double its braces, i.e. write {{ }} instead of { }.
For example:

output: expand("{path}/chr{{chromosome}}.GATK_calls.indels.PASS.common_var_2.bcf", path="/my/path")

Here, {path} is a placeholder that expand fills in from its path argument, while {{chromosome}} is left in place as the wildcard {chromosome}.
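The doubled braces follow the same escaping convention as Python's own str.format, which can be used as a quick analogy. This sketch uses only str.format, not Snakemake, so treat it as an illustration of the escaping rule rather than of expand itself.

```python
template = "{path}/chr{{chromosome}}.GATK_calls.indels.PASS.common_var_2.bcf"

# {path} is substituted; {{chromosome}} is unescaped to a literal {chromosome}.
result = template.format(path="/my/path")
print(result)
```

The printed path still contains {chromosome}, which is what lets the rule keep one job per chromosome.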