snakemake: specify file obtained by glob_wildcards


How can I specify the file obtained by glob_wildcards?

Suppose sample1.txt, sample2.txt, sample3.txt, and sample4.txt are all in the same directory.

The following code is just an example:

FILES = glob_wildcards("data/{sample}.txt")
SAMPLES = FILES.sample

rule all:
    input:
        expand("{sample}txt", sample=SAMPLES),
        "concat.txt"

rule concat:
    input:
        SAMPLES[0],
        SAMPLES[1]
    output:
        "concat.txt"
    shell:
        "cat {input[0]} {input[1]} > {output}"

To concatenate sample1.txt and sample2.txt, as in rule concat, how should I specify those files? Is it correct to write SAMPLES[0] and SAMPLES[1]?

Best answer

You are almost correct, but keep in mind that glob_wildcards returns only the wildcard values, not the file paths. SAMPLES[0] is a bare value such as "sample1", not "data/sample1.txt", so when you reference files in a rule you need to substitute these values back into the file path pattern.
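The difference between a wildcard value and a file path can be sketched in plain Python. This is only an illustration, not Snakemake's implementation: the `samples` list is a hand-written stand-in for what glob_wildcards might return, and its unsorted order reflects the assumption that the listing order is not guaranteed, so it is sorted before slicing.

```python
# Plain-Python illustration; not Snakemake's own implementation.
file_pattern = "data/{sample}.txt"

# Assumed stand-in for glob_wildcards(file_pattern).sample.
# Deliberately unsorted: the listing order is assumed not to be guaranteed.
samples = ["sample3", "sample1", "sample4", "sample2"]

# A bare wildcard value is a name fragment, not a path.
print(samples[0])  # sample3

# Sort for a predictable order, then substitute each value into the pattern.
first_two = [file_pattern.format(sample=s) for s in sorted(samples)[:2]]
print(first_two)  # ['data/sample1.txt', 'data/sample2.txt']
```

Note that plain sorted() is lexicographic, so a value like sample10 would sort before sample2.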

For consistency, you can continue using expand():

file_pattern = 'data/{sample}.txt'
SAMPLES, = glob_wildcards(file_pattern)

rule all:
    input:
        expand(file_pattern, sample=SAMPLES),
        "concat.txt"

rule concat:
    input:
        expand(file_pattern, sample=SAMPLES[:2]),
    output:
        "concat.txt"
    shell:
        "cat {input} > {output}"