Why is the database not detected when I run BLAST on Nextflow?


I keep getting an error saying the database isn't detected when I run BLAST in Nextflow.

With the following code, I cannot run the second process (extractTopHits) because it fails with the error "No such variable: db":

#!/usr/bin/env nextflow
nextflow.enable.dsl=2
params.query = "/home/galaxy/Vivian/16s.fasta"
params.db = "/home/galaxy/Vivian/blastdb"

process blastSearch {
  input:
    path query
    path db
  output:
    path "top_hits.txt"

    """
    /home/galaxy/Vivian/ncbi-blast-2.13.0+/bin/blastn -db $db/16S_ribosomal_RNA -query $query -outfmt 6 > blast_result
    cat blast_result | head -n 10 | cut -f 2 > top_hits.txt
    """
}

process extractTopHits {
  input:
    path "top_hits.txt"
  output:
    path "sequences.txt"

    """
    /home/galaxy/Vivian/ncbi-blast-2.13.0+/bin/blastdbcmd -db $db -entry_batch top_hits.txt > sequences.txt
    """
}

workflow {
    def query_ch = Channel.fromPath(params.query)
    blastSearch(query_ch, params.db) | extractTopHits | view
}

Steve On

Unfortunately, the example in the docs is not correct and will not work as intended. The error occurs because extractTopHits uses $db in its script but never declares db in its input block, so the variable is undefined inside that process and the database directory is never staged into the process working directory. You might prefer the following approach instead:

params.query = "/home/galaxy/Vivian/*.fasta"
params.db = "/home/galaxy/Vivian/blastdb/16S_ribosomal_RNA"

// Split the database path into the shared file prefix and the directory to stage
db_name = file(params.db).name
db_path = file(params.db).parent

process blastSearch {
  
    conda 'blast=2.14.0'

    input:
    path query
    path db
  
    output:
    path "blast_result.txt", emit: blast_result
    path "top_hits.txt", emit: top_hits

    """
    blastn \\
        -db "${db}/${db_name}" \\
        -query "${query}" \\
        -outfmt 6 \\
        > blast_result.txt

    cat blast_result.txt \\
        | head -n 10 \\
        | cut -f 2 \\
        > top_hits.txt
    """
}

process extractTopHits {

    conda 'blast=2.14.0'
    
    input:
    path top_hits
    path db
    
    output:
    path "sequences.txt"
    
    """
    blastdbcmd \\
        -db "${db}/${db_name}" \\
        -entry_batch "${top_hits}" \\
        > sequences.txt
    """
}

workflow {
    
    query_ch = Channel.fromPath( params.query )

    blastSearch( query_ch, db_path ) 
    extractTopHits( blastSearch.out.top_hits, db_path )

    extractTopHits.out.view()
}
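
To make the db_name / db_path split a bit more concrete, here is a rough shell sketch of the same idea. It assumes the path from params.db above and uses dirname and basename as stand-ins for file(params.db).parent and file(params.db).name; the db_arg variable is only an illustrative name and is not part of the workflow. A BLAST database is a set of index files sharing one prefix, so the directory is what gets staged, and the prefix is appended in the script block.

```shell
#!/usr/bin/env bash
# Path copied from params.db in the workflow above.
db="/home/galaxy/Vivian/blastdb/16S_ribosomal_RNA"

# Stand-in for file(params.db).parent: the directory passed as the `path db` input.
db_path=$(dirname "$db")

# Stand-in for file(params.db).name: the prefix shared by the BLAST index files.
db_name=$(basename "$db")

# Inside a task the staged directory appears under its own name, so the
# value given to -db is a relative path (db_arg is a hypothetical name).
db_arg="$(basename "$db_path")/$db_name"

echo "db_path=$db_path"
echo "db_name=$db_name"
echo "db_arg=$db_arg"
```

Both processes need the same staged directory, which is why db_path is passed to blastSearch and to extractTopHits in the workflow block.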

Note that the above uses the conda directive so that we can avoid specifying absolute paths to the executables. Enabling Conda is as easy as adding the following to your nextflow.config:

conda {
    enabled = true
}