I want to pass samples from several different species through this command:

bcftools mpileup -Ob -o <study.bcf> -f <ref.fa> <sample1.bam> <sample2.bam> <sample3.bam>

How can I do that with a loop, considering that each species may have 3-5 different individuals? The file names give the species first and the individual second, separated by an underscore:

107_2.bam  107_7.bam    1322_7.bam  1589_3.bam  1777_8.bam  1782_3.bam  2172_5.bam  716_11.bam  716_7.bam   M82_3.bam
107_4.bam  1322_10.bam  1322_9.bam  1589_5.bam  1777_9.bam  1782_5.bam  2172_7.bam  716_5.bam   716_9.bam   M82_4.bam
107_6.bam  1322_1.bam   1589_2.bam  1777_4.bam  1782_2.bam  2172_3.bam  2172_9.bam  716_6.bam   M82_11.bam  M82_8.bam
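Here is a minimal sketch of how those names split into a species and an individual, using only bash parameter expansion. It assumes every name has a single underscore between the species and the individual, as in the listing above. The function name split_name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: split names of the form "<species>_<individual>.bam" into their parts.
# Assumes a single "_" separates species from individual, as in the listing above.
# split_name is a hypothetical helper, not part of any tool.
set -euo pipefail

split_name() {
    local f=$1
    local species=${f%%_*}      # everything before the first "_"
    local individual=${f#*_}    # everything after the first "_"
    individual=${individual%.bam}
    printf '%s\t%s\n' "$species" "$individual"
}

# A few names copied from the listing above.
for f in 107_2.bam 716_11.bam M82_3.bam; do
    split_name "$f"
done
```

The species part is the grouping key that the answers below use to decide which files go into the same command.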

Thank you for your time.

Answer by Ed Morton:

At a guess, since I've no idea what a bam file is, what a species is as it relates to your files, or what bcftools is, this might be what you're trying to do:

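# ref.fa stands for your reference FASTA; each species gets its own <species>.bcf
while IFS= read -r species; do
    bcftools mpileup -Ob -o "${species}.bcf" -f ref.fa "${species}_"*.bam
done < <(printf '%s\n' *_*.bam | cut -d'_' -f1 | sort -u)   # unique species prefixes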

The above assumes your file names don't contain newlines and that a "species" can't contain underscores, as in your provided example.
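You can check which files end up grouped together before running anything by wrapping the same loop in a function and doing a dry run. This is a sketch, not a tested pipeline. print_commands is a hypothetical helper, and ref.fa and the <species>.bcf output names are placeholders. The BAM files are empty stand-ins created in a temporary directory, and "echo" prints each bcftools command instead of executing it.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the per-species loop: prints one bcftools command per species
# instead of running it. "ref.fa" and "<species>.bcf" are hypothetical placeholders;
# print_commands is a hypothetical helper.
set -euo pipefail

print_commands() {
    local species
    while IFS= read -r species; do
        local files=( "${species}_"*.bam )   # all individuals of this species
        echo bcftools mpileup -Ob -o "${species}.bcf" -f ref.fa "${files[@]}"
    done < <(printf '%s\n' *_*.bam | cut -d'_' -f1 | sort -u)
}

# Throwaway directory of empty stand-ins named like the question's files.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
touch 107_2.bam 107_4.bam 107_6.bam 107_7.bam \
      716_5.bam 716_6.bam 716_7.bam 716_9.bam 716_11.bam M82_3.bam

print_commands
```

Each species produces one printed line listing all of its individuals. Running the loop itself in the real BAM directory, without "echo", would call bcftools instead.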

Answer by tecnico:

Assuming that you want to group all files with the same prefix (species) into a single command:

while IFS= read -r species; do echo bcftools mpileup -Ob -o "${species}.bcf" -f ref.fa "${species}"_*.bam; done < <(ls -1 *.bam | sed -e 's/_.*//' | sort -u)

The "echo" in the one-liner above makes it print each bcftools command instead of running it, so you can check what it will do. If you are happy with the output, remove the "echo".

  1. List all the .bam files.
  2. Strip everything from the first underscore onward, leaving only the species prefix.
  3. Sort the prefixes (species) and keep only unique values.
  4. For each unique prefix, collect all files that share it and pass them to the bcftools command as separate arguments.
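The same four steps can also be done without parsing ls output and without sed or sort. This version collects the unique prefixes in a bash associative array, which needs bash 4 or later. It is a sketch under the same assumptions as the one-liner. print_species_commands is a hypothetical helper, ref.fa and <species>.bcf are placeholders, and the files are empty stand-ins in a temporary directory.

```shell
#!/usr/bin/env bash
# Sketch (bash 4+): the same four steps without ls, sed or sort. An associative
# array collects the unique species prefixes; a glob then gathers each species'
# files. Commands are printed, not run. "ref.fa" and "<species>.bcf" are
# hypothetical placeholders; print_species_commands is a hypothetical helper.
set -euo pipefail

print_species_commands() {
    declare -A seen=()
    local f species
    for f in *_*.bam; do                  # step 1: list the .bam files
        seen[${f%%_*}]=1                  # steps 2-3: keep each prefix once
    done
    for species in "${!seen[@]}"; do      # step 4: one command per species
        local files=( "${species}_"*.bam )
        echo bcftools mpileup -Ob -o "${species}.bcf" -f ref.fa "${files[@]}"
    done
}

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
touch 1589_2.bam 1589_3.bam 1589_5.bam 2172_3.bam 2172_5.bam 2172_7.bam 2172_9.bam

print_species_commands
```

Associative arrays have no defined iteration order, so the species may be printed in any order. Pipe the output through sort if the order matters.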