multiple variables in bash loop for bwa sampe

I'm trying to process multiple input files that share a prefix but have different file types with a bwa program (bwa sampe). Here's the general structure of a single call:

bwa sampe /Users/xxx/Desktop/Index_align/GRCh37_latest_genomic.fna H2_S16_L001_read1.sai H2_S16_L001_read2.sai \
H2_S16_L001_R1_001.fastq.gz H2_S16_L001_R2_001.fastq.gz > aln_H2_S16_L001.sam

I have all of the .sai and fastq.gz files in the current directory, and I'm trying to make a loop like:

for i in /Users/xxx/Desktop/Index_align/Fastq/fastq_run4/; do
    bwa sampe /Users/xxx/Desktop/Index_align/GRCh37_latest_genomic.fna \
    $i\-read1.sai $i\-read2.sai $i\-R1_001.fastq.gz $i\-R2_001.fastq.gz > $i\-aln.sam;
done

Does anyone have suggestions for what I am missing? Like perhaps I need to create a list of the prefix file names? I would greatly appreciate any advice. Thanks!

ETA: I've tried making a read list of each prefix file and running:

for i in $(cat read1_list | sed s'/\-R1_001.fastq.gz//'); do 
    bwa sampe /Users/katherinenoble/Desktop/Index_align/GRCh37_latest_genomic.fna \
    $i\-read1.sai $i\-read2.sai $i\-R1_001.fastq.gz $i\-R2_001.fastq.gz | samtools view -bS - >  $i\.bam;
done

But this essentially just creates output files named after the full file name instead of the prefix.

There is 1 solution below.

You can write a for loop that generates the needed prefixes with brace expansion. If the files range from H0_S00_L000 to H2_S16_L003, you can use the loop below. Note that not every generated prefix necessarily corresponds to real files, so you will have to check that each file really exists.

# Zero-padded ranges such as {00..16} need bash 4 or newer.
for prefix in H{0..2}_S{00..16}_L{000..003}; do
    echo "$prefix"
done
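
The existence check mentioned above could be sketched roughly as follows. The four suffixes are taken from the file names in the question. The helper name `have_all_inputs` is made up for illustration, and the loop assumes the files sit in the current directory.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if all four input files for a prefix exist.
# The suffixes follow the file names shown in the question;
# have_all_inputs is a hypothetical helper name.
have_all_inputs() {
    local prefix="$1"
    local suffix
    for suffix in _read1.sai _read2.sai _R1_001.fastq.gz _R2_001.fastq.gz; do
        [ -f "${prefix}${suffix}" ] || return 1
    done
}

# Report only the generated prefixes that have a complete set of files.
for prefix in H{0..2}_S{00..16}_L{000..003}; do
    if have_all_inputs "$prefix"; then
        echo "complete: $prefix"
    fi
done
```

Once the list of complete prefixes looks right, the `echo` line is where the `bwa sampe` call would go.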

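If you only want to use existing files that come in complete sets of 4, you could use the following loop. It finds every file in the current directory. A prefix is defined as everything up to _fast or _read. The pipeline then sorts the prefixes, counts the duplicates, and discards every prefix that does not occur exactly 4 times. Adjust the pattern if your file names differ: the fastq files in the question, for example, end in _R1_001.fastq.gz and do not contain _fast.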

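# "find ." and "sed -E" work with both GNU and BSD (macOS) tools.
# uniq -c prefixes each line with its count; the last sed keeps count-4 lines only.
while read -r prefix; do
    echo "$prefix"
done < <(find . -type f \
    | sed -E 's/(.*)_(fast|read).*/\1/' \
    | sort | uniq -c \
    | sed -E 's/[ ]*4 (.*)$/\1/; /^ /d')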
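
As an alternative to the find/sed pipeline, the prefix can also be derived with a glob and shell parameter expansion (`${var%suffix}`). The sketch below is an assumption-laden illustration, not a tested pipeline. It uses the file names and reference path from the question, the function name `sampe_commands` is hypothetical, and it only prints the commands instead of running bwa.

```shell
#!/usr/bin/env bash
# Sketch: build one bwa sampe command per prefix found in the current directory.
# Reference path and file-name suffixes are copied from the question.
ref=/Users/xxx/Desktop/Index_align/GRCh37_latest_genomic.fna

# sampe_commands is a hypothetical helper; it prints commands, it does not run them.
sampe_commands() {
    local sai prefix f
    for sai in *_read1.sai; do
        [ -e "$sai" ] || continue        # the glob matched nothing
        prefix=${sai%_read1.sai}         # strip the suffix, e.g. H2_S16_L001
        # Skip prefixes that lack any of the other three inputs.
        for f in "${prefix}_read2.sai" "${prefix}_R1_001.fastq.gz" "${prefix}_R2_001.fastq.gz"; do
            [ -e "$f" ] || { echo "skipping $prefix: missing $f" >&2; continue 2; }
        done
        echo "bwa sampe $ref ${prefix}_read1.sai ${prefix}_read2.sai" \
             "${prefix}_R1_001.fastq.gz ${prefix}_R2_001.fastq.gz > aln_${prefix}.sam"
    done
}

sampe_commands
```

Once the printed commands look right, they could be executed with `sampe_commands | bash`, assuming none of the file names contain spaces or other shell-special characters.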