How to sort and copy numbered files into incremented folders


So I have gene files named gene_1.fa, gene_2.fa, ... gene_19500.fa and want to sort them into folders 200, 400, 600, ... 19600 for a downstream pipeline. I have an idea of how to do this, but it's pretty gruesome:

for file in "${files[@]}"; do

    base_name=$(basename "$file")
    # gene_123.fa -> 123
    gene_number=$(echo "$base_name" | cut -d'_' -f2 | cut -d'.' -f1)
    to_path="<path to folder containing the 200, 400, ... 19600 folders>"

    # if it's gene_200.fa, gene_400.fa etc. copy into that dir
    if (( gene_number % 200 == 0 )); then
        cp "$file" "$to_path/$gene_number/$base_name"
    elif (( gene_number < 200 )); then
        cp "$file" "$to_path/200/$base_name"
    elif (( gene_number > 19400 )); then
        cp "$file" "$to_path/19600/$base_name"
    # the endless pain of 200-400, 400-600, 600-800 ... 19200-19400
    elif (( gene_number > 200 && gene_number < 400 )); then
        cp "$file" "$to_path/400/$base_name"
    elif ....

My question is then: is there a less tedious way to do this without copying any one file into multiple folders? (e.g. if I only checked gene number < folder name, a file named gene_3.fa would be copied into every folder)

Best answer, by Ed Morton

You could do this; just change the for loop to iterate over your files, change the delta value to 200, and add the cp or mv as you like:

#!/usr/bin/env bash

delta=5
for file in gene_{1..20}.fa; do
    if [[ "$file" =~ [0-9]+ ]]; then
        gene_number="${BASH_REMATCH[0]}"
        bucket=$(( ((gene_number / delta) * delta) + delta ))
        echo "$file -> $bucket"
    fi
done

$ ./tst.sh
gene_1.fa -> 5
gene_2.fa -> 5
gene_3.fa -> 5
gene_4.fa -> 5
gene_5.fa -> 10
gene_6.fa -> 10
gene_7.fa -> 10
gene_8.fa -> 10
gene_9.fa -> 10
gene_10.fa -> 15
gene_11.fa -> 15
gene_12.fa -> 15
gene_13.fa -> 15
gene_14.fa -> 15
gene_15.fa -> 20
gene_16.fa -> 20
gene_17.fa -> 20
gene_18.fa -> 20
gene_19.fa -> 20
gene_20.fa -> 25
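
Applied to real files, the loop above might look something like the sketch below. It is only a sketch under assumptions: the function name copy_into_buckets and its directory arguments are hypothetical, it uses cp (swap in mv if you prefer), and it calls mkdir -p so that missing bucket folders get created. The bucket formula is the same one used in the loop above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: copy each gene_N.fa in src_dir into dest_dir/<bucket>/,
# using the same bucket formula as the test loop.
copy_into_buckets() {
    local src_dir=$1 dest_dir=$2 delta=${3:-200}
    local file gene_number bucket
    for file in "$src_dir"/gene_*.fa; do
        [[ -e $file ]] || continue              # the glob matched nothing
        # Match the number in the file name only, not in the directory path
        if [[ ${file##*/} =~ [0-9]+ ]]; then
            gene_number=${BASH_REMATCH[0]}
            # 10# forces base 10 so a name like gene_008.fa is not read as octal
            bucket=$(( ((10#$gene_number / delta) * delta) + delta ))
            mkdir -p "$dest_dir/$bucket"        # create the bucket folder if needed
            cp -- "$file" "$dest_dir/$bucket/"
        fi
    done
}
```

A call such as copy_into_buckets ./genes ./buckets 200 (both paths are placeholders) would put gene_199.fa in ./buckets/200 and gene_200.fa in ./buckets/400. Each file gets exactly one bucket, so nothing is copied twice.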

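The math works because bash does integer arithmetic, not floating point, so the division discards the fractional part: 7 / 5 is 1, which gives (1 * 5) + 5 = 10. Note that an exact multiple of delta lands in the next bucket (gene_5.fa -> 10), as the output above shows.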
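
The question's own attempt puts gene_200.fa into folder 200 rather than 400. If that layout is what you need, one possible variation (an assumption on my part, not part of the script above) is to round up instead of rounding down and adding delta. The function names bucket_next and bucket_ceil below are made up for the comparison:

```shell
#!/usr/bin/env bash

# Formula from the script above: round down to a multiple of delta, then add delta.
# An exact multiple (e.g. 200) goes to the NEXT bucket (400).
bucket_next() {
    local n=$1 delta=$2
    echo $(( ((n / delta) * delta) + delta ))
}

# Assumed alternative: round up to the nearest multiple of delta.
# An exact multiple (e.g. 200) stays in its own bucket (200).
bucket_ceil() {
    local n=$1 delta=$2
    echo $(( ((n + delta - 1) / delta) * delta ))
}

# Compare the two formulas around a bucket boundary
for n in 199 200 201; do
    echo "gene_$n.fa -> next: $(bucket_next "$n" 200), ceil: $(bucket_ceil "$n" 200)"
done
```

The two formulas agree everywhere except on exact multiples of delta: for 200, bucket_next gives 400 while bucket_ceil gives 200. Both still rely on integer division truncating, and both send gene_19500.fa to 19600.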