How to sort and copy numbered files into incremented folders

Question

Asked by Amanda on 26 December 2023 at 17:16

I have gene files named gene_1.fa, gene_2.fa, ... gene_19500.fa and want to sort them into folders 200, 400, 600, ... 19600 for a downstream pipeline. I have an idea of how to do this, but it's pretty gruesome:

for file in "${files[@]}"; do

    base_name=$(basename "$file")
    gene_number=$(echo "$base_name" | cut -d'_' -f2 | cut -d'.' -f1)
    to_path=...   # path to folder containing 200, 400, ... 19600 folders

    # if it's gene_200.fa, gene_400.fa etc., copy into that dir
    if (( gene_number % 200 == 0 )); then
        cp "$file" "$to_path/$gene_number/$base_name"
    elif (( gene_number < 200 )); then
        cp "$file" "$to_path/200/$base_name"
    elif (( gene_number > 19400 )); then
        cp "$file" "$to_path/19600/$base_name"
    # the endless pain of 200-400, 400-600, 600-800 ... 19200-19400
    elif (( gene_number > 200 && gene_number < 400 )); then
        cp "$file" "$to_path/400/$base_name"
    elif ....

My question is then: is there a less tedious way to do this without copying any one file into multiple folders? (For example, if I only tested whether the gene number is less than the folder number, a file named gene_3.fa would be copied into every folder.)

**Ed Morton** · Accepted Answer · 2023-12-26T17:48:49.933000

You could do this. Just change the for loop to iterate over your files, change the delta value to 200, and add the cp or mv as you like:

#!/usr/bin/env bash

delta=5
for file in gene_{1..20}.fa; do
    if [[ "$file" =~ [0-9]+ ]]; then
        gene_number="${BASH_REMATCH[0]}"
        bucket=$(( ((gene_number / delta) * delta) + delta ))
        echo "$file -> $bucket"
    fi
done

$ ./tst.sh
gene_1.fa -> 5
gene_2.fa -> 5
gene_3.fa -> 5
gene_4.fa -> 5
gene_5.fa -> 10
gene_6.fa -> 10
gene_7.fa -> 10
gene_8.fa -> 10
gene_9.fa -> 10
gene_10.fa -> 15
gene_11.fa -> 15
gene_12.fa -> 15
gene_13.fa -> 15
gene_14.fa -> 15
gene_15.fa -> 20
gene_16.fa -> 20
gene_17.fa -> 20
gene_18.fa -> 20
gene_19.fa -> 20
gene_20.fa -> 25

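The math works because bash does integer arithmetic, not floating point, so the fractional part of the division is truncated. gene_number / delta is therefore the number of whole buckets below the file; multiplying back by delta and adding delta gives the upper bound of the file's bucket. For example, with delta=5, gene_7.fa gives 7 / 5 = 1, then 1 * 5 + 5 = 10.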
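Note that with the formula above an exact multiple of delta lands in the next bucket (gene_5.fa -> 10), whereas the code in the question puts gene_200.fa into folder 200. The following is a minimal sketch, not part of the original answer, that shows both roundings and wires one of them into an actual copy. The function names bucket_floor, bucket_ceil and copy_into_buckets are hypothetical, and the round-up variant is only an assumption about the layout the question wants.

```bash
#!/usr/bin/env bash

# The answer's formula: truncate down to a multiple of delta, then add delta.
# An exact multiple of delta (e.g. 200) ends up in the NEXT bucket (400).
bucket_floor() {
    local n=$1 delta=$2
    echo $(( (n / delta) * delta + delta ))
}

# Assumed variant: round up to the nearest multiple of delta,
# so 1..200 -> 200, 201..400 -> 400, and so on.
bucket_ceil() {
    local n=$1 delta=$2
    echo $(( ((n + delta - 1) / delta) * delta ))
}

# Copy every gene_N.fa found in src_dir into dest_dir/<bucket>/.
# Uses bucket_ceil; swap in bucket_floor for the answer's behaviour.
copy_into_buckets() {
    local src_dir=$1 dest_dir=$2 delta=$3
    local file n bucket
    for file in "$src_dir"/gene_*.fa; do
        [[ -e $file ]] || continue                  # glob matched nothing
        [[ ${file##*/} =~ ^gene_([0-9]+)\.fa$ ]] || continue
        n=$(( 10#${BASH_REMATCH[1]} ))              # force base 10 (leading zeros)
        bucket=$(bucket_ceil "$n" "$delta")
        mkdir -p -- "$dest_dir/$bucket"             # create the folder if missing
        cp -- "$file" "$dest_dir/$bucket/"
    done
}
```

With placeholder directories, a call would look like copy_into_buckets ./genes ./buckets 200. Each file is copied exactly once because the bucket is computed from the gene number rather than tested against every folder in turn.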
