"while read -r line; do" not recognizing counter variable


I have code that correctly processes every line of a set of files, but when I try to limit the number of lines written per file to an integer n using a counter variable in the while loop, it no longer works.

Here is my code that works:


# Output file for biallelic SNPs
output_file=${SNAPTMP}/SNAP.proxy.ld.gwas.bestproxy.out

# Loop over the files
for file in ${SNAPTMP}/SNAP.*.proxy.ld.gwas.bestproxy; do
  echo "Processing file: $file"
  counter=0

  # Process each line of the file
  while read -r line; do
    # Extract the 6th field from the line
    field=$(echo "$line" | awk '{print $6}')

    # Check if the field is a biallelic SNP
    if [[ $(is_biallelic "$field") -eq 1 ]]; then
      # Append the line to the output file
      echo "$line" >> "$output_file"
    fi
  done < "$file"
done

This works as expected and produces the following output:

[-----.-------@---- SNAPPY]$ cat proxy/SNAPTMP/SNAP.proxy.ld.gwas.bestproxy.out
6 30656398 rs2249059 6 30609835 rs78802957 1 46563
6 30656398 rs2249059 6 30609835 rs78802957 1 46563
6 30656398 rs2249059 6 30607289 rs142580331 1 49109
6 30656398 rs2249059 6 30607189 rs113520162 1 49209
6 30656398 rs2249059 6 30607173 rs111808357 1 49225
6 30656398 rs2249059 6 30606141 rs112927484 1 50257
6 30656398 rs2249059 6 30604733 rs147842052 1 51665
...

(There are 49 lines in this file.)

My issue is that I want to write only the first n lines per file whose 6th field is a biallelic SNP to my output file. I modified the code to this:

n=4

snp_db_file=/project/richards/ethan.kreuzer/snp156.db

# Output file for biallelic SNPs
output_file=${SNAPTMP}/SNAP.proxy.ld.gwas.bestproxy.out

# Loop over the files
for file in ${SNAPTMP}/SNAP.*.proxy.ld.gwas.bestproxy; do
  echo "Processing file: $file"
  counter=0

  # Process each line of the file
  while read -r line; do
    # Extract the 6th field from the line
    field=$(echo "$line" | awk '{print $6}')

    # Check if the field is a biallelic SNP
    if [[ $(is_biallelic "$field") -eq 1 ]]; then
      # Append the line to the output file
      echo "$line" >> "$output_file"
      ((counter++))
      if ((counter >= n)); then
        break  # Break the inner loop after n iterations
      fi
    fi

  done < "$file"

done

But now I get only one line:

[-----.-------@---- SNAPPY]$ cat proxy/SNAPTMP/SNAP.proxy.ld.gwas.bestproxy.out
6 30656398 rs2249059 6 30609835 rs78802957 1 46563

This seems like basic code, so I'm really not sure what I'm doing wrong.
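One possible cause, assuming the full script runs with set -e (the snippet above does not show that): in bash, ((expr)) returns exit status 1 whenever expr evaluates to 0, and the post-increment ((counter++)) evaluates to the old value of counter. On the first match that value is 0, so the command "fails" and set -e aborts the script right after the first line has been appended, which would match the single-line output. The sketch below (assuming bash 4.1 or later, where a failing (( )) triggers errexit) shows the difference; ((++counter)) or counter=$((counter + 1)) avoids the problem.

```shell
#!/usr/bin/env bash
# Sketch: why ((counter++)) can stop a script that uses set -e.
# Assumes bash >= 4.1, where a failing (( )) command triggers errexit.

# ((expr)) returns exit status 1 when expr evaluates to 0.
# Post-increment evaluates to the OLD value, so starting from 0 it "fails".
counter=0
((counter++))
echo "((counter++)) from 0 -> status $?"   # status 1

counter=0
((++counter))
echo "((++counter)) from 0 -> status $?"   # status 0

# Under set -e the failing status ends the script before the echo runs.
bash -ec 'counter=0; ((counter++)); echo "post-increment: still running"'
echo "set -e + ((counter++)) -> exited with status $?"   # status 1

# Safe alternatives: pre-increment, or a plain arithmetic assignment.
bash -ec 'counter=0; ((++counter)); echo "pre-increment: counter=$counter"'
bash -ec 'counter=0; counter=$((counter + 1)); echo "assignment: counter=$counter"'
```

In the question's loop, replacing ((counter++)) with ((++counter)) keeps the same counting logic (counter >= n still stops after n matches) without the failing status.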

Answer by chepner:

You don't need a counter. Have your while loop output every matching line, and use head to write only the first n of them to your output file. Once head exits, the loop exits too: the first time it tries to write a line to the now-closed pipe, it is terminated by SIGPIPE.

for file in "${SNAPTMP}"/SNAP.*.proxy.ld.gwas.bestproxy; do
  echo "Processing file: $file"
  # Print every matching line; head keeps only the first n per file
  while read -r line; do
    field=$(echo "$line" | awk '{print $6}')
    [[ $(is_biallelic "$field") -eq 1 ]] && echo "$line"
  done < "$file" | head -n "$n" >> "$output_file"
done
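A self-contained sketch of this approach follows. Everything except the loop itself is an assumption for illustration: SNAPTMP points at a temporary directory, the two input files hold made-up rows in the question's 8-column layout, and is_biallelic is a hypothetical stand-in (the real one is not shown) that prints 1 for IDs ending in an even digit and 0 otherwise, mimicking the output-based check.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL is_biallelic: prints 1 if the ID ends in an even digit, else 0.
# It only mimics the output-based interface used in the question.
is_biallelic() {
  case $1 in
    *[02468]) echo 1 ;;
    *)        echo 0 ;;
  esac
}

n=2
SNAPTMP=$(mktemp -d)
trap 'rm -rf "$SNAPTMP"' EXIT
output_file=$SNAPTMP/SNAP.proxy.ld.gwas.bestproxy.out

# Made-up sample files in the question's 8-column layout.
printf '%s\n' \
  '6 1 rsA 6 10 rs10 1 9' \
  '6 1 rsA 6 11 rs11 1 10' \
  '6 1 rsA 6 12 rs12 1 11' \
  '6 1 rsA 6 14 rs14 1 13' > "$SNAPTMP/SNAP.a.proxy.ld.gwas.bestproxy"
printf '%s\n' \
  '6 2 rsB 6 20 rs21 1 19' \
  '6 2 rsB 6 22 rs22 1 20' > "$SNAPTMP/SNAP.b.proxy.ld.gwas.bestproxy"

for file in "$SNAPTMP"/SNAP.*.proxy.ld.gwas.bestproxy; do
  echo "Processing file: $file"
  # The loop prints every match; head passes on only the first n per file.
  while read -r line; do
    field=$(echo "$line" | awk '{print $6}')
    [[ $(is_biallelic "$field") -eq 1 ]] && echo "$line"
  done < "$file" | head -n "$n" >> "$output_file"
done

cat "$output_file"
```

With n=2 the output file ends up with the rs10 and rs12 rows from the first file (rs11 is rejected and rs14 is cut off by head) and the rs22 row from the second.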

Check whether you can use the exit status of is_biallelic, rather than its output, to decide whether to print $line; if so, you can write something like

is_biallelic "$field" && echo "$line"
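A sketch of that style, assuming a hypothetical is_biallelic that reports its answer only through its exit status (here, "biallelic" simply means the ID ends in an even digit). It also uses read to split out the 6th field, which avoids starting an awk process for every line; filter_biallelic is likewise a made-up name for this example.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL is_biallelic: returns 0 (success) for IDs ending in an even
# digit and 1 otherwise, and prints nothing.
is_biallelic() {
  [[ $1 == *[02468] ]]
}

# Print every line of "$1" whose 6th whitespace-separated field passes
# is_biallelic. read splits the line itself, so no awk runs per line.
filter_biallelic() {
  local line field
  while read -r line; do
    read -r _ _ _ _ _ field _ <<< "$line"
    is_biallelic "$field" && echo "$line"
  done < "$1"
  return 0
}

n=2
input=$(mktemp)
trap 'rm -f "$input"' EXIT
printf '%s\n' \
  '6 1 rsA 6 10 rs10 1 9' \
  '6 1 rsA 6 11 rs11 1 10' \
  '6 1 rsA 6 12 rs12 1 11' \
  '6 1 rsA 6 14 rs14 1 13' > "$input"

# Same head trick as above: keep only the first n matching lines.
filter_biallelic "$input" | head -n "$n"
```

The last name given to read receives the remainder of the line, so lines with more than seven fields still put the 6th field in field.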

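It's also likely that the entire while loop can be replaced by a single awk script that calls is_biallelic as needed, rather than starting an awk process for every line just to extract one field. Assuming is_biallelic is an executable on your PATH (awk's system() runs its command through /bin/sh, which may not see a bash function) and that it exits with status 0 for a biallelic SNP, it could be as simple as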

awk '!system("is_biallelic " $6 " >/dev/null")' "$file" >> "$output_file"
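A runnable sketch of the awk route, under stated assumptions: a hypothetical is_biallelic script (not the real one) is installed on PATH and exits 0 for IDs ending in an even digit. system() returns the command's exit status, so it has to be negated for success to select the line, and the field must be concatenated outside the string literal; inside "is_biallelic $6" the $6 would reach the shell as an empty positional parameter. This version also counts matches inside awk and exits after n of them, which replaces head and stops calling is_biallelic early.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL is_biallelic executable: exits 0 when its argument ends in an
# even digit. It must be a real file on PATH because system() uses /bin/sh.
bindir=$(mktemp -d)
trap 'rm -rf "$bindir"' EXIT
cat > "$bindir/is_biallelic" <<'EOF'
#!/bin/sh
case $1 in
  *[02468]) exit 0 ;;
  *)        exit 1 ;;
esac
EOF
chmod +x "$bindir/is_biallelic"
PATH=$bindir:$PATH

input=$bindir/sample.txt
printf '%s\n' \
  '6 1 rsA 6 10 rs10 1 9' \
  '6 1 rsA 6 11 rs11 1 10' \
  '6 1 rsA 6 12 rs12 1 11' \
  '6 1 rsA 6 14 rs14 1 13' > "$input"

# !system(...) is true when the command exits 0. $6 is passed to the shell
# unquoted, which is fine for SNP IDs like rs123 but not for arbitrary text.
n=2
awk -v n="$n" '!system("is_biallelic " $6) { print; if (++count >= n) exit }' "$input"
```

Because the loop calls awk once per file, count starts from zero for each file, matching the per-file limit the question asks for.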