I have code that processes every line of a set of files, but when I try to limit the number of iterations to an integer "n" using a counter variable in the while loop, it no longer works.
Here is my code that works:
# Output file for biallelic SNPs
output_file=${SNAPTMP}/SNAP.proxy.ld.gwas.bestproxy.out
# Loop over the files
for file in ${SNAPTMP}/SNAP.*.proxy.ld.gwas.bestproxy; do
    echo "Processing file: $file"
    counter=0
    # Process each line of the file
    while read -r line; do
        # Extract the 6th field from the line
        field=$(echo "$line" | awk '{print $6}')
        # Check if the field is a biallelic SNP
        if [[ $(is_biallelic "$field") -eq 1 ]]; then
            # Append the line to the output file
            echo "$line" >> "$output_file"
        fi
    done < "$file"
done
This works as expected and produces this output:
[-----.-------@---- SNAPPY]$ cat proxy/SNAPTMP/SNAP.proxy.ld.gwas.bestproxy.out
6 30656398 rs2249059 6 30609835 rs78802957 1 46563
6 30656398 rs2249059 6 30609835 rs78802957 1 46563
6 30656398 rs2249059 6 30607289 rs142580331 1 49109
6 30656398 rs2249059 6 30607189 rs113520162 1 49209
6 30656398 rs2249059 6 30607173 rs111808357 1 49225
6 30656398 rs2249059 6 30606141 rs112927484 1 50257
6 30656398 rs2249059 6 30604733 rs147842052 1 51665
...
(There are 49 lines in this file)
My issue is that I want to print only "n" lines per file that possess biallelic SNPs in field 6 to my output file. I modified the code to this:
n=4
snp_db_file=/project/richards/ethan.kreuzer/snp156.db
# Output file for biallelic SNPs
output_file=${SNAPTMP}/SNAP.proxy.ld.gwas.bestproxy.out
# Loop over the files
for file in ${SNAPTMP}/SNAP.*.proxy.ld.gwas.bestproxy; do
    echo "Processing file: $file"
    counter=0
    # Process each line of the file
    while read -r line; do
        # Extract the 6th field from the line
        field=$(echo "$line" | awk '{print $6}')
        # Check if the field is a biallelic SNP
        if [[ $(is_biallelic "$field") -eq 1 ]]; then
            # Append the line to the output file
            echo "$line" >> "$output_file"
            ((counter++))
            if ((counter >= n)); then
                break # Break the inner loop after n iterations
            fi
        fi
    done < "$file"
done
But now I get:
[-----.-------@---- SNAPPY]$ cat proxy/SNAPTMP/SNAP.proxy.ld.gwas.bestproxy.out
6 30656398 rs2249059 6 30609835 rs78802957 1 46563
This seems like basic code so I really am not sure what I am doing wrong.
You don't need a counter. Have your while loop output all the lines, and use head to output only the first n of them to your output file. When head exits, the loop will as well, the first time it tries to write a line to the now-closed pipe.
Check whether you can use the exit status of is_biallelic, rather than its output, to determine whether to output $line.
It's also likely that the entire while loop can be replaced with a single awk script that can invoke is_biallelic as needed, rather than running awk on every line just to extract one field.
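
The head-based approach suggested above might look roughly like the following. This is only a sketch under assumptions: the question does not show is_biallelic, so the version here is a hypothetical stand-in that treats any ID starting with "rs" as biallelic, and the helper names filter_biallelic and first_n_biallelic, along with the variables limit and f, are made up for illustration.

```shell
# Hypothetical stand-in for the asker's is_biallelic, which the question does
# not show: prints 1 when the ID starts with "rs", 0 otherwise.
is_biallelic() {
    case $1 in
        rs*) echo 1 ;;
        *)   echo 0 ;;
    esac
}

# Print every line of file "$1" whose 6th field passes is_biallelic.
filter_biallelic() {
    while IFS= read -r line; do
        field=$(printf '%s\n' "$line" | awk '{print $6}')
        if [ "$(is_biallelic "$field")" -eq 1 ]; then
            printf '%s\n' "$line"
        fi
    done < "$1"
}

# For each file, keep only the first "limit" matching lines.
# head stops reading once it has enough lines, which ends the loop feeding it.
first_n_biallelic() {
    limit=$1
    shift
    for f in "$@"; do
        filter_biallelic "$f" | head -n "$limit"
    done
}
```

With the real is_biallelic in place of the stand-in, the call would presumably be something like `first_n_biallelic "$n" "${SNAPTMP}"/SNAP.*.proxy.ld.gwas.bestproxy >> "$output_file"`. If is_biallelic reported its result through its exit status instead of printing 1 or 0, the test inside the loop could shrink to `if is_biallelic "$field"; then`.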