Fixing segmentation fault error in bcftools


I am trying to merge 3000 bacterial BCF files using bcftools. The VCF files were generated with GATK, then converted to BCF and indexed with bcftools. bcftools processes about 20% of the data, then terminates prematurely and produces a merged BCF file covering only a portion of the variants (up to 500 kb of a 2 Mb bacterial genome). The command I am using is:

bcftools1.7/bcftools merge -l VarList.txt -0 --missing-to-ref --threads 1 -O b > CombinedVCF

The error output is:

/bin/sh: line 1: 17041 Segmentation fault (core dumped) bcftools/bcftools merge -l VarList.txt -0 --missing-to-ref --threads 1 -O b > CombinedVCF

Previously, I ran the same command on 400 samples without any problem.

Searching online, I found this definition: "A segfault occurs when a reference to a variable falls outside the segment where that variable resides, or when a write is attempted to a location that is in a read-only segment". The command is running on a cluster with 80 GB of RAM available for this job. I am not sure whether this error is caused by a problem in bcftools itself or by a limitation of the system running the command.

Here are sample BCF files to replicate the error (https://figshare.com/articles/BCF_file_segfault/7412864). The error appears only with large numbers of samples, so I could not reduce the data set any further.


Best answer

It was a bug in bcftools, and the author kindly fixed it after being notified:

https://github.com/samtools/bcftools/issues/929#issuecomment-443614761

Answer

"I am not sure whether this error is due to a problem with the bcftools software itself or because of the limitation of system which is running the command?"

When a program crashes, it's always a bug in the program itself. If it runs into a system limitation, it should tell you so (e.g. "unable to allocate NNN bytes") instead of crashing.
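To tell a crash apart from an ordinary error exit, it can help to look at the exit status the shell reports. The sketch below is only an illustration: `describe_exit` is a hypothetical helper name, and it relies on the common shell convention that a status above 128 means "terminated by signal (status - 128)". On Linux, 139 would correspond to SIGSEGV (signal 11), while 137 would correspond to SIGKILL (signal 9), which is what the kernel's out-of-memory killer sends.

```shell
# Translate a shell exit status into a short description.
# Convention assumed here: status > 128 means the process was
# terminated by signal number (status - 128).
describe_exit() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        # kill -l <number> prints the signal name without the SIG prefix
        echo "killed by signal $(kill -l $((status - 128)))"
    else
        echo "exited with error code $status"
    fi
}

# Example use, right after the failing command:
#   bcftools1.7/bcftools merge -l VarList.txt -0 --threads 1 -O b > CombinedVCF
#   describe_exit $?
```

A SEGV result points at a memory access bug in the program, whereas a KILL result on a cluster job is more likely the scheduler or the kernel enforcing a memory limit.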

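Your first step should be to find out where it crashes. Note that a shell redirection such as "> CombinedVCF" is not passed through gdb --args, so name the output file with -o instead:

gdb -ex run --args bcftools1.7/bcftools merge -l VarList.txt -0 --missing-to-ref --threads 1 -O b -o CombinedVCF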

GDB should stop with Program received SIGSEGV. At this point, type where, info registers, info locals, and x/20i $pc-40 at the (gdb) prompt, and update your question with the output.

This output will likely enable someone to determine which bug you are running into and what workarounds might be possible.

It's also the information that the developers of bcftools would need if you were to report the issue to them.
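The GDB session described above can also be captured non-interactively, which is convenient on a cluster where the job runs without a terminal. This is a minimal sketch, assuming gdb is installed on the compute node; `write_gdb_script` and `crash_report` are hypothetical helper names, and the GDB commands are the ones listed in this answer.

```shell
# Write the GDB commands from this answer into a command file.
write_gdb_script() {
    cat > "$1" <<'EOF'
run
where
info registers
info locals
x/20i $pc-40
EOF
}

# Run a program under GDB in batch mode and save everything that is printed.
# Usage: crash_report <logfile> <program> [args...]
crash_report() {
    log=$1
    shift
    script=$(mktemp) || return 1
    write_gdb_script "$script"
    # -batch: exit after processing the command file; -x: read commands from it
    gdb -batch -x "$script" --args "$@" > "$log" 2>&1
    rm -f "$script"
}

# Example use (-o keeps the binary BCF output out of the log):
#   crash_report gdb.log bcftools1.7/bcftools merge -l VarList.txt -0 \
#       --threads 1 -O b -o CombinedVCF
```

If the program exits normally instead of crashing, GDB has no stack to show and the log will simply end after the run step.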

Answer

I also had problems when merging thousands of VCF files with bcftools. In my case, the problem was the number of open files, so you may need to increase the open-file limit. Try these commands:

# check soft limit
ulimit -Sn

# check hard limit
ulimit -Hn

# set soft limit
ulimit -Sn <number>
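Before raising the limit, it may be worth comparing it against the number of files in the merge list. The following is a rough sketch, not a bcftools feature: `count_inputs` and `fd_limit_ok` are hypothetical helpers, and the estimate of one descriptor per input file plus 16 spare is an assumption for illustration only.

```shell
# Count non-empty lines (one input file per line) in a list file
# such as the one passed to bcftools merge -l.
count_inputs() {
    grep -c . "$1"
}

# Compare the soft open-file limit with a rough estimate of what is needed.
# ASSUMPTION: one descriptor per input plus 16 spare; the real need may differ.
fd_limit_ok() {
    needed=$(( $(count_inputs "$1") + 16 ))
    soft=$(ulimit -Sn)
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$needed" ]; then
        echo "ok: soft limit $soft, estimated need $needed"
    else
        echo "too low: soft limit $soft, estimated need $needed"
        return 1
    fi
}

# Example use:
#   fd_limit_ok VarList.txt || ulimit -Sn <number>
```

The soft limit can only be raised up to the hard limit reported by ulimit -Hn, so on a shared cluster a larger value may have to be requested from the administrators.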