How to intersect and merge files with python/pandas to divide overlap into subregions based on original input file?

641 Views Asked by jordimaggi At 17 August 2025 at 12:31

I have a few .bed files from different companies offering exome sequencing kits.

I would like to have a file that summarizes all target regions for all these kits. The .bed files have a basic structure of three columns (chr#, Start, End).

I would like to get an output table that shows which genomic regions are covered only by one of these kits, and which regions are covered by more than one (and which ones). The best way to illustrate this is by an example:

BED file 1

chr#	Start	End
1	100	300

BED file 2

chr#	Start	End
1	150	350

BED file 3

chr#	Start	End
1	80	200

From these files, I created a dataframe containing all target regions, sorted by chr# and Start coordinates.

I would like to merge and intersect the files to get an output that divides the regions into subregions based on the overlap between the input files. It should look something like this:

chr#	Start	End	Kit 1	Kit 2	Kit 3
1	80	100	0	0	1
1	100	150	1	0	1
1	150	200	1	1	1
1	200	300	1	1	0
1	300	350	0	1	0

I know there may be such a function on GRanges from Bioconductor, but I am not familiar with the library and its functions.

Any help would be appreciated.


There is 1 best solution below

AudioBubble On 20 March 2021 at 18:13

UPDATE

Using multiIntersectBed from bedtools

$ multiIntersectBed -i *.bed
1   80  100 1   3       0   0   1
1   100 150 2   1,3     1   0   1
1   150 200 3   1,2,3   1   1   1
1   200 300 2   1,2     1   1   0
1   300 350 1   2       0   1   0
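In this output, column 4 is the number of files covering the subregion and column 5 lists which files; the remaining columns are one 0/1 flag per input file. To get the table layout from the question, those two extra columns can be dropped. Below is a minimal sketch with pandas, assuming the output has been captured as text. The function name load_multiinter and the "Kit N" column labels are made up for illustration and are not part of bedtools.

```python
import io

import pandas as pd


def load_multiinter(text, n_files):
    """Parse headerless multiIntersectBed output into chr#/Start/End plus one flag column per file."""
    # Hypothetical labels; the order follows the order of the files given to -i.
    kit_cols = [f"Kit {i}" for i in range(1, n_files + 1)]
    names = ["chr#", "Start", "End", "num", "list"] + kit_cols
    df = pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",
        header=None,
        names=names,
        dtype={"chr#": str, "list": str},  # keep chromosome names such as "X" as text
    )
    # Drop the count and file-list columns, keeping only the per-file flags.
    return df.drop(columns=["num", "list"])


example = """\
1\t80\t100\t1\t3\t0\t0\t1
1\t100\t150\t2\t1,3\t1\t0\t1
1\t150\t200\t3\t1,2,3\t1\t1\t1
1\t200\t300\t2\t1,2\t1\t1\t0
1\t300\t350\t1\t2\t0\t1\t0
"""

table = load_multiinter(example, n_files=3)
print(table.to_string(index=False))
```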

There's a Python interface too:

http://daler.github.io/pybedtools/

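>>> from glob import glob
>>> import pybedtools as bt
>>> # Sort the paths so the per-file columns follow the same order as the shell's *.bed expansion
>>> print(bt.BedTool().multi_intersect(i=sorted(glob('*.bed'))))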
1   80  100 1   3       0   0   1
1   100 150 2   1,3     1   0   1
1   150 200 3   1,2,3   1   1   1
1   200 300 2   1,2     1   1   0
1   300 350 1   2       0   1   0
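If installing bedtools is not an option, the same table can be approximated in plain pandas. The sketch below is not what multiIntersectBed does internally. It is a simple breakpoint approach: every Start and End on a chromosome is treated as a cut point, and each stretch between two consecutive cut points is checked against every kit. It assumes half-open intervals as in BED, a consistent type for chr# across files, and inputs small enough that the nested loops are acceptable. The function name split_by_overlap and the "Kit N" labels are hypothetical.

```python
import pandas as pd


def split_by_overlap(kits):
    """kits: dict mapping a label to a DataFrame with columns chr#, Start, End."""
    rows = []
    chroms = sorted({c for df in kits.values() for c in df["chr#"]})
    for chrom in chroms:
        per_kit = {name: df[df["chr#"] == chrom] for name, df in kits.items()}
        # Every Start/End on this chromosome is a potential subregion boundary.
        cuts = sorted(
            {int(p) for df in per_kit.values() for col in ("Start", "End") for p in df[col]}
        )
        for start, end in zip(cuts, cuts[1:]):
            # A kit covers [start, end) if one of its intervals contains it entirely.
            flags = {
                name: int(((df["Start"] <= start) & (df["End"] >= end)).any())
                for name, df in per_kit.items()
            }
            if any(flags.values()):  # skip gaps that no kit covers
                rows.append({"chr#": chrom, "Start": start, "End": end, **flags})
    return pd.DataFrame(rows, columns=["chr#", "Start", "End", *kits])


# The three example files from the question, built inline instead of read from disk.
kits = {
    "Kit 1": pd.DataFrame({"chr#": [1], "Start": [100], "End": [300]}),
    "Kit 2": pd.DataFrame({"chr#": [1], "Start": [150], "End": [350]}),
    "Kit 3": pd.DataFrame({"chr#": [1], "Start": [80], "End": [200]}),
}
result = split_by_overlap(kits)
print(result.to_string(index=False))
```

For the three example files this gives the same five subregions and flags as the table in the question. Adjacent subregions with identical flags are not merged, and for whole-exome files an interval-aware tool such as bedtools will be much faster.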
