Append overlapping genomic intervals to the matching line of a master index


I have two files that contain genomic intervals: one is the master index, and the other contains a subset of genomic intervals, some of which overlap intervals in the master index. Often, more than one interval in the second file overlaps a single interval in the master index. I know how to use bedtools intersect and similar tools, but I don't want the number of rows in the master index to increase; instead, I'd like to append all overlapping intervals to the same line.

So, for example, here is a snippet of the master index file:

chr2L   10239   10488
chr2L   10906   11238
chr2L   11389   11538
chr2L   11790   12138
chr2L   14489   14688
chr2L   18139   18438
chr2L   20939   21338
chr2L   25402   25801
chr2L   26052   26201

And here would be one of the second files:

chr2L  18002   18367   .034   18    0
chr2L  18401   18600   .02    20    2
chr2L  26000   26100   .01    10    0

And this would be the desired output:

chr2L   10239   10488
chr2L   10906   11238
chr2L   11389   11538
chr2L   11790   12138
chr2L   14489   14688
chr2L   18139   18438   chr2L  18002   18367   .034   18    0,chr2L  18401   18600   .02    20    2
chr2L   20939   21338
chr2L   25402   25801
chr2L   26052   26201   chr2L  26000   26100   .01    10    0
  

Changing the delimiters in the second file is fine if that's necessary; for example, all the columns after the chromosome interval could be comma-separated. I don't have example code of what I have tried, because nothing has come close to working. My guess is that awk can do this in some way, but if anyone has any insight, it's most appreciated.
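To make the desired behaviour concrete, here is a minimal sketch in Python rather than awk. It is an illustration under stated assumptions, not a tested solution for real data: the function names `parse_intervals` and `append_overlaps` are hypothetical, both files are assumed to be whitespace-separated with chromosome, start, and end in the first three columns, and intervals are assumed to be half-open (the usual BED convention), so two intervals overlap when each starts before the other ends.

```python
from collections import defaultdict


def parse_intervals(lines):
    """Parse whitespace-separated BED-like lines into (chrom, start, end, fields)."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        records.append((fields[0], int(fields[1]), int(fields[2]), fields))
    return records


def append_overlaps(master_lines, subset_lines):
    """Return one output line per master interval, with any overlapping
    subset records appended (comma-separated) on the same line."""
    # Group subset records by chromosome so each master interval only
    # scans candidates on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, fields in parse_intervals(subset_lines):
        by_chrom[chrom].append((start, end, fields))

    out = []
    for chrom, m_start, m_end, m_fields in parse_intervals(master_lines):
        hits = [
            "\t".join(fields)
            for start, end, fields in by_chrom[chrom]
            # Half-open overlap test (assumed BED convention).
            if start < m_end and m_start < end
        ]
        row = "\t".join(m_fields)
        if hits:
            # Master columns, then a tab, then all hits joined by commas.
            row += "\t" + ",".join(hits)
        out.append(row)
    return out
```

Both arguments can be any iterable of lines, including open file handles, and the returned list has exactly one entry per master interval, so the row count never grows. The scan is linear in the number of subset records per chromosome for every master interval, which should be acceptable for small files; larger inputs would likely need sorted input or an interval index.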
