Find ranges that are shared by 80% or more of 10 GRanges objects


Introduction and problem

I have multiple (>2) GRanges objects. I want to find those ranges that are shared by x% or more of all GRanges.

Example data

Below is some example data as data frames. Let's say we want to find the ranges that are shared by 66.7% (2/3) or more of the objects.

gr1 <- data.frame(seqnames = rep('chr1', 3),
                  start = c(1, 10, 20), 
                  end = c(3, 17, 30))


gr2 <- data.frame(seqnames = rep('chr1', 3),
                  start = c(2, 11, 31), 
                  end = c(3, 19, 35))

gr3 <- data.frame(seqnames = rep('chr1', 3),
                  start = c(2, 16, 37), 
                  end = c(3, 22, 40))

Shown are the dataframes:

Output wanted

A GRanges object. For the example data, the algorithm should find:

chr1 2 - 3 Reason: (2-3 is found in gr1, gr2 and gr3, 1 only found in gr1)
chr1 11 - 22 Reason: (11-17 is found in gr1 and gr2, 10 only in gr1, 18-19 in gr2 and gr3, 20-22 in gr1 and gr3)

What I have done

I know how to find the ranges that are shared by all (100%) of the GRanges objects; see "R overlap multiple GRanges with findOverlaps()".
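For reference, the 100% case can be sketched with a pairwise set intersection instead of `findOverlaps()`. This is only an illustration under the assumption that strand can be ignored (all ranges here are unstranded); it is not the code from the linked question.

```r
library(GenomicRanges)

# Example data from this question, built directly as GRanges
gr1 <- GRanges("chr1", IRanges(start = c(1, 10, 20), end = c(3, 17, 30)))
gr2 <- GRanges("chr1", IRanges(start = c(2, 11, 31), end = c(3, 19, 35)))
gr3 <- GRanges("chr1", IRanges(start = c(2, 16, 37), end = c(3, 22, 40)))

# Fold intersect() over the list: the result keeps only the
# positions that are covered by every object
in_all <- Reduce(intersect, list(gr1, gr2, gr3))
in_all
```

This approach only answers the "all objects" case, which is why it does not extend to an arbitrary percentage.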

Answer by dshandel:

I asked the same question on the Bioconductor support site, where it was answered correctly: https://support.bioconductor.org/p/9148540/
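In case the link is unavailable, here is a minimal sketch of one coverage-based way to approach the problem. It is not necessarily the method from the linked answer. The helper `shared_ranges` and its `min_frac` argument are hypothetical names introduced here, and the sketch assumes strand can be ignored.

```r
library(GenomicRanges)

# Example data from the question, built directly as GRanges
gr1 <- GRanges("chr1", IRanges(start = c(1, 10, 20), end = c(3, 17, 30)))
gr2 <- GRanges("chr1", IRanges(start = c(2, 11, 31), end = c(3, 19, 35)))
gr3 <- GRanges("chr1", IRanges(start = c(2, 16, 37), end = c(3, 22, 40)))

# Hypothetical helper (not part of GenomicRanges): returns the ranges
# covered by at least `min_frac` of the objects in `gr_list`
shared_ranges <- function(gr_list, min_frac) {
  # Merge overlapping ranges within each object, so that each object
  # adds at most 1 to the coverage at any position
  gr_list <- lapply(gr_list, reduce, ignore.strand = TRUE)

  # Per-position count of how many objects cover that position
  cov <- coverage(unlist(GRangesList(gr_list)))

  # Turn the coverage into ranges with a `score` column (the count)
  cov_gr <- as(cov, "GRanges")

  # Keep stretches covered by enough objects and merge adjacent ones
  keep <- cov_gr$score / length(gr_list) >= min_frac
  reduce(cov_gr[keep])
}

shared <- shared_ranges(list(gr1, gr2, gr3), min_frac = 2/3)
shared
```

For the example data this gives chr1 2-3 and chr1 11-22, matching the output wanted in the question. Pass the threshold as an exact fraction such as `2/3` rather than a rounded value such as `0.667`, otherwise ranges shared by exactly 2 of 3 objects fall just below the cutoff.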