I have a GRanges object with some genomic intervals and some metadata (3 vectors with the coverage of each region in 3 different samples). I have applied:
disjoin(my_data)
to obtain a new GRanges object with the smallest set of unique, non-overlapping pieces.
The problem is that disjoin() drops the metadata, so my new GRanges object has none. What I would like is, for each new piece, the mean coverage of the original regions that contain it.
As an example, I would like to turn this metadata:
         sample1  sample2  sample3
1:1-3    30       NA       NA
1:1-4    NA       40       35
1:4-5    35       NA       NA
1:5-7    NA       50       50
1:6-7    60       NA       NA
into this:
         sample1  sample2  sample3
1:1      30       40       35
1:2      30       40       35
1:3      30       40       35
1:4      35       40       35
1:5      35       50       50
1:6      60       50       50
1:7      60       50       50
How can I achieve that?
Here is a data.table approach to carrying the metadata over to the disjoined set of ranges.
First, find the overlaps between the disjoined ranges and the original data. Then collect the coverage values for those overlaps into a data.table. For each disjoined range and each sample, take the unique coverage value after removing NA values. This yields a single value as long as a sample's own ranges do not overlap one another, as in the example; otherwise take the mean instead. Note that .SD is a special data.table symbol for the subset of the data.table belonging to the current group. Finally, join the result back onto the disjoined ranges.
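The steps above can be sketched in R roughly as follows. This is a minimal sketch, not the answer's original code: it rebuilds the question's example as `my_data`, and the names `dj`, `hits`, `md`, `ov_dt`, `agg`, `samples` and `per_base` are illustrative. It aggregates with mean() over the non-NA values rather than unique(), which gives the same result on this data and matches the "mean coverage" asked for in the question.

```r
library(GenomicRanges)  # GRanges, disjoin(), findOverlaps(), tile(); also attaches S4Vectors (DataFrame)
library(data.table)

# Rebuild the question's example; `my_data` and the sample column names are assumed
my_data <- GRanges(
  seqnames = "1",
  ranges   = IRanges(start = c(1, 1, 4, 5, 6), end = c(3, 4, 5, 7, 7)),
  sample1  = c(30, NA, 35, NA, 60),
  sample2  = c(NA, 40, NA, 50, NA),
  sample3  = c(NA, 35, NA, 50, NA)
)
samples <- c("sample1", "sample2", "sample3")

# 1. Disjoin: the resulting GRanges carries no metadata columns
dj <- disjoin(my_data)

# 2. For each disjoined piece, find the original ranges that overlap it
hits <- findOverlaps(dj, my_data)

# 3. One row per (piece, original range) pair, holding that range's coverage
md    <- as.data.table(as.data.frame(mcols(my_data)))
ov_dt <- md[subjectHits(hits)]
ov_dt[, piece := queryHits(hits)]

# 4. Per piece and per sample, average the non-NA values (NA if there are none)
agg <- ov_dt[, lapply(.SD, function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}), by = piece, .SDcols = samples]

# 5. Join the aggregated values back onto the disjoined ranges by piece index
mcols(dj) <- DataFrame(as.data.frame(agg[match(seq_along(dj), piece), ..samples]))
dj

# Optional: expand to one row per base, as in the question's desired output
per_base <- unlist(tile(dj, width = 1L))
mcols(per_base) <- mcols(dj)[rep(seq_along(dj), width(dj)), ]
per_base
```

With the example data, disjoin() returns four pieces (1-3, 4, 5 and 6-7), and the per-base expansion reproduces the seven rows of the desired table.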