I have two list objects in R and I want to know which of the sequences can bind to each other through DNA complementarity.
The first object, rs, holds reverse-complemented microRNA seed regions and the second, u, holds 3'UTR motifs.
I found a package called microRNA (https://www.bioconductor.org/packages/release/bioc/manuals/microRNA/man/microRNA.pdf) with a function called matchSeeds(seed, seq). I tried it, but this function looks for exact matches, which is not what I need. Any lead on how to solve this in R will be very much appreciated.
Thanks!
> typeof(rs)
[1] "list"
> typeof(u)
[1] "list"
> head(rs)
$`miR-92|34108_3p `
[1] "TGCAAT"
$`miR-92|34106_3p `
[1] "TGCAAT"
$`miR-92|34110_3p `
[1] "TGCAAT"
$`miR-184|1952_3p `
[1] "CCGTCC"
$`miR-184|1954_3p `
[1] "CCGTCC"
$`miR-1795_3p `
[1] "CCGTCC"
> head(u)
$upper_1
[1] "gccgtt"
$upper_2
[1] "ccgagc"
$upper_3
[1] "gacatt"
$upper_4
[1] "gcttat"
$upper_5
[1] "taccta"
$upper_6
[1] "tcgtct"
If you want to find whether any strings in the `rs` list are complementary to the strings in the `u` list, and you want it to be performant, you can use the `matchPDict` function from the Biostrings package.
Example:
Convert first list to DNAStringSet:
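A minimal sketch of this step, assuming `rs` is a named list of single strings as printed in the question (only three entries are recreated here, and the name `rs_set` is chosen for illustration):

```r
library(Biostrings)

# Toy subset of the question's `rs` list (reverse-complemented seed regions)
rs <- list("miR-92|34108_3p "  = "TGCAAT",
           "miR-184|1952_3p " = "CCGTCC",
           "miR-1795_3p "     = "CCGTCC")

# unlist() flattens the list into a named character vector,
# which DNAStringSet() accepts directly
rs_set <- DNAStringSet(unlist(rs))
rs_set
```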
Convert second list to DNAStringSet:
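The same idea applied to `u`, again as a sketch on three entries from the question. The motifs are lower case there, so they are upper-cased explicitly here; `u_set` is an illustrative name.

```r
library(Biostrings)

# Toy subset of the question's `u` list (3'UTR motifs, lower case)
u <- list(upper_1 = "gccgtt",
          upper_2 = "ccgagc",
          upper_3 = "gacatt")

# toupper() keeps the names; upper-casing up front means the result does not
# depend on how the constructor treats lower-case input
u_set <- DNAStringSet(toupper(unlist(u)))
u_set
```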
Get the complement of the list:
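A sketch of the complement step, assuming it is the seed list that gets complemented (the names `rs_set` and `rs_c` are illustrative):

```r
library(Biostrings)

# Two seed sequences taken from the question
rs_set <- DNAStringSet(c(seed_a = "TGCAAT", seed_b = "CCGTCC"))

# complement() swaps A<->T and C<->G position by position, without reversing
rs_c <- complement(rs_set)
rs_c
```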
Create a PDict so it can be matched quickly against the other list:
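A sketch of building the dictionary. `PDict()` preprocesses a set of patterns; in this default form the patterns should all have the same width, which holds for the 6-mers in the question. The pattern names used here are hypothetical and kept unique.

```r
library(Biostrings)

# Seeds from the question; the duplicated sequence mirrors the original data
rs_set <- DNAStringSet(c(seed_a = "TGCAAT", seed_b = "CCGTCC", seed_c = "CCGTCC"))

# Preprocess the complemented seeds into a pattern dictionary
pdict0 <- PDict(complement(rs_set))
pdict0
```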
Iterate over list `u`, running `matchPDict`:
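A sketch of the matching loop. None of the motifs shown in the question happens to hit, so `utr_1` below is a made-up motif chosen to produce a match; all names other than `pdict0` and `res` are illustrative. Each `matchPDict()` call returns an `MIndex` object with one entry per pattern.

```r
library(Biostrings)

rs_set <- DNAStringSet(c(seed_a = "TGCAAT", seed_b = "CCGTCC"))
# utr_1 is hypothetical (the complement of seed_a); utr_2 is from the question
u_set  <- DNAStringSet(c(utr_1 = "ACGTTA", utr_2 = "GACATT"))

pdict0 <- PDict(complement(rs_set))

# Match every pattern in the dictionary against each subject sequence in turn
res <- lapply(seq_along(u_set), function(i) matchPDict(pdict0, u_set[[i]]))
names(res) <- names(u_set)
res
```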
EDIT: if you want to check any orientation, you can create them using accessory functions such as `complement`, `reverse` and `reverseComplement`, and provide that to `PDict`:
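A sketch of combining the orientations into one dictionary. The suffixes added to the names are an assumption made here so that every pattern keeps a unique name after concatenation.

```r
library(Biostrings)

rs_set <- DNAStringSet(c(seed_a = "TGCAAT", seed_b = "CCGTCC"))

# Build each orientation and tag its names so they stay unique
rs_comp <- complement(rs_set)
names(rs_comp) <- paste0(names(rs_set), "_comp")
rs_rev <- reverse(rs_set)
names(rs_rev) <- paste0(names(rs_set), "_rev")
rs_rc <- reverseComplement(rs_set)
names(rs_rc) <- paste0(names(rs_set), "_rc")

# Concatenate the three sets and preprocess them together
all_orient <- c(rs_comp, rs_rev, rs_rc)
pdict0 <- PDict(all_orient)
pdict0
```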
`res` is a list of match results holding IRanges objects; you can check where the hits are with:
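A sketch of inspecting the hits, assuming `res` was built with `matchPDict()` as one `MIndex` per subject sequence. Indexing an `MIndex` with `[[` gives the IRanges for one pattern, and `unlist()` collects all matches into a single IRanges. The pattern and subject sequences below are made up for illustration.

```r
library(Biostrings)

pdict0 <- PDict(DNAStringSet(c(pat_a = "ACGTTA", pat_b = "GGCAGG")))
u_set  <- DNAStringSet(c(utr_1 = "ACGTTA", utr_2 = "GACATT"))

res <- lapply(seq_along(u_set), function(i) matchPDict(pdict0, u_set[[i]]))
names(res) <- names(u_set)

# Matches of the first pattern within the first subject sequence
hits_a <- res[["utr_1"]][[1]]
start(hits_a)
end(hits_a)

# All matches in the first subject sequence, across patterns
all_hits <- unlist(res[["utr_1"]])
all_hits
```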
EDIT2:
If you just want to count the matches, without the match coordinates, you can simply use:
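One way to get such counts is `vcountPDict()`, which takes the dictionary and a whole set of subject sequences and returns an integer matrix; the original snippet may have used a different call. The toy data and the `dimnames` labelling are assumptions for illustration.

```r
library(Biostrings)

rs_set <- DNAStringSet(c(seed_a = "TGCAAT", seed_b = "CCGTCC"))
# utr_1 is hypothetical (the complement of seed_a); utr_2 is from the question
u_set  <- DNAStringSet(c(utr_1 = "ACGTTA", utr_2 = "GACATT"))

rs_c   <- complement(rs_set)
pdict0 <- PDict(rs_c)

# One row per pattern, one column per subject sequence
counts <- vcountPDict(pdict0, u_set)
dimnames(counts) <- list(names(rs_c), names(u_set))
counts
```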
Rows correspond to sequences in `pdict0`, while columns correspond to sequences in `u`.