Using semi_join to find similarities but returns none mistakenly


I am trying to find the genes that two columns have in common, so that I can later work with just those shared genes. Below is my code:

top100_1Beta <- data.frame(grp1_Beta$my_data.SYMBOL[1:100])
top100_2Beta <- data.frame(grp2_Beta$my_data.SYMBOL[1:100])
common100_Beta <- semi_join(top100_1Beta, top100_2Beta)

When I run the code I get the following error:

Error: by required, because the data sources have no common variables

This seems wrong, since when I open top100_1Beta and top100_2Beta I can see that at least the first few rows list exactly the same genes: ATP2A1, SLMAP, MEOX2, ...

I am confused about why it reports that there is nothing in common. Any help would be greatly appreciated. Thanks!

There are 2 answers below

Best answer

I don't think you need any form of *_join here; instead, it seems you're looking for intersect:

intersect(grp1_Beta$my_data.SYMBOL[1:100], grp2_Beta$my_data.SYMBOL[1:100])

This returns a vector of the entries common to the first 100 entries of grp1_Beta$my_data.SYMBOL and grp2_Beta$my_data.SYMBOL.
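To make this concrete, here is a minimal sketch using small made-up data frames in place of grp1_Beta and grp2_Beta. GENE_A and GENE_B are placeholder names I invented for illustration, and the real data will of course have more rows. It also shows one way to keep only the rows with shared genes afterwards, which seems to be the eventual goal.

```r
# Hypothetical stand-ins for grp1_Beta and grp2_Beta (not the asker's data)
grp1_Beta <- data.frame(
  my_data.SYMBOL = c("ATP2A1", "SLMAP", "MEOX2", "GENE_A"),
  stringsAsFactors = FALSE
)
grp2_Beta <- data.frame(
  my_data.SYMBOL = c("ATP2A1", "SLMAP", "MEOX2", "GENE_B"),
  stringsAsFactors = FALSE
)

# Character vector of symbols present in both columns
common <- intersect(grp1_Beta$my_data.SYMBOL, grp2_Beta$my_data.SYMBOL)

# Keep only the rows of grp1_Beta whose symbol is shared
grp1_common <- grp1_Beta[grp1_Beta$my_data.SYMBOL %in% common, , drop = FALSE]
```

Because intersect works on plain vectors, the column names never come into play, which is why it sidesteps the error from semi_join.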

Second answer

Without a full working example, I'm guessing that your top100_1Beta and top100_2Beta data frames do not have the same column names. They are probably grp1_Beta.my_data.SYMBOL.1.100. and grp2_Beta.my_data.SYMBOL.1.100. respectively. Since semi_join matches on columns that share a name, it doesn't know which columns to match on. Renaming the columns so that they match should fix the issue.
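A minimal sketch of that fix, assuming dplyr is installed and using small made-up data in place of grp1_Beta and grp2_Beta. The column name SYMBOL, the object names top_1Beta and top_2Beta, and the genes GENE_A and GENE_B are my own choices for illustration, not anything from the question.

```r
library(dplyr)

# Hypothetical stand-ins for the asker's data
grp1_Beta <- data.frame(
  my_data.SYMBOL = c("ATP2A1", "SLMAP", "MEOX2", "GENE_A"),
  stringsAsFactors = FALSE
)
grp2_Beta <- data.frame(
  my_data.SYMBOL = c("ATP2A1", "SLMAP", "MEOX2", "GENE_B"),
  stringsAsFactors = FALSE
)

# With an unnamed argument, data.frame() derives the column name from the
# expression text, so the two data frames end up with different names
names(data.frame(grp1_Beta$my_data.SYMBOL[1:4]))
names(data.frame(grp2_Beta$my_data.SYMBOL[1:4]))

# Naming the column explicitly gives both data frames a shared variable
top_1Beta <- data.frame(SYMBOL = grp1_Beta$my_data.SYMBOL[1:4],
                        stringsAsFactors = FALSE)
top_2Beta <- data.frame(SYMBOL = grp2_Beta$my_data.SYMBOL[1:4],
                        stringsAsFactors = FALSE)

# Rows of top_1Beta whose SYMBOL also appears in top_2Beta
common_Beta <- semi_join(top_1Beta, top_2Beta, by = "SYMBOL")
```

If you would rather keep different column names, the by argument also accepts a named vector of the form c("name_in_x" = "name_in_y"), which tells the join which columns to pair up.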