Get similarity within a column based on another column

Asked by Mitsarien on 28 February 2024 at 11:06

I have a table with three columns: Source, Target, Similarity. The first two are strings, the last one is a float. The table was produced by comparing source elements with target elements and computing their similarity. For each source element, it holds the 20 most similar target elements. It looks like this (excerpt):

Source	Target	Similarity
Source_1	Target_23	0.82
Source_1	Target_12	0.32
Source_1	Target_2	0.02
Source_2	Target_23	0.72
Source_2	Target_14	0.52
Source_2	Target_12	0.12

Based on this information, I would like to find, for each source element, its 5 most similar other source elements. I don't want to calculate similarity between source elements directly, as that is a computationally expensive process. The idea is that if Source_1 and Source_2 are both highly similar to the same target element, then they should be similar to each other as well.

What's the best way of doing this?

I've tried ranking the targets for each source in descending order of similarity and selecting the source elements that have the same top 5 targets, irrespective of their order within the top 5. I feel there should be a better way to use the similarity scores than just for ranking.
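A minimal sketch of that first attempt, assuming the table is available as a list of (Source, Target, Similarity) tuples. The names `rows`, `top_k_targets` and `same_top_k` are made up for this illustration, and the data is just the excerpt from the table above.

```python
from collections import defaultdict

# Excerpt of the table as (Source, Target, Similarity) tuples (illustrative only).
rows = [
    ("Source_1", "Target_23", 0.82),
    ("Source_1", "Target_12", 0.32),
    ("Source_1", "Target_2", 0.02),
    ("Source_2", "Target_23", 0.72),
    ("Source_2", "Target_14", 0.52),
    ("Source_2", "Target_12", 0.12),
]

def top_k_targets(rows, k=5):
    """Map each source to the set of its k highest-scoring targets."""
    by_source = defaultdict(list)
    for source, target, score in rows:
        by_source[source].append((score, target))
    return {
        source: {target for _, target in sorted(pairs, reverse=True)[:k]}
        for source, pairs in by_source.items()
    }

def same_top_k(rows, selected, k=5):
    """Return the other sources whose top-k target set equals that of `selected`."""
    tops = top_k_targets(rows, k)
    return [s for s, targets in tops.items()
            if s != selected and targets == tops[selected]]
```

Because the sets must match exactly, a single differing target is enough to reject a pair, which is part of why this approach feels too strict.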

I've also tried finding the sources that have the selected source's top target among their own top 5 targets, provided the similarity is above a threshold. This worked well, but again I feel I'm not using all the information I have, since it neglects the remaining targets for each source (see the code snippet below).

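    # Assumes similarity_json[source]["Target"] and ["Similarity"] are parallel lists, best match first
    selected_source = "Source_1"
    top_target = similarity_json[selected_source]["Target"][0]
    source_similars = []
    for source in Source:
        top5 = similarity_json[source]["Target"][:5]
        if top_target in top5 and similarity_json[source]["Similarity"][top5.index(top_target)] > 0.5:
            source_similars.append(source)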
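One possible direction that would use every score rather than only the ranks is to treat each source as a sparse vector of target similarities and compare sources by cosine similarity. The following is only a sketch under assumptions, not a tested solution: the table is taken to be a list of (Source, Target, Similarity) tuples, a target missing from a source's stored list is treated as similarity 0, and `source_vectors`, `cosine` and `most_similar_sources` are hypothetical names.

```python
import math
from collections import defaultdict

# Excerpt of the table as (Source, Target, Similarity) tuples (illustrative only).
rows = [
    ("Source_1", "Target_23", 0.82),
    ("Source_1", "Target_12", 0.32),
    ("Source_1", "Target_2", 0.02),
    ("Source_2", "Target_23", 0.72),
    ("Source_2", "Target_14", 0.52),
    ("Source_2", "Target_12", 0.12),
]

def source_vectors(rows):
    """Build one sparse vector per source: {target: similarity}."""
    vectors = defaultdict(dict)
    for source, target, score in rows:
        vectors[source][target] = score
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors; absent targets count as 0."""
    dot = sum(score * b[target] for target, score in a.items() if target in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar_sources(rows, selected, n=5):
    """Return up to n (score, source) pairs, most similar first, excluding `selected`."""
    vectors = source_vectors(rows)
    scored = [(cosine(vectors[selected], vec), source)
              for source, vec in vectors.items() if source != selected]
    return sorted(scored, reverse=True)[:n]
```

Calling `most_similar_sources(rows, "Source_1")` returns the other sources ordered by how much their target similarities overlap with those of Source_1. Treating unlisted targets as 0 is a simplification, since only the 20 best targets are stored per source.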
