Semantic Similarity between Sentences in a Text

1.9k Views Asked by Sigmund Reed At 11 January 2017 at 15:57

I have used material from here and a previous forum page to write some code for a program that will automatically calculate the semantic similarity between consecutive sentences across a whole text. Here it is;

The code for the first part is copy pasted from the first link, then I have this stuff below which I put in after the 245 line. I removed all excess after line 245.

with open ("File_Name", "r") as sentence_file:
    while x and y:
        x = sentence_file.readline()
        y = sentence_file.readline()
        similarity(x, y, true)           
#boolean set to false or true 
        x = y
        y = sentence_file.readline()

My text file is formatted like this;

Red alcoholic drink. Fresh orange juice. An English dictionary. The Yellow Wallpaper.

In the end I want to display all the pairs of consecutive sentences with the similarity next to it, like this;

["Red alcoholic drink.", "Fresh orange juice.", 0.611],

["Fresh orange juice.", "An English dictionary.", 0.0]

["An English dictionary.", "The Yellow Wallpaper.",  0.5]

if norm(vec_1) > 0 and if norm(vec_2) > 0:
    return np.dot(vec_1, vec_2.T) / (np.linalg.norm(vec_1)* np.linalg.norm(vec_2))
 elif norm(vec_1) < 0 and if norm(vec_2) < 0:
    ???Move On???

Original Q&A

There are 1 best solutions below

Avantol13 On 11 January 2017 at 16:18 BEST ANSWER

This should work. There's a few things to note in the comments. Basically, you can loop through the lines in the file and store the results as you go. One way to process two lines at a time is to set up an "infinite loop" and check the last line we've read to see if we've hit the end (readline() will return None at the end of a file).

# You'll probably need the file extention (.txt or whatever) in open as well
with open ("File_Name.txt", "r") as sentence_file:
    # Initialize a list to hold the results
    results = []

    # Loop until we hit the end of the file
    while True:
        # Read two lines
        x = sentence_file.readline()
        y = sentence_file.readline()

        # Check if we've reached the end of the file, if so, we're done
        if not y:
            # Break out of the infinite loop
            break
        else:
            # The .rstrip('\n') removes the newline character from each line
            x = x.rstrip('\n')
            y = y.rstrip('\n')

            try: 
                # Calculate your similarity value
                similarity_value = similarity(x, y, True)

                # Add the two lines and similarity value to the results list
                results.append([x, y, similarity_value])
            except:
                print("Error when parsing lines:\n{}\n{}\n".format(x, y))

# Loop through the pairs in the results list and print them
for pair in results:
    print(pair)

Edit: In regards to issues you're getting from similarity(), if you want to simply ignore the line pairs that are causing these errors (without looking at the source in depth I really have no idea what's going on), you can add a try, catch around the call to similarity().

Semantic Similarity between Sentences in a Text

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in VECTOR

Related Questions in TF-IDF

Related Questions in SENTENCE-SIMILARITY

Related Questions in LATENT-SEMANTIC-ANALYSIS

Trending Questions

Popular # Hahtags

Popular Questions