How to calculate relevance score?

I am trying to calculate a relevance score for each word in reviews loaded from a JSON file. Every time I run my code, the only output is "indirect". What am I doing wrong?

My code is below:

import joblib, requests, json, sklearn.metrics, sklearn.model_selection, sklearn.tree, time, math, textblob

import warnings
warnings.filterwarnings("ignore")

response = requests.get("https://appliance_reviews.json")

if response:
    data = json.loads(response.text)
    
    unique = []
    word = []
    for line in data:
        #print(line)
        
        review = line["Review"]
        blob = textblob.TextBlob(review)
        
        for word in blob.words:
            
            if word.lower() not in unique:
                unique.append(word.lower())
   
    for word in unique:
        a = 0
        b = 0
        c = 0
        d = 0
       
        for line in data:
           
            review = line["Review"]
            safety = line["Safety hazard"]
           
            if word in review.lower() and safety == 1:
                a += 1
            if word in review.lower() and safety == 0:
                b += 1
            if word in review.lower() and safety == 1:
                c += 1
            if word in review.lower() and safety == 0:
                d += 1
        
        try:
            rel_score = (math.sqrt(a + b + c + d) * ((a + d) - (c * b))) / math.sqrt((a + b) * (c + d))
        except:

            rel_score = 0
            
        if rel_score >= 4000:
            score.append(word)
    print(word)

Answer from Joffan:

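When you print word on the last line of the code given, it is simply the last entry in unique, regardless of its score. You have just exited a for loop in which word was the loop variable, and the print sits outside that loop. That would explain the output: "indirect" is presumably the last unique word collected.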

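Are you sure that you didn't want to print score, which seems to be intended to accumulate the high-scoring words from unique? Note that score is never initialised in the code shown, so it would need to be created as an empty list before the loop.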
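A minimal sketch of both points, assuming made-up scores in place of the computed rel_score values (the dictionary, threshold and the names high_scoring and last_word are illustrative, not taken from the question):

```python
# Hypothetical scores standing in for the rel_score computed per word.
scores = {"fire": 5200.0, "the": 12.0, "smoke": 4100.0, "zebra": 0.0}
threshold = 4000

high_scoring = []  # the accumulator must exist before the loop appends to it
for word in scores:
    if scores[word] >= threshold:
        high_scoring.append(word)

# After the loop, the loop variable still holds the last item iterated over.
last_word = word
print(last_word)      # zebra
print(high_scoring)   # ['fire', 'smoke']
```

Printing the accumulator after the loop, rather than the loop variable, is what reports the words that passed the threshold.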

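Also, I think your scoring is broken. As coded, the conditions for a and c are identical, as are those for b and d, so a always equals c and b always equals d. In addition, word in review.lower() is a substring test on the whole review text, so a review containing "carpet" would count towards the scores of "car", "pet" and indeed "carp".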
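One way the counting could be repaired is sketched below. It rests on two assumptions that the question does not confirm: that a and b are meant to count hazard and non-hazard reviews containing the word, while c and d count hazard and non-hazard reviews without it, and that whole-word matching is what was intended. The helper names, the regex tokenizer (a simple stand-in for TextBlob's words) and the sample rows are all hypothetical.

```python
import re

def tokens(text):
    """Lowercased whole-word tokens (regex stand-in for TextBlob's words)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def contingency_counts(word, rows):
    """Return (a, b, c, d) for one word over (review, safety) pairs.

    Assumed meaning of the counters:
      a: hazard reviews containing the word
      b: non-hazard reviews containing the word
      c: hazard reviews without the word
      d: non-hazard reviews without the word
    """
    a = b = c = d = 0
    for review, safety in rows:
        present = word in tokens(review)  # set membership, not substring
        if present and safety == 1:
            a += 1
        elif present and safety == 0:
            b += 1
        elif not present and safety == 1:
            c += 1
        elif not present and safety == 0:
            d += 1
    return a, b, c, d

# Made-up sample data in place of the JSON reviews.
rows = [
    ("The carpet caught fire", 1),
    ("My car is fine", 0),
    ("Fire started near the pet bed", 1),
    ("Great carpet cleaner", 0),
]

print("car" in "the carpet caught fire")   # True: substring match
print(contingency_counts("car", rows))     # (0, 1, 2, 1)
print(contingency_counts("fire", rows))    # (2, 0, 0, 2)
```

With whole-word matching, the two "carpet" reviews no longer count as containing "car", and the four counters are no longer pairwise equal.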

As Prune mentions in the comments, generic variable names such as a, b, c and d make the purpose of the code hard to follow.