How can I add a True/False column to a data frame that shows whether each sentence contains certain terms, using spaCy?

I have written a function that is meant to find the sentences mentioning CO2 equivalents in a dataframe of sentences, which I built earlier from text. Each row in the df corresponds to one sentence. I want a column that shows 1 if the sentence contains one of the terms and 0 if it doesn't. My current approach does not work: it runs, but the results are not what I expect. When I run the code, every row gets a label, not just the relevant ones. I also don't know how to change the function so that it returns just a 1 or a 0 for each row. How do I have to change it so that it does?

    def find_co2(text):
        # Create a spaCy doc
        doc = nlp(text)

        # Define the terms to look for
        terms = ["Scope 1", "Scope 2", "million metric tons", "CO2"]

        # Build a PhraseMatcher from the terms
        matcher = PhraseMatcher(nlp.vocab)
        patterns = [nlp.make_doc(term) for term in terms]
        matcher.add("CO2", patterns)
        matches = matcher(doc)

        for match_id, start, end in matches:
            # rule_id = nlp.vocab.strings[match_id]
            span = doc[start:end]
            dict2 = {'Co2': span.text}  # With this I need help
            Co2_list.append(dict2)
        return Co2_list

    df["Sent"].apply(find_co2)
    df["Co2"] = pd.DataFrame(Co2_list)


The data frame is:
                                             Sent  ... Co2
0     EMISSIONS MANAGEMENT  WATER MANAGEMENT  SAFE...   []
1     SETUP OPTIMISATION Our operations team  reco...   []
2     ANNULUS / TOP MANAGEMENT Through our extensi...   []
39  6 million metric tons, resulting  in a GHG int...   []
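
A minimal sketch of one way to get a 0/1 flag per row, for illustration only. It assumes spaCy v3 and pandas, uses a blank English pipeline (`spacy.blank("en")`) because `PhraseMatcher` only needs tokenization, and matches case-insensitively through `attr="LOWER"`. The function name `has_co2` and the sample sentences are hypothetical. The key differences from the code above are that the function returns a value for its own row instead of appending to a shared list, and that the result of `apply` is assigned directly to the column.

```python
import pandas as pd
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank English pipeline is sufficient, since only tokens are matched.
nlp = spacy.blank("en")

terms = ["Scope 1", "Scope 2", "million metric tons", "CO2"]

# Build the matcher once, outside the function, so it is not recreated per row.
# attr="LOWER" compares lowercased token text, so "scope 1" also matches.
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("CO2", [nlp.make_doc(term) for term in terms])


def has_co2(text):
    """Return 1 if any of the terms occurs in the sentence, otherwise 0."""
    doc = nlp.make_doc(text)
    return int(len(matcher(doc)) > 0)


# Hypothetical sample data, one sentence per row.
df = pd.DataFrame({"Sent": [
    "EMISSIONS MANAGEMENT WATER MANAGEMENT",
    "Our operations team reviewed the setup.",
    "6 million metric tons, resulting in a GHG intensity",
    "Scope 1 emissions fell this year.",
]})

# apply() returns one value per row, so it can be assigned straight to the column.
df["Co2"] = df["Sent"].apply(has_co2)
```

If a True/False column is preferred over 1/0, returning `len(matcher(doc)) > 0` without the `int(...)` conversion should be enough.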