I have written a function that is meant to find the sentences mentioning CO2 equivalents in a dataframe of sentences, which I built earlier from text. Each row in the df corresponds to one sentence, and I want the function to show 1 if the sentence contains one of the terms and 0 if it doesn't. My current approach does not work: the code runs, but the results are wrong. Instead of only the relevant rows getting a label, all rows do. I also don't know how to change the function so that it returns just a 1 or a 0 for each row. How do I have to change it?
import pandas as pd
import spacy
from spacy.matcher import PhraseMatcher

# nlp (the spaCy pipeline), df and Co2_list are created earlier

def find_co2(text):
    # Create a spaCy doc
    doc = nlp(text)
    # Define the terms to search for
    terms = ["Scope 1", "Scope 2", "million metric tons", "CO2"]
    # Build a PhraseMatcher from the terms
    matcher = PhraseMatcher(nlp.vocab)
    patterns = [nlp.make_doc(term) for term in terms]
    matcher.add("CO2", patterns)
    matches = matcher(doc)
    for match_id, start, end in matches:
        # rule_id = nlp.vocab.strings[match_id]
        span = doc[start:end]
        dict2 = {'Co2': span.text}  # With this I need help
        Co2_list.append(dict2)
    return Co2_list

df[sent].apply(find_co2)
df[Co2] = pd.DataFrame(Co2_list)
The data frame is:

    Sent                                              ...  Co2
0   EMISSIONS MANAGEMENT WATER MANAGEMENT SAFE...          []
1   SETUP OPTIMISATION Our operations team reco...         []
2   ANNULUS / TOP MANAGEMENT Through our extensi...        []
39  6 million metric tons, resulting in a GHG int...       []
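
For reference, this is a minimal sketch of the behaviour I am after, not a confirmed fix. It assumes that the problem comes from Co2_list being one shared list that every call appends to and returns, so every row ends up pointing at the same object. The sketch returns a plain 0 or 1 per sentence instead. The name has_co2_term, the blank English pipeline and the two sample sentences are only illustrative assumptions; the column names Sent and Co2 are taken from the output above.

```python
import pandas as pd
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank English pipeline is enough here, because PhraseMatcher
# only needs tokenized text; replace it with whatever pipeline `nlp` really is.
nlp = spacy.blank("en")

terms = ["Scope 1", "Scope 2", "million metric tons", "CO2"]

# Build the matcher once, outside the function, so it is not rebuilt per row.
matcher = PhraseMatcher(nlp.vocab)
matcher.add("CO2", [nlp.make_doc(term) for term in terms])


def has_co2_term(text):
    """Return 1 if any of the terms occurs in the sentence, otherwise 0."""
    doc = nlp.make_doc(text)  # tokenization only, no other pipeline steps
    return int(len(matcher(doc)) > 0)


# Hypothetical sample sentences, only for illustration.
df = pd.DataFrame({"Sent": [
    "SETUP OPTIMISATION Our operations team",
    "Emissions were 6 million metric tons in total",
]})

# apply() calls the function once per row; assigning the result creates
# the 0/1 column directly, with no global list involved.
df["Co2"] = df["Sent"].apply(has_co2_term)
```

By default PhraseMatcher compares the exact token text, so "co2" in lower case would not match "CO2". Creating the matcher with PhraseMatcher(nlp.vocab, attr="LOWER") should make the comparison case-insensitive, if that is wanted.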