I am trying to extract causal arguments at the sentence level. So far my code runs, but it returns the wrong arguments.
For example, in the SRL demo for the sentence 'Our results may be materially adversely affected by the outcomes of litigation, legal proceedings and other legal or regulatory matters.',
the causing argument is "the outcomes of litigation, legal proceedings and other legal or regulatory matters", and this corresponds to A1 (a.k.a. ARG1).
# Requirements:
from allennlp.predictors import Predictor
from allennlp_models import pretrained  # provides load_predictor
from transformers import AutoTokenizer

predictor = pretrained.load_predictor(model_id="structured-prediction-srl-bert")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
My code to obtain ARG1:
def extract_arg1(sentence):
    result = []
    try:
        try:
            output = predictor.predict(sentence)
        except Exception as e:
            # Fall back to a truncated, pre-tokenized input if the raw sentence fails
            print(e)
            tokenized_sentence = tokenizer(sentence, max_length=500,
                                           truncation=True,
                                           padding=False,
                                           add_special_tokens=False)
            tokens = tokenized_sentence.tokens()
            output = predictor.predict_tokenized(tokens)
        for verb in output['verbs']:
            desc = verb['description']
            arg1_start = desc.find('ARG1: ')
            if arg1_start > -1:
                arg1_end = arg1_start + len('ARG1: ')
                # Look for the closing bracket only after the ARG1 label
                arg1 = desc[arg1_end: desc.find(']', arg1_end)]
                result.append((verb['verb'], arg1))
        return result
    except Exception as e:
        print(e)
        return -1
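
Searching the description string is fragile (for example, when the sentence itself contains a bracket). A possible alternative is to decode the per-token BIO tags instead. The sketch below assumes the predictor output also carries a `words` list and a `tags` list per verb in BIO format (`B-ARG1`, `I-ARG1`, `O`); `spans_from_bio` is a hypothetical helper, and the toy input is hand-written, not real model output.

```python
from collections import defaultdict


def spans_from_bio(words, tags):
    """Group tokens into labelled spans from BIO tags (hypothetical helper)."""
    spans = defaultdict(list)
    current_label, current_tokens = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # A new span begins: close the previous one, if any.
            if current_label is not None:
                spans[current_label].append(" ".join(current_tokens))
            current_label, current_tokens = tag[2:], [word]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # Continuation of the span that is currently open.
            current_tokens.append(word)
        else:
            # "O" or a malformed tag: close the open span and skip the token.
            if current_label is not None:
                spans[current_label].append(" ".join(current_tokens))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans[current_label].append(" ".join(current_tokens))
    return dict(spans)


# Hand-written toy input in the assumed format, not real model output.
toy_words = ["Prices", "were", "raised", "by", "the", "board"]
toy_tags = ["B-ARG1", "O", "B-V", "B-ARG0", "I-ARG0", "I-ARG0"]
toy_spans = spans_from_bio(toy_words, toy_tags)
print(toy_spans)
```

With real output this would presumably be called as `spans_from_bio(output['words'], verb['tags'])` inside the loop over `output['verbs']`, if those keys are present.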
# Loop over all sentences
from tqdm.notebook import tqdm
tqdm.pandas()
df['Arg1'] = df.sentence.progress_apply(extract_arg1)
However, this returns [(affected, Our results)], but I need [(affected, the outcomes of litigation, legal proceedings and other legal or regulatory matters)].
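
To see which label the wanted span actually received, it may help to list every labelled chunk in the description rather than only ARG1. This is a minimal sketch, assuming the description uses the `[LABEL: text]` format that the code above searches for; `labelled_spans` is a hypothetical helper, and the sample string is hand-written, not real model output.

```python
import re


def labelled_spans(description):
    """Return (label, text) pairs for every [LABEL: text] chunk (hypothetical helper)."""
    return re.findall(r"\[([^\]:]+): ([^\]]*)\]", description)


# Hand-written sample in the assumed bracketed format, not real model output.
sample_description = "[ARG1: Prices] were [V: raised] [ARG0: by the board]"
pairs = labelled_spans(sample_description)
for label, text in pairs:
    print(label, "->", text)
```

Running `labelled_spans(verb['description'])` for the sentence above should show which label is attached to "the outcomes of litigation, ..." so the extraction can target that label.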