Semantic Role Labeling tag issue


I am trying to extract causal arguments at the sentence level. So far my code runs, but it returns the wrong argument.

For example, consider the SRL demo output for the sentence: 'Our results may be materially adversely affected by the outcomes of litigation, legal proceedings and other legal or regulatory matters.'

The causing argument is "the outcomes of litigation, legal proceedings and other legal or regulatory matters", and this corresponds to A1 (aka Arg1).

#requirements:
from allennlp.predictors import Predictor
from allennlp_models import pretrained  # provides load_predictor
predictor = pretrained.load_predictor(model_id="structured-prediction-srl-bert")
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

My code to obtain Arg1:

def extract_arg1(sentence):
  """Return a list of (verb, ARG1 text) pairs, or -1 on failure."""
  result = []
  try:
    try:
      output = predictor.predict(sentence)
    except Exception as e:
      print(e)
      # Fall back to truncated, pre-tokenized input if prediction fails.
      tokenized_sentence = tokenizer(sentence, max_length=500,
                                    truncation=True,
                                    padding=False,
                                    add_special_tokens=False)
      tokens = tokenized_sentence.tokens()
      # Pass the token list here, not the raw sentence string.
      output = predictor.predict_tokenized(tokens)
    for verb in output['verbs']:
      desc = verb['description']
      arg1_start = desc.find('ARG1: ')
      if arg1_start > -1:
        arg1_end = arg1_start + len('ARG1: ')
        # Look for the closing bracket after ARG1, not the first ']' in the string.
        arg1 = desc[arg1_end: desc.find(']', arg1_end)]
        result.append((verb['verb'], arg1))
    return result
  except Exception as e:
    print(e)
    return -1


#loop over all sentences
from tqdm.notebook import tqdm
tqdm.pandas()

df['Arg1'] = df.sentence.progress_apply(extract_arg1)

However, this returns [(affected, Our results)], but I need [(affected, the outcomes of litigation, legal proceedings and other legal or regulatory matters)].
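
To see which label each span actually receives, it may help to list every bracketed span in a description instead of searching only for ARG1. What follows is a minimal sketch, assuming the description uses the `[LABEL: text]` format that the code above searches for. The helper name `extract_spans` and the sample string are hypothetical, and the ARG0 label on the by-phrase is an assumption for illustration, not recorded model output.

```python
import re

# Matches one bracketed span such as "[ARG1: Our results]".
SPAN_RE = re.compile(r"\[([^:\[\]]+): ([^\[\]]*)\]")

def extract_spans(description):
    """Return a list of (label, text) pairs for every bracketed span."""
    return SPAN_RE.findall(description)

# Hand-written sample in the bracketed format. Only the ARG1 span is taken
# from the result reported above; the other labels are assumptions.
desc = ("[ARG1: Our results] [ARGM-MOD: may] be "
        "[ARGM-MNR: materially adversely] [V: affected] "
        "[ARG0: by the outcomes of litigation , legal proceedings "
        "and other legal or regulatory matters] .")

for label, text in extract_spans(desc):
    print(label, "->", text)
```

If the by-phrase turns out to carry a label other than ARG1, as the sample assumes, the same loop shows which label to search for instead.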
