I am trying to return the entities for each sentence in corpus using Watson Natural Language Understanding.
(I can't produce fully reproducible code because the dataset I'm using is private.)
My script looks like this:
import json
from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding_v1 import Features, EntitiesOptions
import pandas as pd
from utils import *
_DATA_PATH = "data/example_data.csv"
_IBM_NLU_USERNAME = "<username>"
_IBM_NLU_PASSWORD = "<password>"
X = [string1, string2, ... ]
nlu = NaturalLanguageUnderstandingV1(username=_IBM_NLU_USERNAME,
                                     password=_IBM_NLU_PASSWORD,
                                     version="2018-03-16")
def ibm_ner_recognition(sentence):
    """
    Input  -- sentence, string to conduct NER on
    Return -- list of entity types found in the sentence
    """
    response = nlu.analyze(text=sentence,
                           features=Features(entities=EntitiesOptions()))
    output = json.loads(json.dumps(response))
    entities = []
    for result in output["entities"]:
        entities.append(result["type"])
    return entities
entities = []
for sent in X:
    sent_entities = ibm_ner_recognition(sent)
    entities.append(sent_entities)
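(For reference, the parsing step inside ibm_ner_recognition depends only on the shape of the JSON that analyze returns. Here is a minimal self-contained sketch of that step with a mocked response; the mock_response dict below is illustrative, not real Watson output.)

```python
# Mocked NLU-style response. Real Watson NLU responses carry many more
# fields per entity; only "entities" -> "type" matters for this step.
mock_response = {
    "entities": [
        {"type": "Person", "text": "Liz Saville Roberts", "relevance": 0.98},
        {"type": "Location", "text": "Wales", "relevance": 0.61},
    ]
}

def extract_entity_types(response):
    """Return just the entity types from an NLU-style response dict."""
    return [result["type"] for result in response.get("entities", [])]

print(extract_entity_types(mock_response))  # ['Person', 'Location']
```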
The script works fine up to roughly the 400th sentence in the corpus, and then throws the following error:
WatsonApiException                        Traceback (most recent call last)
<ipython-input-57-8632adb20778> in <module>()
      5 for sent in X:
      6     print(sent)
----> 7     sent_entities = ibm_ner_recognition(sent)
      8     entities.append(sent_entities)
      9 end = t.default_timer()

<ipython-input-13-d132b33efc76> in ibm_ner_recognition(sentence)
      9     """
     10     response = nlu.analyze(text=sentence,
---> 11                            features=Features(entities=EntitiesOptions()))
     12     output = json.loads(json.dumps(response))
     13     entities = []

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/watson_developer_cloud/natural_language_understanding_v1.py in analyze(self, features, text, html, url, clean, xpath, fallback_to_raw, return_analyzed_text, language, limit_text_characters, **kwargs)
    202             params=params,
    203             json=data,
--> 204             accept_json=True)
    205         return response
    206

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/watson_developer_cloud/watson_service.py in request(self, method, url, accept_json, headers, params, json, data, files, **kwargs)
    446         error_info = self._get_error_info(response)
    447         raise WatsonApiException(response.status_code, error_message,
--> 448                                  info=error_info, httpResponse=response)

WatsonApiException: Error: Server Error cannot analyze: downstream issue, Code: 500 , X-dp-watson-tran-id: gateway01-474786453 , X-global-transaction-id: 7ecac92c5aff58601c4caa95
I narrowed the failure down to the following string:
s = "Liz Saville Roberts."
I then ran this string through ibm_ner_recognition on its own: there was no error, and it successfully captured the entities.
Problem summary
When looping through my corpus, I am getting to a sentence where Watson NLU is giving me a downstream error. However, Watson NLU is successful when receiving that sentence on its own, outside of the loop.
Things to note
This is not the last sentence in my corpus.
I haven't exhausted my Watson API quota under my payment plan.
Edit
I re-ran the loop. This time the sentence above worked and the loop reached roughly the 3500th string, so there seems to be some temperamental behaviour.
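Given this temperamental behaviour, I'm currently working around it by wrapping each call in a retry with exponential backoff. A minimal sketch (with_retries and flaky_analyze are names I made up for illustration; flaky_analyze stands in for ibm_ner_recognition, and in the real script I would catch WatsonApiException rather than the broad Exception used here to keep the sketch self-contained):

```python
import time

def with_retries(func, *args, max_attempts=3, base_delay=0.1, **kwargs):
    """Call func, retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * 2 ** (attempt - 1))

# Demo: a stand-in that fails twice before succeeding, mimicking the
# intermittent 500s coming back from the service.
calls = {"n": 0}
def flaky_analyze(sentence):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Server Error cannot analyze: downstream issue")
    return ["Person"]

print(with_retries(flaky_analyze, "Liz Saville Roberts."))  # ['Person']
```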