If I call the function analyze_syntax from the Google Cloud Python library, it returns:
document = types.Document(content='Tried this', type=enums.Document.Type.PLAIN_TEXT)
info = client.analyze_syntax(document=document)
print(info)
sentences {
  text {
    content: "Tried this"
    begin_offset: -1
  }
}
tokens {
  text {
    content: "Tried"
    begin_offset: -1
  }
  part_of_speech {
    tag: VERB
    mood: INDICATIVE
    tense: PAST
  }
  dependency_edge {
    label: ROOT
  }
  lemma: "try"
}
tokens {
  text {
    content: "this"
    begin_offset: -1
  }
  part_of_speech {
    tag: DET
    number: SINGULAR
  }
  dependency_edge {
    label: DOBJ
  }
  lemma: "this"
}
language: "en"
print(info.tokens)
[text {
  content: "Tried"
  begin_offset: -1
}
part_of_speech {
  tag: VERB
  mood: INDICATIVE
  tense: PAST
}
dependency_edge {
  label: ROOT
}
lemma: "try"
, text {
  content: "this"
  begin_offset: -1
}
part_of_speech {
  tag: DET
  number: SINGULAR
}
dependency_edge {
  label: DOBJ
}
lemma: "this"
]
print(info.tokens[0].part_of_speech)
tag: VERB
mood: INDICATIVE
tense: PAST
which is a weird format to me, because:
I can't iterate by (what looks like) keys:
for key in info.tokens[0].part_of_speech:
gives TypeError: 'PartOfSpeech' object is not iterable.
Accessing the values doesn't work like I thought:
info.tokens[0].part_of_speech.tag
gives the value 11.
QUESTION: What type of object is that and how does it work?
I wanted to be able to convert it to a dictionary (in a better way than converting it to a string first) or iterate through it somehow (find which keys it has and their corresponding values).
The first thing you can do to find out the type of an object in Python is to call the built-in function type() on it.
Its output shows that this is a PartOfSpeech object, that is, an instance of a class defined in the Google Cloud NLP library itself.
You can also see which attributes this class has by using another built-in function, dir().
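As a rough illustration of type() and dir(), the sketch below uses a hypothetical stand-in class, FakePartOfSpeech, instead of the real library object, so what it prints is not the library's actual output. The attribute names and integer values are assumptions taken from the output in the question and from the values quoted in this answer.

```python
class FakePartOfSpeech:
    """Hypothetical stand-in for the library's PartOfSpeech object."""

    # Constants naming some possible values (assumed integers).
    VERB = 11
    INDICATIVE = 3
    PAST = 3

    def __init__(self, tag=0, mood=0, tense=0, person=0, gender=0):
        # Fields of the object; 0 is used here to mean "no value set".
        self.tag = tag
        self.mood = mood
        self.tense = tense
        self.person = person
        self.gender = gender


pos = FakePartOfSpeech(
    tag=FakePartOfSpeech.VERB,
    mood=FakePartOfSpeech.INDICATIVE,
    tense=FakePartOfSpeech.PAST,
)

# type() tells you which class the object belongs to.
type_name = type(pos).__name__
print(type_name)

# dir() lists attribute names; drop the underscore-prefixed internals.
public_names = [name for name in dir(pos) if not name.startswith("_")]
print(public_names)

# Every public attribute on this stand-in is a plain integer.
for name in public_names:
    print(name, getattr(pos, name))
```

On the stand-in, both the field names (tag, mood, ...) and the value names (VERB, PAST, ...) show up side by side in the dir() listing.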
The dir() output shows that this object has, as attributes, all of the possible keys and values of what looks like a dictionary. If you inspect attributes such as 'VERB' or 'tag', you will see that all of them are integers. The object stores information by matching the integer held by the key to the integer assigned to the value name: that is why 'tag' returns 11, which is precisely the integer associated with 'VERB' (you can check this with 'mood' and 'INDICATIVE', which are both 3, and with 'tense' and 'PAST', which are also both 3). By contrast, keys that have no associated value (like 'person' or 'gender') are given the value 0.
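To make the integer-to-name relationship concrete, here is a small sketch using Python's standard enum.IntEnum. The classes Tag, Mood and Tense and the member name UNSET are hypothetical stand-ins, not the library's own enums; they list only the members mentioned in this thread, with the integer values quoted above.

```python
from enum import IntEnum


# Hypothetical stand-ins: only the members discussed in this thread.
class Tag(IntEnum):
    UNSET = 0
    VERB = 11


class Mood(IntEnum):
    UNSET = 0
    INDICATIVE = 3


class Tense(IntEnum):
    UNSET = 0
    PAST = 3


# Raw integers, as returned by attribute access in the question.
raw_fields = {"tag": 11, "mood": 3, "tense": 3}

# Which enum interprets which field.
field_enums = {"tag": Tag, "mood": Mood, "tense": Tense}

# Look each integer up in its enum to recover the readable name.
named_fields = {
    key: field_enums[key](value).name for key, value in raw_fields.items()
}
print(named_fields)
```

The same integer (3) maps to different names depending on the field, which is why the value alone is not enough: you need to know which key it came from.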
Now, coming back to a way of iterating over this item: the string you get when you print 'part_of_speech_0' (the first token's part_of_speech) has a YAML-like structure. You can thus turn it into a dictionary by loading it with the yaml module in Python. Here's the final complete code that would output the iteration of (key, value) pairs in 'part_of_speech':
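The idea can be sketched without the client library. The example below is not the original listing but a minimal sketch under stated assumptions: it starts from the printed text shown in the question, and it splits each "key: value" line by hand, a simplification standing in for the yaml module so that it runs with the standard library only. The helper name parse_flat_fields is hypothetical. With PyYAML installed, yaml.safe_load(text) should give the same dictionary for this flat input.

```python
# Text as printed for info.tokens[0].part_of_speech in the question.
text = "tag: VERB\nmood: INDICATIVE\ntense: PAST\n"


def parse_flat_fields(text):
    """Turn flat 'key: value' lines into a dict (no nesting supported)."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(":")
        result[key.strip()] = value.strip()
    return result


pos_dict = parse_flat_fields(text)

# Iterate over the (key, value) pairs like any other dictionary.
for key, value in pos_dict.items():
    print(key, value)
```

This hand-written split only handles flat text like the part_of_speech output; the nested token output in the question would need a real parser.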