Unclear on TypeError message from Google NL API running sentiment analysis


Goal

Run sentiment analysis on a column of text in a pandas DataFrame and get back both the score and the magnitude for each row of text.

Current code

This is what I'm running, pulling in a dataframe (df03) with a column of text (text02) that I want to analyze.

# Imports the Google Cloud client library
from google.cloud import language_v1

# Instantiates a client
client = language_v1.LanguageServiceClient()

# The text to analyze
text = df03.loc[:,"text02"]
document = language_v1.Document(
    content=text, type_=language_v1.types.Document.Type.PLAIN_TEXT
)

# Detects the sentiment of the text
sentiment = client.analyze_sentiment(
    request={"document": document}
).document_sentiment

print("Text: {}".format(text))
print("Sentiment: {}, {}".format(sentiment.score, sentiment.magnitude))

And this is the returned error message

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-1c6f7c607084> in <module>()
      8 text = df03.loc[:,"text02"]
      9 document = language_v1.Document(
---> 10     content=text, type_=language_v1.types.Document.Type.PLAIN_TEXT
     11 )
     12 

/usr/local/lib/python3.7/dist-packages/proto/message.py in __init__(self, mapping, ignore_unknown_fields, **kwargs)
    562 
    563         # Create the internal protocol buffer.
--> 564         super().__setattr__("_pb", self._meta.pb(**params))
    565 
    566     def _get_pb_type_from_key(self, key):

TypeError: 01                          Max Muncy is great!
02               The worst Dodger is Max muncy.
03   has type Series, but expected one of: bytes, unicode

Assessment

The error message points to the line:

content=text, type_=language_v1.types.Document.Type.PLAIN_TEXT

The TypeError message attempts to explain what's happening:

has type Series, but expected one of: bytes, unicode

So it seems to recognize the list of text blurbs under the text column in dataframe df03, but apparently I failed to establish the right data type setting.

However, I'm not sure where I'm supposed to set the Type, as the only Document Type settings in the documentation appear to be HTML, PLAIN_TEXT, or TYPE_UNSPECIFIED. Of those, I'm pretty sure PLAIN_TEXT is right.

Documentation: https://googleapis.dev/python/language/latest/language_v1/types.html#google.cloud.language_v1.types.Document

So that leaves me unclear on what that error message is indicating or how I should approach troubleshooting.

Greatly appreciate any input on this.

doug

Best answer

The error says the content field of a Document expects a single string (bytes or unicode), so the API can't take a pandas Series directly; you have to pass one string at a time. Try applying a custom function to the DataFrame column that contains your text:

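from google.cloud import language_v1

# Create the client once and reuse it for every row
client = language_v1.LanguageServiceClient()


def get_sentiment(text):
    # Wrap a single string in a Document; content must be a str, not a Series
    document = language_v1.Document(
        content=text,
        type_=language_v1.types.Document.Type.PLAIN_TEXT
    )

    # Detect the sentiment of that one string
    sentiment = client.analyze_sentiment(
        request={"document": document}
    ).document_sentiment

    return sentiment


# Call the API once per row of the text column
df03["sentiment"] = df03["text02"].apply(get_sentiment)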
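Note that apply() as written stores one whole sentiment object per row, while the question asks for score and magnitude as separate values. A minimal sketch of one way to get both numbers per row is below. The names Sentiment, make_google_analyzer and score_texts are hypothetical helpers made up for this illustration, not part of the Google client library, and the Google import is done lazily only so the rest of the sketch runs without credentials.

```python
from typing import Callable, Iterable, List, NamedTuple


class Sentiment(NamedTuple):
    """Hypothetical container for the two values the question asks for."""
    score: float
    magnitude: float


def make_google_analyzer() -> Callable[[str], Sentiment]:
    """Build a function that scores one string with the Natural Language API.

    Assumes google-cloud-language is installed and credentials are configured.
    """
    # Imported here so the rest of the sketch works without the client library.
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()  # one client, reused per call

    def analyze(text: str) -> Sentiment:
        document = language_v1.Document(
            content=text,
            type_=language_v1.types.Document.Type.PLAIN_TEXT,
        )
        result = client.analyze_sentiment(
            request={"document": document}
        ).document_sentiment
        return Sentiment(result.score, result.magnitude)

    return analyze


def score_texts(
    texts: Iterable[str], analyze: Callable[[str], Sentiment]
) -> List[Sentiment]:
    """Call analyze once per string; a pandas Series works as the iterable."""
    results = []
    for text in texts:
        # Fail early with a readable message instead of the protobuf TypeError.
        if not isinstance(text, str):
            raise TypeError(f"expected str, got {type(text).__name__}")
        results.append(analyze(text))
    return results
```

With the question's DataFrame, something like df03.join(pd.DataFrame(score_texts(df03["text02"], make_google_analyzer()), index=df03.index)) should then give separate score and magnitude columns, since pandas takes column names from the namedtuple fields. Passing the analyzer in as an argument also makes it easy to try the loop with a stand-in function before spending API calls.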