Best data cleansing practices for IBM Personality Insights

I am testing out Personality Insights and I am curious whether I need to do any data cleansing before sending the text of a Twitter profile's timeline to IBM.

For example, should I remove URLs included in the tweets, along with other Twitter features such as hashtags or profile names that appear within a single tweet?

I am currently not removing any data. However, I am concatenating tweets with a full stop and a space, using text+=". "+tweetfulltext.

There is 1 best solution below

You don't need to, but since those elements don't count towards the personality analysis, running the text through a cleanup module (if you already have one) will help with the word count. You will also want to filter out retweets.
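
If you don't have a cleanup module yet, something along these lines might be a starting point. It is only a rough sketch in Python: the function names (`is_retweet`, `clean_tweet`, `build_profile_text`) are made up for illustration, the regular expressions are deliberately simple, and treating any tweet that starts with "RT @" as a retweet is a heuristic assumption on my part. If your tweets come from the Twitter API as objects, checking the retweet metadata there is probably more reliable. It keeps the ". " join from the question.

```python
import re

# Deliberately simple patterns; they are assumptions, not an official tokenizer.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def is_retweet(tweet_text):
    # Heuristic assumption: classic retweets start with "RT @".
    return tweet_text.startswith("RT @")


def clean_tweet(tweet_text):
    # Replace URLs, mentions, and hashtags with a space, then collapse whitespace.
    text = URL_RE.sub(" ", tweet_text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    return " ".join(text.split())


def build_profile_text(tweets):
    # Skip retweets, clean the rest, and drop tweets that end up empty.
    cleaned = (clean_tweet(t) for t in tweets if not is_retweet(t))
    return ". ".join(c for c in cleaned if c)


if __name__ == "__main__":
    sample = [
        "Loving the new release! https://t.co/abc #python",
        "RT @someone: big news",
        "@friend thanks for the help",
    ]
    print(build_profile_text(sample))
```

Whether to drop hashtags entirely or just strip the "#" and keep the word is a judgment call; the sketch drops them because the question asks about removing them.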