Python Alternative (Equivalent) to Wink Tokenizer JS


I have some JS functions that have helped me to tokenize my strings using Wink Tokenizer.

I'm moving some services to Python and would like an equivalent tokenizer function. I have researched a lot, and it seems Wink Tokenizer is only available for JS. I'm also not aware of the subtle differences between Wink and Python tokenizers such as spaCy.

Basically I would like to be able to get the same results as:

var tokenizer = require( 'wink-tokenizer' );
// Create its instance.
var myTokenizer = tokenizer();
 
// Tokenize a tweet.
var s = '@superman: hit me up on my email [email protected], 2 of us plan party tom at 3pm:) #fun';
myTokenizer.tokenize( s );

but in Python.

Can anyone point me in the right direction on how to replicate the tokenization functions Wink offers in Python? What parameters, configs, or regexes do I have to check to get equivalent behaviour?

There is 1 best solution below

There are many ways to do this: Python has a rich data science community and many NLP packages. Here is a reasonable list of easy-to-implement ways to tokenize text:

https://towardsdatascience.com/5-simple-ways-to-tokenize-text-in-python-92c6804edfc4

I personally use https://github.com/stanfordnlp/stanza

All of these resources were on the first page of Google results for "python" "tokenization".
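
If none of the packages gives you tagged tokens the way Wink does, a rough regex-based sketch with only the standard library might look like the following. The tag names are chosen to resemble the ones Wink reports, but the patterns are my own simplified assumptions, not a port of Wink's regexes, so expect differences on edge cases (contractions, URLs, currencies, emoji). The email address in the example is a placeholder, since the one in the question is obfuscated.

```python
import re

# Hypothetical tag names and simplified patterns (assumptions, not Wink's own).
# Order matters: more specific patterns must come before more general ones.
TOKEN_SPEC = [
    ("email", r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    ("mention", r"@\w+"),
    ("hashtag", r"#\w+"),
    ("time", r"\d{1,2}(?::\d{2})?\s?[AaPp][Mm]\b"),
    ("number", r"\d+(?:[.,]\d+)*"),
    ("emoticon", r"[:;=][-']?[)(DPp]"),
    ("word", r"[A-Za-z]+(?:'[A-Za-z]+)?"),
    ("punctuation", r"[.,;:!?'\"()\[\]{}-]"),
    ("symbol", r"\S"),  # fallback for any other non-space character
]

# One alternation of named groups; the name of the group that matched is the tag.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{tag}>{pattern})" for tag, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of {'value': ..., 'tag': ...} dicts, skipping whitespace."""
    return [
        {"value": match.group(), "tag": match.lastgroup}
        for match in TOKEN_RE.finditer(text)
    ]


if __name__ == "__main__":
    # Placeholder email address; the original one is not recoverable.
    s = ("@superman: hit me up on my email john@example.com, "
         "2 of us plan party tom at 3pm:) #fun")
    for token in tokenize(s):
        print(token)
```

Because the alternation tries patterns in list order, `email` has to come before `mention` and `time` before `number`; reorder or extend `TOKEN_SPEC` to tune the behaviour toward what your JS service currently produces.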