Separating a sentence word by word with JavaScript (client)


I'm trying to split a sentence into words, but this seems very hard to do in JavaScript. I can't simply split on whitespace, because some languages (Thai, Chinese, Japanese, etc.) don't use whitespace to separate words. A dictionary-based algorithm therefore seems like the way to go. However, the dictionaries are large, and I need to do the segmentation on the client.

Java has a BreakIterator class that allows you to iterate through the words in the sentence. That's exactly what I need but JS doesn't have the same functionality. Chrome has Intl.v8BreakIterator but I'm looking for a solution for all major browsers.

There is a proposal, Intl.Segmenter, that would solve the issue. It's essentially BreakIterator for JavaScript, but it hasn't been released yet.
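For illustration, here is a rough sketch of how the proposed API could be used behind feature detection. `splitWords` is a hypothetical helper name, and the whitespace fallback is only an assumption for environments without Intl.Segmenter; it does not handle Thai, Chinese, or Japanese.

```javascript
// Sketch: word segmentation with Intl.Segmenter where it is available.
// splitWords is a hypothetical helper, not part of any library.
function splitWords(text, locale) {
  if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
    const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
    // Keep only word-like segments, dropping whitespace and punctuation.
    return Array.from(segmenter.segment(text))
      .filter((segment) => segment.isWordLike)
      .map((segment) => segment.segment);
  }
  // Assumed fallback: plain whitespace splitting, which is wrong for
  // languages that do not separate words with spaces.
  return text.split(/\s+/).filter(Boolean);
}

console.log(splitWords('Hello, world!', 'en'));
console.log(splitWords('中國是最古老的文明', 'zh'));
```

Where the API exists, the word boundaries for Chinese come from the engine's built-in dictionary, so no dictionary has to be shipped to the client. The exact segmentation may vary between engines.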

If there is a way, can you please point me in the right direction?


There is 1 best solution below


It seems you may have to use the spread operator:

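const text = '中國是最古老的文明';
// Spreading a string iterates it by Unicode code point,
// so each array element is a single character, not a word.
const splitString = [...text];
console.log(splitString);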

That said, I'm not sure this is what you're after: it splits the string into individual characters rather than words, and since I can't read Chinese I can't tell where the word boundaries should be. I read about this approach somewhere a while ago.
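To show what the spread operator actually does, here is a small comparison with `split('')`. The sample string is my own choice for illustration: its first character lies outside the Basic Multilingual Plane, so it occupies two UTF-16 code units.

```javascript
// '𠮷' (U+20BB7) is stored as a surrogate pair: two UTF-16 code units.
const sample = '𠮷野家';

// Spread iterates by code point, so each character stays intact.
const byCodePoint = [...sample];

// split('') cuts between UTF-16 code units and breaks the surrogate pair.
const byCodeUnit = sample.split('');

console.log(byCodePoint.length); // 3
console.log(byCodeUnit.length);  // 4
```

So spread is a safe way to get characters, but it still knows nothing about word boundaries.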