Profanity (bad words) filter for real-time chat: Elasticsearch?

I need to create a room-based chat app that has to support thousands of users at the same time.

The problem is that the client wants bad words filtered out of the messages.

Standard profanity filter libraries would be fine with a small number of users, but here the performance of the filtering matters most, because the chat is real-time.

Libraries that use NLP are mostly trained on English-language datasets, so they will not work for me either, since I need to filter bad words in a couple of languages.

The only remaining option that comes to mind is Elasticsearch.

Is there a solution that is good enough and can support a large number of users, something that would work like this:

bad_words = [badword1, badword2, badword3...]

Input message: "You are badword1, and i badword2 you!"

Output message: "You are ***, and i *** you!"
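
To make the expected behaviour concrete, here is a minimal sketch of it in Python, assuming a plain in-memory word list and a single precompiled regular expression. The names `BAD_WORDS` and `censor` are placeholders of mine, not part of any library, and I have not measured whether this is fast enough at the scale I need.

```python
import re

# Placeholder word list; in practice this would hold the words
# for every supported language.
BAD_WORDS = ["badword1", "badword2", "badword3"]

# Compile once at startup. Longer words are listed first so that a
# longer entry is preferred over a shorter one that is its prefix.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(w) for w in sorted(BAD_WORDS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def censor(message: str, mask: str = "***") -> str:
    """Replace every whole-word match from BAD_WORDS with the mask."""
    return _PATTERN.sub(mask, message)


print(censor("You are badword1, and i badword2 you!"))
```

With a very large word list, a multi-pattern matcher such as an Aho-Corasick automaton may scale better than one big alternation, but that is an assumption I have not benchmarked.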

If the solution can also handle variants, for example "badword1" written as "bad_word1" or something like that, it would be a bonus.

All suggestions are welcome.
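
A rough sketch of what I mean by the bonus case, again in Python and again only an illustration: each word is turned into a pattern that allows punctuation or underscores between its letters. The names `BAD_WORDS`, `SEP` and `censor_obfuscated` are hypothetical, and this approach can produce false positives, so treat it as a starting point rather than a solution.

```python
import re

# Placeholder word list.
BAD_WORDS = ["badword1", "badword2"]

# Optional filler between letters: any run of punctuation or
# underscores, but not whitespace.
SEP = r"(?:[^\w\s]|_)*"


def _loose(word: str) -> str:
    """Build a pattern matching the word with optional filler between letters."""
    return SEP.join(re.escape(ch) for ch in word)


# The lookarounds require that the match is not glued to a letter or
# digit on either side ([^\W_] matches any alphanumeric character).
_PATTERN = re.compile(
    r"(?<![^\W_])(?:" + "|".join(_loose(w) for w in BAD_WORDS) + r")(?![^\W_])",
    re.IGNORECASE,
)


def censor_obfuscated(message: str, mask: str = "***") -> str:
    """Mask listed words even when split by punctuation or underscores."""
    return _PATTERN.sub(mask, message)


print(censor_obfuscated("You are bad_word1!"))
```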
