OpenFire Content Filter using REGEX


Hi, I am currently using the following regex to stop users from submitting content that contains any of the profanities listed in the pattern:

(?i)(pecan|tie|shirt|hole|ontology|meme|pelagic|cock|duck|slot|anjing lo|Banting|Chiba|Screw|Screwing|fat|where|mother|peer|per|sock|socker|locker|ans|rect|anal|pickpocket|joker|muck)\b

I would like to extend the regex so that it also filters out credit card numbers (MasterCard, Visa, JCB, Amex and so on).

I have a regex for each card type, namely:

^4[0-9]{12}(?:[0-9]{3})?$ (Visa)
^5[1-5][0-9]{14}$ (Master)
^3[47][0-9]{13}$ (Amex)
^3(?:0[0-5]|[68][0-9])[0-9]{11}$ (Diners)
^6(?:011|5[0-9]{2})[0-9]{12}$ (Discover)
^(?:2131|1800|35\d{3})\d{11}$ (JCB)

However, when I combine the credit card patterns with the profanity filter like this:

(?i)(pecan|tie|shirt|hole|ontology|meme|pelagic|cock|duck|slot|anjing lo|Banting|Chiba|Screw|Screwing|fat|where|mother|peer|per|sock|socker|locker|ans|rect|anal|pickpocket|joker|muck)\b (?i)^4[0-9]{12}(?:[0-9]{3})?$\b (?i)^5[1-5][0-9]{14}$\b

the profanity filter is ignored.

Can anyone point me in the right direction?


There are 2 best solutions below

1
On

Filtering profanity is a great example of when NOT to use regex! Anyone who wants to swear can easily get around your filter by typing "0" instead of "o", inserting a "." in the middle of a word, or using any of hundreds of other workarounds. There are much better alternatives out there if you do some research. Anyway, ignoring that...

Firstly, do you really need to do this in a single regex pattern? Your code would be more readable and easier to maintain if you split this across multiple lines of code, with one pattern per check.
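As a rough illustration of the "split it up" idea, here is a minimal Python sketch that runs each check separately. It is not Openfire configuration: the function name find_violations and the shortened word list are made up for the example, and the card patterns are the asker's with ^...$ swapped for \b (an assumption on my part) so that a number inside a longer message is still found.

```python
import re

# Hypothetical, shortened word list; substitute the real one.
PROFANITY = re.compile(r"(?i)\b(?:duck|muck|joker)\b")

# Card patterns from the question, with ^...$ replaced by \b so they can
# match a number embedded in a longer message (an assumption, see above).
CARD_PATTERNS = {
    "Visa": re.compile(r"\b4[0-9]{12}(?:[0-9]{3})?\b"),
    "Master": re.compile(r"\b5[1-5][0-9]{14}\b"),
    "Amex": re.compile(r"\b3[47][0-9]{13}\b"),
}


def find_violations(message):
    """Return a label for every rule the message breaks, in check order."""
    hits = []
    if PROFANITY.search(message):
        hits.append("profanity")
    for name, pattern in CARD_PATTERNS.items():
        if pattern.search(message):
            hits.append(name)
    return hits
```

Each rule can then be tested, logged and changed on its own, which is the maintainability point made above.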

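But if you really insist on doing it this way: your pattern is looking for a swear word, followed by a space and a Visa number, followed by a space and a MasterCard number. You have not implemented any "OR" condition here. On top of that, the ^ and $ anchors now sit in the middle of the pattern, so that sequence can never match at all, which is why the profanity part appears to be ignored.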
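To make the missing "OR" concrete, here is a small Python sketch (the syntax used is common to Python and Java regex, but I have not tried it inside Openfire). BROKEN mimics the shape of the asker's combined pattern with a shortened, made-up word list; COMBINED joins the same pieces with | inside one group and uses \b instead of ^...$, which is my assumption about the intended behaviour.

```python
import re

# Same shape as the combined pattern in the question: the pieces are
# concatenated, so the regex wants "word, space, Visa number" in sequence,
# and the ^ in the middle can never match after text was already consumed.
BROKEN = re.compile(r"(?i)(duck|muck|joker)\b ^4[0-9]{12}(?:[0-9]{3})?$")

# Alternation: any ONE of the branches is enough to trigger the filter.
COMBINED = re.compile(
    r"(?i)\b(?:"
    r"duck|muck|joker"             # shortened placeholder word list
    r"|4[0-9]{12}(?:[0-9]{3})?"    # Visa
    r"|5[1-5][0-9]{14}"            # Master
    r"|3[47][0-9]{13}"             # Amex
    r")\b"
)


def is_blocked(message):
    """True if the message contains a listed word or a card-like number."""
    return COMBINED.search(message) is not None
```

With this, is_blocked("what the duck") and is_blocked("pay with 4111111111111111") are both true, while BROKEN.search() finds nothing in either message.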

6
On

This is one of the stupidest policy requirements I've ever seen. Your filter will miss a lot of profanities and will trigger on non-profanities; see the Scunthorpe problem.
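Both failure modes are easy to reproduce. The Python sketch below uses a few entries from the pattern as posted in the question and keeps its shape (a trailing \b only, no leading one); the sample messages are invented for the demonstration.

```python
import re

# Subset of the question's pattern in its original shape: with only a
# trailing \b, a listed word also matches at the END of a longer word.
FILTER = re.compile(r"(?i)(hole|per|where|duck)\b")

# Innocent text that gets blocked (the Scunthorpe problem).
false_positives = ["the whole team", "a paper", "somewhere"]

# Trivial evasions that get through.
missed = ["h0le", "d.uck", "duck5"]

blocked_innocent = [s for s in false_positives if FILTER.search(s)]
slipped_through = [s for s in missed if not FILTER.search(s)]
```

Here every entry of false_positives ends up in blocked_innocent and every entry of missed ends up in slipped_through. Adding a leading \b would fix "whole" and "paper", but not the evasions.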

Then, your credit card regexes already exclude all possible swear words, because they allow only digits, and it is going to be difficult to construct a swear word out of digits.

But if your boss insists, make him happy with:

(?i)^(?!.*(pecun|tai|shit|asshole|kontol|memek|pelacur|cock|dick|slut|anjing lo|bangsat|cibay|fuck|fucking|faggot|whore|motherfucker|peler|pler|suck|sucker|fucker|anus|rectum|anal|cocksucker|sucker|suck)\b)4[0-9]{12}(?:[0-9]{3})?$
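To illustrate why the lookahead adds nothing, here is a quick Python check with a shortened placeholder word list (the sample strings are made up): on these inputs the guarded pattern accepts and rejects exactly what the plain Visa pattern does, because anything that passes the digits-only part cannot contain a word.

```python
import re

VISA = re.compile(r"^4[0-9]{12}(?:[0-9]{3})?$")

# Same structure as the pattern above, with a placeholder word list.
GUARDED = re.compile(r"(?i)^(?!.*(duck|muck)\b)4[0-9]{12}(?:[0-9]{3})?$")

samples = [
    "4111111111111111",   # 16-digit Visa-style number
    "4222222222222",      # 13-digit Visa-style number
    "duck",               # a listed word only
    "4111 duck",          # digits plus a listed word
    "5500000000000004",   # Master-style number, not Visa
]

# Compare the verdict of both patterns on every sample.
same_verdict = all(
    bool(VISA.search(s)) == bool(GUARDED.search(s)) for s in samples
)
```

same_verdict comes out true here: the negative lookahead never changes the result.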