What is the correct way to strip profane words from a string given:
1) I have a list of 100 words to look for in an array of strings.
2) What is the correct way to handle partial words? How do most people handle this? For example, the word mass. Sometimes a partial match is also bad: assuming foobar is an extremely profane word, I may want to disallow foobar as well as foobar* and *foobar.
So do you put all the words into a single expression or loop through the list?
What's the right way to tackle it? I'm using Groovy/Grails, but examples in any modern language are welcome.
Put all the words into a single alternation group:
(foobar|foobaz|...)
Then put guards on either side of the grouping for extraneous characters:
[^!@#$%^&*]*(foobar|foobaz|foofii)[^!@#$%^&*]*
Also, you'll probably want to use a case-insensitive flag so that it also matches words like FooBaz and fOObaR.
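One way to sketch the single-expression idea, including the whole-word versus partial-word distinction from the question, is shown below in Python. The word lists, `build_pattern`, and `censor` are hypothetical names made up for illustration, and the sketch uses `\b` word boundaries and `\w*` rather than the character-class guards above. The same pattern syntax should carry over to Groovy, but that is an assumption worth checking.

```python
import re

# Hypothetical word lists; "foobar" stands in for a real profane word.
WHOLE_WORDS = ["foobaz", "foofii"]   # blocked only as standalone words
ANYWHERE = ["foobar"]                # blocked even inside longer words


def build_pattern(whole_words, anywhere):
    """Combine both lists into one case-insensitive alternation."""
    parts = []
    if whole_words:
        # \b anchors stop a longer, innocent word that merely
        # contains an entry from matching.
        parts.append(r"\b(?:" + "|".join(map(re.escape, whole_words)) + r")\b")
    if anywhere:
        # \w* on both sides swallows the whole surrounding word,
        # covering the foobar* and *foobar cases.
        parts.append(r"\w*(?:" + "|".join(map(re.escape, anywhere)) + r")\w*")
    return re.compile("|".join(parts), re.IGNORECASE)


# Compile once and reuse for every string that needs checking.
PATTERN = build_pattern(WHOLE_WORDS, ANYWHERE)


def censor(text, mask="*"):
    """Replace every match with mask characters of the same length."""
    return PATTERN.sub(lambda m: mask * len(m.group()), text)
```

With these lists, `censor("That FooBaz is bad")` masks the six letters of the whole word, `censor("unfoobarable")` masks the entire containing word, and a word such as "foobazzle" is left alone because its entry is whole-word only.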
As far as performance goes, concatenating the words into one big regex is probably fastest (although I'm not an expert). Regex engines are generally efficient at searching and handling branch conditions. Basically, it should be better than O(mn), where m is the number of words and n is the size of the text you're searching.
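To make the comparison concrete, here is a small Python sketch of both approaches: looping over the list versus one combined, precompiled pattern. The word list and function names are hypothetical. Whether the combined pattern actually wins depends on the regex engine, since a backtracking engine may still try each alternative at every position, so it is worth timing both on real data rather than assuming.

```python
import re

WORDS = ["foobar", "foobaz", "foofii"]  # hypothetical placeholder list


def contains_profanity_loop(text, words=WORDS):
    """Scan the text once per word: roughly m separate passes."""
    lowered = text.lower()
    return any(word in lowered for word in words)


# Built and compiled a single time, then reused for every input string.
COMBINED = re.compile("|".join(map(re.escape, WORDS)), re.IGNORECASE)


def contains_profanity_regex(text):
    """Scan the text once, trying the alternation at each position."""
    return COMBINED.search(text) is not None
```

Both functions do plain substring matching and should agree on ASCII input; the standard `timeit` module is one way to compare them on a representative sample.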