I've been trying to create a censor system for the WoW emulator TrinityCore for a while now. Basically, I fill a database table (chat_filter) with 'bad words' and load them into a vector on startup. Every chat line a player sends gets checked against the contents of that vector. If it contains a bad word, the word gets replaced by ** (the number of *'s will eventually come from a column in the database table (todo)) and the player gets a punishment (muted or so).
Now what I'm having trouble with is how to make a proper filter. Right now you'd have to add every possible variation of a word you can think of; for example, 'a.s.s.' should also be read as 'ass', and I have no idea how to do this!
Here's the important part of the current code. I left out the DB pulling as it wouldn't be of any use anyway (and it'd make things less clear, as it's in a different file).
    char* msg3 = strdup(msg.c_str());
    char* words = strtok(msg3, " ,.-()&^%$#@!{}'<>/?|\\=+-_1234567890"); // This splits the sentence into separate words and removes the symbols
    ObjectMgr::ChatFilterContainer const& censoredWords = sObjectMgr->GetCensoredWords();
    while (words != NULL && !censoredWords.empty())
    {
        for (uint32 i = 0; i < censoredWords.size(); ++i)
        {
            if (!stricmp(censoredWords[i].c_str(), words))
            {
                sLog->outString("%s", words);
                //msg.replace(msg.begin(), msg.end(), msg.c_str(), "***");
                msg.replace(msg.begin(), msg.end(), censoredWords[i].c_str(), '*');
            }
            //msg.replace(msg.begin(), msg.end(), censoredWords[i].c_str(), /*replacement*/ "***");
            //msg.replace(msg.find(censoredWords[i].c_str()), censoredWords.size(),
        }
        words = strtok(NULL, " ,.-()&^%$#@!{}'<>/?|\\=+-_1234567890");
    }
Thanks in advance,
Jasper
P.S. 'GetCensoredWords' returns the vector.
P.P.S. 'msg' is a std::string - it's the ACTUAL message the player sent.
I would use std::string, not char*, so the memory management is all automatic. That would solve the problem of leaking memory in your example code. Boost.Algorithm provides a powerful boost::algorithm::split function which is much better than strtok.
It's not feasible to store every possible permutation of a censored word, especially if you're going to loop over the whole set of words for every input. If you want to censor "fubar" you'd have to store "Fubar" and "FUbar" and "FuBaR" and "fub4r" and "F.U.B.A.R" and "f.u.b.a.r" etc. etc.
Instead you could store each censored word only once, in a normalised form, e.g. "fubar", then convert each word of input to the normalised form. So if the user enters "F-u-B-a-R" you normalise it to "fubar", then you can do a simple lookup into the set of censored words (which can use an associative container, so the lookup is O(log n) or even O(1) on average with a hash-based one).
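To make the normalise-then-lookup idea concrete, here is a minimal sketch using only the standard library. The names normalise and censorMessage are made up for illustration and are not part of TrinityCore, and I'm assuming the censored words have already been loaded into a std::unordered_set in lower-case form. It splits on spaces only, strips everything that isn't a letter and lower-cases the rest; it does not handle digit substitutions such as "fub4r", which would need an extra mapping step.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>

// Hypothetical helper: reduce a word to its normalised form by dropping
// every non-letter and lower-casing the rest, so "F-u-B-a-R" and
// "f.u.b.a.r" both become "fubar".
std::string normalise(const std::string& word)
{
    std::string out;
    for (unsigned char c : word)
        if (std::isalpha(c))
            out += static_cast<char>(std::tolower(c));
    return out;
}

// Hypothetical helper: overwrite every space-separated word whose
// normalised form is in `censored` with asterisks of the same length.
// Returns true if at least one word was censored, so the caller can
// decide whether to punish the player.
bool censorMessage(std::string& msg,
                   const std::unordered_set<std::string>& censored)
{
    bool found = false;
    std::string::size_type pos = 0;
    // Skip leading spaces, then find the end of the current word.
    while ((pos = msg.find_first_not_of(' ', pos)) != std::string::npos)
    {
        std::string::size_type end = msg.find(' ', pos);
        if (end == std::string::npos)
            end = msg.size();

        const std::string::size_type len = end - pos;
        if (censored.count(normalise(msg.substr(pos, len))) != 0)
        {
            // Same-length replacement, so `end` stays valid.
            msg.replace(pos, len, len, '*');
            found = true;
        }
        pos = end;
    }
    return found;
}
```

With a set containing "ass", calling censorMessage on "You are an a.s.s. really" rewrites the message in place and returns true; each word costs one hash lookup instead of a loop over every censored word.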