What is the best way to implement an abusive word handler (.NET preferred)?
For an ASP.NET application, what is the best-practice way to implement dictionary-based removal or replacement of abusive words in user input?
If this is a data-table solution, is there a free resource to get the data from? (Similar to a public dictionary table that you can import into your system for spell checking.)
The only way to win is not to play.
Consider the following sentence:
"Edward II was one of the few monarchs to father an acknowledged bastard."
"Bastard" is a borderline swear word, but in this context it is a perfectly reasonable term.
Consider also:
- "Molten slag fell out of the crucible."
- "The bitch sniffed the other dog's backside."
You can never create a parser capable of working out correct usage. Even if you decide to go ahead anyway and simply block those words, the filter can still be circumvented.
Ask yourself, is "Tw*t" really any less offensive than "twat"? Everyone knows which word is meant, and everyone understands what it means.
Ultimately, the solution to this problem is not a technological one. Instead, use human moderators to deal with people who swear. A human moderator has something that algorithms will never have: judgment. Using this solution is much more rewarding than throwing computer science at the problem.
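To make both failure modes concrete, here is a minimal sketch of a naive word-list filter. It is written in Python for brevity (the same idea ports directly to .NET), and both the `BLOCKLIST` contents and the `naive_flags` helper are hypothetical names built from the example sentences above, not anything from a real library.

```python
import re

# Hypothetical word list, taken from the example sentences in this answer.
BLOCKLIST = {"bastard", "slag", "bitch", "twat"}

def naive_flags(text):
    """Return every blocklisted word found in text (whole words, case-insensitive)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w in BLOCKLIST]

# False positive: a perfectly legitimate sentence gets flagged.
print(naive_flags("Molten slag fell out of the crucible."))  # ['slag']

# False negative: a trivially obfuscated insult passes untouched.
print(naive_flags("You tw*t"))  # []
```

The filter is wrong in both directions at once: it has no notion of context, and a single substituted character defeats it.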
This is discussed in detail in another answer to this question.
One thing we (*) did that worked well was to create a two-tier list of "bad words" (using regular expressions to hopefully catch some variations). Using a tier 1 word gets you a warning that you are violating the Terms of Service, and you cannot post the message until you correct it. If you use a tier 2 word, the message is posted, but a complaint is automatically filed against it. All messages with a complaint against them (whether generated by the system or by a user) are reviewed by a person, who decides whether the message stays or goes.
(*) "We" is the e-commerce arm of a large brick-and-mortar holding company that has just started allowing user-generated content on its website.
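The two-tier scheme described in this answer could be sketched roughly as follows. This is only an illustration in Python, not the actual implementation: the patterns in `TIER1` and `TIER2` are harmless placeholders, and `Decision` and `check_message` are hypothetical names. In .NET the same structure would map onto `System.Text.RegularExpressions.Regex`.

```python
import re
from dataclasses import dataclass

# Placeholder patterns (assumptions); a real site would maintain its own lists.
TIER1 = [re.compile(r"\bbadword\w*", re.IGNORECASE)]
TIER2 = [re.compile(r"\bd[a@]rn\w*", re.IGNORECASE)]  # catches "darn" and "d@rn"

@dataclass
class Decision:
    accepted: bool  # False: the message is rejected until the author edits it
    flagged: bool   # True: a complaint is filed and a human reviews the message
    reason: str

def check_message(text):
    """Apply the tier 1 check first, then tier 2; anything else passes."""
    if any(p.search(text) for p in TIER1):
        return Decision(False, False, "tier 1: violates the Terms of Service, please edit")
    if any(p.search(text) for p in TIER2):
        return Decision(True, True, "tier 2: posted, complaint filed automatically")
    return Decision(True, False, "clean")

print(check_message("well d@rn it"))
```

The point of the design is that the code never makes the final call on a tier 2 message. It only routes the message to the human review queue, where user-filed complaints also end up.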