What is the best way to implement an abusive word handler (.NET preferred)?

For an ASP.NET application, what is the best-practice way to implement a custom filter that removes or replaces words from a dictionary of abusive terms?

If this is a data table solution, is there a free resource to get the data? (Similar to a public dictionary table that you can import into your system for spell checking.)

+1




3 answers


+15




The only way to win is not to play.

Consider the following sentence:

"Edward II was one of the few monarchs to father an acknowledged bastard."

"Bastard" is a borderline swear word, but in this context it is a perfectly reasonable term.

Consider also:



  • "Molten slag fell out of the crucible."
  • "The bitch sniffed the other dog's rear."

You can never create a parser capable of working out correct usage. Even if you decide to go ahead anyway and simply filter those words, the filter can still be circumvented.

Ask yourself, is "Tw*t" really much less offensive than "twat"? Everyone knows which word is meant, and everyone understands what it means.
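To make both failure modes concrete, here is a minimal sketch of a naive whole-word filter. It is written in Python for brevity, and the same logic maps directly onto .NET's Regex class. The `BLOCKED` list and the `naive_flags` helper are hypothetical names invented for this illustration, not part of any library.

```python
import re
from typing import List

# Hypothetical word list, taken from the examples in this answer.
BLOCKED = {"bastard", "slag", "bitch", "twat"}

def naive_flags(text: str) -> List[str]:
    """Return every blocked word found in text (whole words, case-insensitive)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w in BLOCKED]

# A perfectly legitimate sentence is flagged (false positive).
print(naive_flags("Molten slag fell out of the crucible."))

# A trivially disguised insult passes untouched (false negative).
print(naive_flags("Tw*t"))
```

The first call reports "slag" even though the sentence is about metalworking, and the second reports nothing. Adding more patterns shifts where these errors occur, but it does not give the filter any sense of context.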

Ultimately, the solution to this problem is not a technological one. Instead, you want human moderators to deal with people who swear. A human moderator has something that algorithms will never have: judgment. Using this solution is much more rewarding than throwing computer science at the problem.

This is discussed in detail in another answer to this question.

+6




One thing we (*) did that worked well was to create a two-tier list of "bad words" (using regexes to hopefully catch some variations). Using a tier 1 word gets you a warning that you are in violation of the Terms of Service, and you cannot post the message until you correct it. If you use a tier 2 word, the message is posted, but a complaint is automatically filed against it. All messages with a flagged complaint (whether generated by the system or by a user) are reviewed by a person who decides whether the message stays or goes.

(*) "We" is the e-commerce arm of a large brick-and-mortar holding company that has just started allowing user-generated content on its website.
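The two-tier scheme described above could be sketched roughly as follows. This is only an illustration in Python, not the code we actually ran. The patterns, the `Verdict` type, and `check_message` are all hypothetical, and a real deployment would load its patterns from a maintained list.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns; each regex tries to catch a few spelling variations.
TIER1 = [re.compile(r"\btw[a*@]t\b", re.IGNORECASE)]  # blocks the post
TIER2 = [re.compile(r"\bbastard\b", re.IGNORECASE)]   # posts, but flags for review

@dataclass
class Verdict:
    accepted: bool  # False: the message is rejected until the author edits it
    flagged: bool   # True: a complaint is filed and a human reviews the message
    reason: str

def check_message(text: str) -> Verdict:
    """Classify a message against the tier 1 and tier 2 lists, in that order."""
    if any(p.search(text) for p in TIER1):
        return Verdict(False, False, "tier 1: violates the Terms of Service, please edit")
    if any(p.search(text) for p in TIER2):
        return Verdict(True, True, "tier 2: posted, complaint filed automatically")
    return Verdict(True, False, "clean")

print(check_message("You tw*t"))
print(check_message("Edward II fathered an acknowledged bastard."))
print(check_message("Nice product, fast delivery."))
```

Checking tier 1 first means a message containing words from both tiers is rejected outright rather than posted and flagged. The second example shows why tier 2 goes to a person instead of being blocked: the sentence is harmless, and only a human reviewer can tell.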

+2

