Problem with a simple PHP profanity filter
I am writing a simple profanity filter in PHP. Can anyone tell me why, in the following code, the filter works (it prints [explicit]) for the $vowels array but not for the $array array that I build from a text file?
function clean($str){
    $handle = fopen("badwords.txt", "r");
    if ($handle) {
        while (!feof($handle)) {
            $array[] = fgets($handle, 4096);
        }
        fclose($handle);
    }
    $vowels = array("a", "e", "i", "o", "u", "A", "E", "I", "O", "U");
    $filter = "[explicit]";
    $clean = str_replace($array, $filter, $str);
    return $clean;
}
When I use $vowels instead of $array, it works, except that for lowercase vowels it returns:
[[expl[explicit]c[explicit]t]xpl[explicit]c[explicit]t]
instead of
[explicit]
Not sure why this is happening.
Any ideas?
Thanks!
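The nested output shown in the question can be reproduced directly. str_replace() works through the search array one term at a time, in order, and each later term also sees the text inserted by earlier replacements, so the "e" and "i" inside "[explicit]" get replaced in turn. Here is a minimal sketch, assuming the input was the single letter "a"; the strtr() call is only one possible single-pass alternative, not something proposed in the thread:

```php
<?php
// Same search list and replacement as in the question.
$vowels = array("a", "e", "i", "o", "u", "A", "E", "I", "O", "U");
$filter = "[explicit]";

// str_replace() handles one search term at a time, in array order, and each
// later term also sees the text inserted by the earlier replacements.
$cascaded = str_replace($vowels, $filter, "a");
echo $cascaded, "\n";

// strtr() with an array makes a single pass over the input and does not
// rescan the text it has just inserted.
$pairs = array_fill_keys($vowels, $filter);
$single_pass = strtr("a", $pairs);
echo $single_pass, "\n";
```

Uppercase vowels do not cascade because str_replace() is case-sensitive and "[explicit]" contains no uppercase letters.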
I modified Davethegr8's solution to get the following working example:
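function clean($str){
    global $clean_words;
    $replacement = '[explicit]';
    if(empty($clean_words)){
        // Build the patterns once and cache them in the global.
        $badwords = explode("\n", file_get_contents('badwords.txt'));
        $clean_words = array();
        foreach($badwords as $word) {
            $word = trim($word);
            if ($word === '') {
                continue; // a blank line would give a pattern that matches at every word boundary
            }
            // preg_quote() escapes regex metacharacters and the "/" delimiter.
            $clean_words[] = '/(\b' . preg_quote($word, '/') . '\b)/si';
        }
    }
    $out = preg_replace($clean_words, $replacement, $str);
    return $out;
}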
Make sure you read:
Coding Horror: Obscenity Filters: Bad Idea, or Incredibly Intercoursing Bad Idea?
before you decide to continue down the string-replacement path...
First, file_get_contents is a much simpler function to read a file into a variable.
$badwords = explode("\n", file_get_contents('badwords.txt'));
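How the file is read matters for the original question, because fgets() returns each line with its trailing newline still attached, so a search term like "foo\n" will not match "foo" in the middle of a sentence. The sketch below illustrates the difference. The temporary file and the words "foo" and "bar" are made up for the example, and the trim/filter step is one way of cleaning the list, not the only one:

```php
<?php
// Hypothetical word list written to a temporary file so the example is self-contained.
$path = tempnam(sys_get_temp_dir(), 'bad');
file_put_contents($path, "foo\nbar\n");

// fgets() returns each line with its trailing newline still attached.
$handle = fopen($path, "r");
$raw = array();
while (($line = fgets($handle, 4096)) !== false) {
    $raw[] = $line;
}
fclose($handle);

// explode() on "\n" drops the newlines; trim() also removes a stray "\r",
// and array_filter() with 'strlen' discards the empty entry left by the final newline.
$badwords = array_values(array_filter(
    array_map('trim', explode("\n", file_get_contents($path))),
    'strlen'
));
unlink($path);

$text = "foo bar";
$with_raw   = str_replace($raw, "[explicit]", $text);      // "foo\n" never occurs in $text
$with_clean = str_replace($badwords, "[explicit]", $text); // both words are replaced
echo $with_raw, "\n", $with_clean, "\n";
```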
Second, preg_replace offers much more flexible string replacement options: http://us3.php.net/preg_replace
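$patterns = array();
foreach($badwords as $word) {
    $word = trim($word);
    if ($word === '') continue; // skip blank lines; an empty pattern would match everywhere
    // preg_quote() escapes regex metacharacters and the '/' delimiter
    $patterns[] = '/'.preg_quote($word, '/').'/';
}
$replacement = '[explicit]';
$output = preg_replace($patterns, $replacement, $input);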
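To put the pieces together, here is a rough sketch of the preg_replace approach as a single function. The name filter_profanity() and the sample words are hypothetical, and building one alternation pattern with \b word boundaries and the i modifier is one reasonable choice among several:

```php
<?php
// filter_profanity() is a hypothetical helper, not a PHP built-in.
function filter_profanity($input, array $badwords, $replacement = '[explicit]')
{
    $quoted = array();
    foreach ($badwords as $word) {
        $word = trim($word);
        if ($word !== '') {
            // Escape regex metacharacters and the "/" delimiter.
            $quoted[] = preg_quote($word, '/');
        }
    }
    if (empty($quoted)) {
        return $input; // nothing to filter
    }
    // One alternation, whole words only (\b), case-insensitive (i).
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    return preg_replace($pattern, $replacement, $input);
}

$words = array("darn", "heck");
echo filter_profanity("Darn it, what the HECK?", $words), "\n";
echo filter_profanity("A darning needle", $words), "\n";
```

The word boundaries keep "darning" intact, which avoids some of the false positives that plain substring replacement produces. They do not make the filter hard to evade.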