How do I find non-printable characters in a file?
I am trying to find non-printable characters in a data file on Unix. My code:
#!/bin/ksh
export SRCFILE='/data/temp1.dat'
while read line
do
len=lenght($line)
for( $i = 0; $i < $len; $i++ ) {
if( ord(substr($line, $i, 1)) > 127 )
{
print "$line\n";
last;
}
done < $SRCFILE
The code is not working. Please help me find a solution.
You can use grep to find non-printable characters in a file, something like the following, which finds all non-printable ASCII and all non-ASCII characters:
grep -P -n "[\x00-\x1F\x7F-\xFF]" input_file
-P gives you the more powerful Perl-compatible regular expressions (PCRE) and -n shows you line numbers.
If your grep doesn't support PCRE, I would just use Perl to do this:
perl -ne '$x++;if($_=~/[\x00-\x1F\x7F-\xFF]/){print"$x:$_"}' input_file
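If neither grep -P nor Perl is available, POSIX character classes may be enough. The following is only a sketch under a few assumptions: find_nonprintable is a hypothetical helper name, tabs are treated as acceptable, and carriage returns are not, so files with Windows line endings will be reported on every line. GNU grep may also print only "Binary file ... matches" if the file contains NUL bytes.

```shell
# find_nonprintable is a hypothetical helper name, not a standard command.
# Prints "line_number:line" for each line of the file given as $1 that holds
# a byte outside printable ASCII. Assumption: tabs count as acceptable.
find_nonprintable() {
    # LC_ALL=C makes grep work byte by byte, so 0x80-0xFF are not decoded
    # as multibyte characters. In the C locale [:print:] is 0x20-0x7E and
    # [:blank:] adds the tab character.
    LC_ALL=C grep -n '[^[:print:][:blank:]]' "$1"
}
```

Usage would look like: find_nonprintable /data/temp1.dat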
This sounds fairly trivial, but I wasn't sure offhand how to do it. I like "od"; depending on what you are doing, you may want something that can display arbitrary bytes. The awk code is not very elegant, but it is flexible if you are looking for specific values. The purpose here is simply to show the use of od. Note the pitfalls with awk: string comparisons, trailing spaces, etc.
cat filename | od -A n -t x1z | awk '{ p=0; i=1; if ( NF>16) { while (i<17) {if ( $i!="0d"){ if ( $i!="0a") {if ( $i" " < "20 " ) {print $i ; p=1;} if ( $i" "> "7f "){print $i; p=1;}}} i=i+1} if (p==1) print $0; }}' | more
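A shorter variation on the same od idea, as a rough sketch rather than a replacement for the command above: it drops the z suffix so every field is a hex byte, and it counts each suspicious byte value instead of printing whole dump lines. The name list_odd_bytes is hypothetical. Unlike the command above, it also flags 7f (DEL), which is my own choice here.

```shell
# list_odd_bytes is a hypothetical helper name.
# Prints "hex_value count" for every byte in the file given as $1 that is
# outside 0x20-0x7e, ignoring LF (0a) and CR (0d).
list_odd_bytes() {
    # -v stops od from collapsing repeated lines into "*";
    # -A n drops the offset column; -t x1 prints one hex byte per field.
    od -A n -v -t x1 "$1" |
    LC_ALL=C awk '{
        for (i = 1; i <= NF; i++) {
            if ($i == "0a" || $i == "0d") continue
            # Comparing against string constants forces a string comparison,
            # which orders two-digit lowercase hex values correctly.
            if ($i < "20" || $i > "7e") count[$i]++
        }
    }
    END { for (b in count) print b, count[b] }' |
    LC_ALL=C sort
}
```

Usage would look like: list_odd_bytes /data/temp1.dat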