Identify and remove certain hidden characters from a text file

Question

I have a text file that contains some hidden characters. Using cat -v, I can see that they include the following:

^M

^[[A

There are also newline characters (\n) at the end of each line. I would also like to display them somehow.

Then I would like to be able to selectively remove the hidden characters with cut and sed. How can I accomplish this?

I tried dos2unix, but it did not remove any of the ^M characters. I also tried sed s/^M//g, entering the ^M by pressing Ctrl+V followed by M.

Raw data

Output of cat -v on the raw data, also available at: http://pastebin.com/Vk2i81JC

^MCopying non-tried blocks... Pass 1 (forwards)^M^[[A^[[A^[[Arescued:         0 B,  errsize:       0 B,  current rate:        0 B/s
   ipos:         0 B,   errors:       0,    average rate:        0 B/s
   opos:         0 B, run time:       1 s,  successful read:       1 s ago
^MFinished

Desired output

Also available at: http://pastebin.com/wfDnrELm

rescued:         0 B,  errsize:       0 B,  current rate:        0 B/s
   ipos:         0 B,   errors:       0,    average rate:        0 B/s
   opos:         0 B, run time:       1 s,  successful read:       1 s ago
Finished

unix bash sed

p014k 11 Sep '14 at 3:31

1 answer

Ram · Accepted Answer · 2014-09-11T04:59:15+0000

Try the tr command, which is used to translate or delete characters. The command below deletes every character other than those listed as octal values inside the quotes: octal \12 is newline (\n), octal \11 is TAB (^I), and octal \40-\176 are the printable ASCII characters.

For a complete reference to octal values see this page: https://courses.engr.illinois.edu/ece390/books/labmanual/ascii-code-table.html

tr -cd '\11\12\40-\176' < org.txt > new.txt

The file new.txt will contain the text with those characters removed.
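As a rough illustration of what this filter does, the sketch below builds a small sample with printf, displays its bytes (including the trailing \n) with od -c, and then applies the same tr filter. The function name strip_nonprintable and the sample string are made up for this example and are not part of the original answer.

```shell
# Hypothetical helper: keep only TAB, newline, and printable ASCII.
strip_nonprintable() {
    tr -cd '\11\12\40-\176'
}

# Sample bytes: a CR, an ESC [ A cursor-up sequence, a TAB, and a newline.
# od -c prints each byte in readable form, so \r, 033 (ESC), \t and \n
# all become visible.
printf 'one\r\033[Atwo\tthree\n' | od -c

# With GNU cat, "cat -A" is another option: it marks each line end
# with $ and each TAB with ^I.

# Apply the filter: CR and ESC are deleted, the TAB and newline are kept.
# The "[A" that followed ESC is printable ASCII, so it stays in the output.
printf 'one\r\033[Atwo\tthree\n' | strip_nonprintable
```

Because tr works on single characters, it can drop the escape byte but not the printable remainder of an escape sequence; that part has to be removed by a pattern-based tool such as sed.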

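To also remove the text between two ^M characters and the ^[[A escape sequences (tr alone would leave their printable [A part behind), use the following command, which assumes GNU sed for the \r and \x1b escapes:

sed 's/\r.*\r//g; s/\x1b\[A//g' org.txt | tr -cd '\11\12\40-\176' > new.txt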
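For a sed that does not understand \r, a possible variant is to let printf produce the literal control bytes and pass them to sed. This is only a sketch under stated assumptions: the function name clean_log is hypothetical, the sample line is a shortened version of the raw data from the question, and the second sed expression is an addition that assumes the escape sequences have the form ESC [ optional digits or semicolons, then one letter (which covers ^[[A).

```shell
# Hypothetical helper: reads from stdin and writes the cleaned text to stdout.
clean_log() {
    cr=$(printf '\r')      # literal carriage return (^M)
    esc=$(printf '\033')   # literal escape byte (^[)
    # 1. Drop everything from the first CR to the last CR on a line.
    # 2. Drop escape sequences such as ESC [ A (assumed form, see above).
    # 3. Delete any remaining bytes that are not TAB, newline, or printable ASCII.
    sed -e "s/${cr}.*${cr}//g" -e "s/${esc}\[[0-9;]*[A-Za-z]//g" |
        tr -cd '\11\12\40-\176'
}

# Shortened sample modelled on the raw data from the question.
printf '\rCopying... Pass 1 (forwards)\r\033[A\033[A\033[Arescued: 0 B\n\rFinished\n' |
    clean_log
```

On a real file the call would look like clean_log < org.txt > new.txt. A lone ^M, as in the Finished line, is not matched by the first sed expression and is removed by tr instead.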
