Identify and remove certain hidden characters from a text file

I have a text file that contains some hidden characters. Using cat -v, I can see that they include the following:

^M

^[[A

There are also newline characters (\n) at the end of each line. I would like to display those somehow as well.

Then I would like to be able to selectively cut and sed the hidden characters. How can I accomplish this?

I tried dos2unix, but it did not remove any of the ^M characters. I also tried sed s/^M//g, entering ^M by pressing Ctrl+V and then M.


Raw data

Output of cat -v on the raw data, also available at: http://pastebin.com/Vk2i81JC

^MCopying non-tried blocks... Pass 1 (forwards)^M^[[A^[[A^[[Arescued:         0 B,  errsize:       0 B,  current rate:        0 B/s
   ipos:         0 B,   errors:       0,    average rate:        0 B/s
   opos:         0 B, run time:       1 s,  successful read:       1 s ago
^MFinished

      

Desired output

Also available at: http://pastebin.com/wfDnrELm

rescued:         0 B,  errsize:       0 B,  current rate:        0 B/s
   ipos:         0 B,   errors:       0,    average rate:        0 B/s
   opos:         0 B, run time:       1 s,  successful read:       1 s ago
Finished

      



1 answer


Try the tr command, which is used to translate or delete characters. The command below deletes every character except those listed in octal inside the quotes.

Octal \12 is newline (\n), octal \11 is TAB (^I), and octal \40-\176 are the printable ASCII characters.

For a complete reference to the octal values, see this page: https://courses.engr.illinois.edu/ece390/books/labmanual/ascii-code-table.html

tr -cd '\11\12\40-\176' < org.txt > new.txt

      



The file new.txt will contain the text with the unwanted characters removed.
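To see what the filter does, and to display the hidden bytes including the newline, here is a minimal sketch. The sample string and the variable names are made up for illustration; od -c is used because it prints control characters as escapes such as \r, \n and 033.

```shell
# Build a small, hypothetical sample containing a carriage return (octal 015),
# an escape character (octal 033), a TAB and a trailing newline.
sample=$(mktemp)
printf '\rab\033[A\tcd\n' > "$sample"

# Show every byte, including the ones a terminal would normally hide.
od -c "$sample"

# Keep only TAB, newline and printable ASCII; all other bytes are deleted.
cleaned=$(tr -cd '\11\12\40-\176' < "$sample")
printf '%s\n' "$cleaned"

rm -f "$sample"
```

Note that the escape byte itself is deleted, but the printable [A that followed it is kept, because both characters fall inside the \40-\176 range.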

To remove the text between the ^M characters and also strip the remaining control characters, use the following command:

sed "s/\r.*\r//g" org.txt | tr -cd '\11\12\40-\176' > new.txt
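One caveat: \r inside a sed expression is understood by GNU sed but not necessarily by other implementations, and tr deletes only the escape byte of each ^[[A sequence, so a literal [A would remain in the output. The following is a sketch of a variant that also strips those cursor-movement sequences. It is an illustration under assumptions rather than the original answer's method: the sample line is shortened from the question's raw data, and the control characters are passed to sed as literal bytes via the hypothetical variables cr and esc.

```shell
# Literal control characters, generated portably with printf.
cr=$(printf '\r')      # carriage return, shown as ^M by cat -v
esc=$(printf '\033')   # escape, shown as ^[ by cat -v

result=$(
  printf '\rCopying blocks... Pass 1 (forwards)\r\033[A\033[A\033[Arescued: 0 B\n\rFinished\n' |
    sed -e "s/${cr}.*${cr}//" \
        -e "s/${esc}\[[0-9;]*[A-Za-z]//g" |
    tr -cd '\11\12\40-\176'
)
printf '%s\n' "$result"
```

The first sed expression drops everything from the first to the last carriage return on a line, the second removes escape sequences of the form ESC [ ... letter, and tr then deletes any control characters left over, such as the single carriage return in front of Finished.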

      
