Identify and remove certain hidden characters from a text file
I have a text file that contains some hidden characters. Using cat -v, I can see that they include the following:
^M
^[[A
There are also newline characters (\n) at the end of each line, which I would also like to display somehow.
Then I would like to use cut and sed to selectively remove the hidden characters. How can I accomplish this?
I tried dos2unix on the file, but it did not remove any of the ^M symbols. I also tried sed s/^M//g, where I typed the ^M by pressing Ctrl+V, M.
Raw data
Output of cat -v on the raw data, also available at: http://pastebin.com/Vk2i81JC
^MCopying non-tried blocks... Pass 1 (forwards)^M^[[A^[[A^[[Arescued: 0 B, errsize: 0 B, current rate: 0 B/s
ipos: 0 B, errors: 0, average rate: 0 B/s
opos: 0 B, run time: 1 s, successful read: 1 s ago
^MFinished
Desired output
Also available at: http://pastebin.com/wfDnrELm
rescued: 0 B, errsize: 0 B, current rate: 0 B/s
ipos: 0 B, errors: 0, average rate: 0 B/s
opos: 0 B, run time: 1 s, successful read: 1 s ago
Finished
Try the tr command, which translates or deletes characters. The command below deletes every character except those listed, by octal value, inside the quotes:
octal \12 is newline (\n), octal \11 is TAB (^I), and octal \40-\176 is the range of printable ASCII characters.
For a complete reference to octal values see this page: https://courses.engr.illinois.edu/ece390/books/labmanual/ascii-code-table.html
tr -cd '\11\12\40-\176' < org.txt > new.txt
The file new.txt will contain the text with those characters deleted.
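To see what this filter does and does not remove, here is a small sketch on a made-up sample line. The function name strip_nonprintable is only an illustration, not part of any tool. Note that the filter drops the ESC byte (shown as ^[ by cat -v) and the carriage return (^M), but the printable "[A" that follows ESC is kept.

```shell
#!/bin/sh
# Hypothetical helper: wraps the tr filter shown above and reads from stdin.
strip_nonprintable() {
    # Keep only TAB (\11), newline (\12) and printable ASCII (\40-\176).
    tr -cd '\11\12\40-\176'
}

# Made-up sample: a carriage return, some text, then an ESC [ A sequence.
# printf turns \r into CR and \033 into ESC.
printf '\rFinished\033[Adone\n' | strip_nonprintable
# prints: Finished[Adone
```

So on the raw data from the question, each ^[[A would still leave "[A" behind after this step.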
To also remove the text between two ^M characters on a line, in addition to the other control characters, use the following command:
sed "s/\r.*\r//g" org.txt | tr -cd '\11\12\40-\176' > new.txt
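One caveat with the pipeline above: tr deletes only the ESC byte of each ^[[A, so "[A" remains in the output, and \r inside a sed pattern is not understood by every sed. A possible variant is sketched below. It assumes the escape sequences all have the form ESC [ optional digits or semicolons, then one letter (as ^[[A does), and it builds the CR and ESC bytes with printf instead of relying on \r in sed. The name clean_log is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads the raw log on stdin, writes cleaned text to stdout.
clean_log() {
    cr=$(printf '\r')     # literal carriage return (^M)
    esc=$(printf '\033')  # literal ESC (^[)
    # 1. Drop everything from the first CR to the last CR on a line.
    # 2. Drop sequences of the assumed form: ESC [ digits/semicolons letter.
    # 3. Drop any remaining control characters, as in the tr command above.
    sed -e "s/${cr}.*${cr}//" -e "s/${esc}\[[0-9;]*[A-Za-z]//g" |
        tr -cd '\11\12\40-\176'
}

# Shortened sample modelled on the raw data in the question.
printf '\rCopying (forwards)\r\033[A\033[Arescued: 0 B\n\rFinished\n' | clean_log
```

With the real file this would be run as clean_log < org.txt > new.txt.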