How can I unescape backslash sequences like "\ " and "\303\266" in bash?
I have a script that writes out files with UTF-8 encoded names. However, the script's encoding/environment was not set up correctly, and it just escaped the raw bytes. Now I have many lines like this in the file:
.../My\ Folders/My\ r\303\266m/...
So the file names have spaces escaped as "\ " and UTF-8 bytes written as octal escapes like \303\266 (which is ö). I want to reverse this encoding. Is there a simple set of bash commands that I can chain together to undo it?
I could write millions of sed commands, but it would take a long time to list all the non-ASCII characters. Or I could start parsing it in Python. But I am hoping there is some trick I can use.
In the end, I used something like this:
cat file | sed 's/%/%%/g' | while read -r line ; do printf "${line}\n" ; done | sed 's/\\ / /g'
Some of the file names had % in them, which is a special printf character, so I had to double it so that it was escaped and passed straight through. The -r option to read stops read from interpreting the backslashes itself, but that also means "\ " is not turned into " ", so I needed the final sed.
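As a possible variation on the loop above (a sketch, not what was actually run): if each line is passed as an argument to a fixed '%b\n' format instead of being used as the format string itself, printf never interprets % in the data, so the doubling step can be dropped. The temporary file and the second sample path are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical input file holding escaped paths, one per line.
input=$(mktemp)
printf '%s\n' '/My\ Folders/My\ r\303\266m/' '/My\ Folders/100%\ done/' > "$input"

# IFS= keeps leading/trailing whitespace; -r keeps the backslashes for printf.
# %b expands the octal escapes in its argument. A literal % in the line is
# plain data here, not part of the format, so it needs no doubling.
result=$(while IFS= read -r line; do
    printf '%b\n' "$line"
done < "$input" | sed 's/\\ / /g')   # "\ " is left alone by %b, so sed handles it

printf '%s\n' "$result"
rm -f "$input"
```

Note that %b also expands other escapes such as \n or \t if they occur in the data, just as using the line as the format string does.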
Here's a rough stab at handling the Unicode characters:
text="/My\ Folders/My\ r\303\266m/"
text="echo \$\'"$(echo $text|sed -e 's|\\|\\\\|g')"\'"
text=$(eval "echo $(eval $text)")
read text < <(echo $text)
echo $text
This uses Bash's $'string' quoting feature.
This outputs "/My Folders/My röm/".
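To illustrate the $'string' quoting on its own, here is a minimal sketch, assuming bash (the variable names are made up). Bash expands octal escapes inside $'...' while parsing the script text, which is why data that is already stored in a variable has to be pushed back through the parser (the eval above) before it takes effect.

```shell
#!/usr/bin/env bash
# ANSI-C quoting: \303\266 is expanded to the two UTF-8 bytes of "ö"
# when bash parses this literal.
name=$'r\303\266m'
printf '%s\n' "$name"

# The same characters in ordinary single quotes stay as they are:
# the variable holds literal backslashes and digits.
raw='r\303\266m'
printf '%s\n' "$raw"
```

Because eval runs whatever the data expands to, this approach is only reasonable for input you trust.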
It is not clear what kind of escaping is being used. The octal character codes are C-style, but C does not escape spaces. Escaping spaces is something the shell does, but the shell does not use octal escape sequences.
Something close to C-style escaping can be undone with the command printf %b $escaped. (The documentation says octal escape sequences start with \0, but that doesn't seem to be required by GNU printf.) Another answer mentions read for undoing shell escapes, although if space is the only one that printf %b does not handle, then handling that case with sed will probably be better.
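A small sketch of that combination, assuming bash's builtin printf; the variable names are hypothetical and the sample value is taken from the question:

```shell
#!/usr/bin/env bash
# Sample value from the question (hypothetical variable name).
escaped='/My\ Folders/My\ r\303\266m/'

# Quote the argument: unquoted, the shell would split it at the spaces and
# printf would reuse the %b format once per word, losing the spaces.
step1=$(printf '%b' "$escaped")

# "\ " is not an escape that %b knows, so it is still present; sed removes it.
unescaped=$(printf '%s\n' "$step1" | sed 's/\\ / /g')
printf '%s\n' "$unescaped"

# Both octal spellings produce the same bytes here: the three-digit form
# from the question and the \0-prefixed form from the documentation.
short=$(printf '%b' 'r\303\266m')
long=$(printf '%b' 'r\0303\0266m')
```

Other printf implementations may be stricter about the leading \0, so the three-digit form should be treated as a convenience rather than a guarantee.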
Pass the file (line by line) to the following Perl script.
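#!/usr/bin/perl
use strict;
use warnings;

# Print one line, converting octal escapes such as \303 as it goes.
sub encode {
    my ($String) = @_;
    while ($String =~ /(\\[0-9]+|.)/g) {
        my $Match = $1;
        if ($Match =~ /\\([0-9]+)/) {
            my $Code = oct($1);
            # Codes from 32 up to 159 are printed as the character itself;
            # anything else is printed as a \x{..} hex escape.
            # (The upper-bound comparison is assumed to be "<".)
            my $Char = (($Code >= 32) && ($Code < 160))
                ? chr($Code)
                : sprintf("\\x{%X}", $Code);
            print $Char;
        } else {
            print $Match;
        }
    }
    print "\n";
}

# Process every file named on the command line, line by line.
while (@ARGV) {
    my $File = shift;
    open(my $F, "<", $File) or die "Cannot open $File: $!";
    while (my $Line = <$F>) {
        $Line =~ s/\\ / /g;   # "\ " becomes a plain space
        encode($Line);
    }
    close($F);
}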
Run it like this:
$ ./PerlEncode.pl Test.txt
where Test.txt contains:
/My\ Folders/My\ r\303\266m/
/My\ Folders/My\ r\303\266m/
/My\ Folders/My\ r\303\266m/
The substitution s/\\ / /g replaces "\ " with " ", and the encode sub handles the parsing of the octal-escaped Unicode characters.
Hope this helps.