How can I unescape backslash sequences like "\ " and "\303\266" in bash?

I have a script that writes out file names that are UTF-8 encoded. However, the script's encoding/environment was not set up correctly, and it just wrote out the raw bytes as escape sequences. Now I have many lines in the file like this:

.../My\ Folders/My\ r\303\266m/...


So spaces in the file names are escaped as "\ ", and UTF-8 characters appear as octal byte escapes like \303\266 (which is ö). I want to reverse this encoding. Is there a simple set of bash command-line tools that I can chain together to undo it?

I could write a huge list of sed commands, but it would take a long time to list every non-ASCII character. Or I could start parsing it in Python. But I'm hoping there is a simpler trick.
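For reference, the escapes are octal byte values: \303 and \266 are the bytes 0xC3 0xB6, which together are the UTF-8 encoding of ö (U+00F6). A quick check, assuming bash and od from coreutils:

```shell
#!/usr/bin/env bash
# \303 and \266 are octal byte values; printf's format string expands them.
printf '\303\266' | od -An -tx1    # shows the two bytes in hex: c3 b6
printf 'r\303\266m\n'              # in a UTF-8 terminal this shows: röm
```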



6 answers


In the end, I used something like this:

sed 's/%/%%/g' file | while IFS= read -r line ; do printf "${line}\n" ; done | sed 's/\\ / /g'




Some of the file names contained %, which is a special character in printf format strings, so I had to double it (%%) so that it is escaped and passed straight through. The -r option stops read from consuming the backslashes, so they reach printf intact, but printf does not turn "\ " into " ", so I needed the final sed.
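The same pipeline can be packaged as a small function. This is a sketch assuming bash; the name decode_escaped is hypothetical. It keeps the answer's approach of using each line as a printf format string, and adds IFS= so read also preserves leading and trailing whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the sed | printf loop | sed pipeline.
# Usage: decode_escaped FILE   (or pipe lines in with no argument)
decode_escaped() {
    # Step 1 (sed): double every % so printf prints it literally.
    # Step 2 (loop): use each line as a printf format string, which turns
    #   \303\266 into raw bytes; read -r keeps the backslashes and IFS=
    #   keeps leading/trailing whitespace.
    # Step 3 (sed): printf leaves "\ " alone, so turn it into a plain space.
    sed 's/%/%%/g' "${1:-/dev/stdin}" |
        while IFS= read -r line; do
            printf "${line}\n"
        done |
        sed 's/\\ / /g'
}

printf '%s\n' '/My\ Folders/My\ r\303\266m/' | decode_escaped
# prints: /My Folders/My röm/
```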



Here's a rough hack for the Unicode characters:

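# The escaped text, as it appears in the file
text="/My\ Folders/My\ r\303\266m/"
# Double the backslashes and wrap the text in  echo $\'...\'
text="echo \$\'"$(echo $text|sed -e 's|\\|\\\\|g')"\'"
# The inner eval yields a $'...' string; the outer eval decodes \303\266 into ö
text=$(eval "echo $(eval $text)")
# read (without -r) turns the remaining "\ " into " "
read text < <(echo $text)
echo $text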




This relies on Bash's $'string' (ANSI-C) quoting feature.

This outputs "/My Folders/My röm/".
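The core of that hack is Bash's $'...' quoting, which expands octal escapes by itself. A minimal illustration, assuming bash (the variable names are only for the example):

```shell
#!/usr/bin/env bash
# Inside $'...', bash turns \303\266 into the bytes 0xC3 0xB6 (UTF-8 "ö").
name=$'r\303\266m'
printf '%s\n' "$name"    # prints: röm

# "\ " is not a recognized $'...' escape, so bash keeps the backslash;
# that is why the answer still needs read to remove it.
path=$'/My\ r\303\266m/'
printf '%s\n' "$path"    # prints: /My\ röm/
```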



It's not clear exactly what kind of escaping is being used. The octal character codes are C-style, but C doesn't escape spaces. Escaping spaces with a backslash is a shell convention, but the shell doesn't use octal escape sequences.

Something close to C-style escaping can be undone with the command printf %b "$escaped". (The documentation says octal escape sequences start with \0, but that doesn't seem to be required by GNU printf.) Another answer mentions read for unescaping shell escapes, although if the space is the only escape that printf %b doesn't handle, then handling that case with sed will probably be better.
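A sketch of that suggestion, assuming bash (GNU printf behaves the same way here): printf '%b' expands the octal escapes in its argument, and sed handles the "\ " escape that it leaves alone. The function name unescape_line is hypothetical. Because the text is passed as an argument rather than as the format string, % needs no special treatment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode one escaped line.
# printf %b expands \303\266 into raw bytes; sed turns "\ " into " ".
unescape_line() {
    printf '%b\n' "$1" | sed 's/\\ / /g'
}

unescape_line '/My\ Folders/My\ r\303\266m/'   # prints: /My Folders/My röm/
unescape_line '100%\ done'                      # prints: 100% done
```

To process a whole file, call it in a loop: while IFS= read -r line; do unescape_line "$line"; done < file. Note that, like any escape decoder, %b will also expand other sequences such as \t if they appear in the names.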



Use printf to solve the UTF-8 part of the problem, and read to handle the escaped spaces (\ ).

Like this:

$ text='/My\ Folders/My\ r\303\266m/'
$ IFS='' read t < <(printf "$text")
$ echo "$t"
/My Folders/My röm/



The read builtin will handle part of the problem:

$ echo "with \ spaces" | while read r; do echo $r; done
with spaces
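To see exactly what read does: without -r it treats every backslash as an escape and drops it, so "\ " becomes a plain space, but \303 also loses its backslash. With -r the input is kept verbatim. That is why the other answers decode the octal escapes with printf before read sees the text. A small comparison, assuming bash (variable names are only for the example):

```shell
#!/usr/bin/env bash
line='My\ r\303\266m'

IFS= read a <<< "$line"      # no -r: every backslash is removed
IFS= read -r b <<< "$line"   # -r: the line is kept verbatim

printf '%s\n' "$a"   # prints: My r303266m
printf '%s\n' "$b"   # prints: My\ r\303\266m
```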


Pass the file (line by line) to the following Perl script.

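#!/usr/bin/perl
use strict;
use warnings;

# Print one line with every \NNN octal escape replaced by the raw byte it
# names; everything else is copied through unchanged.
sub encode {
    my ($String) = @_;
    while ($String =~ /(\\[0-7]{1,3}|.)/gs) {
        my $Match = $1;

        if ($Match =~ /^\\([0-7]{1,3})$/) {
            # oct("303") is 195; chr(195) printed without an encoding layer
            # is the single byte 0xC3, so the UTF-8 sequences are rebuilt.
            print chr(oct($1));
        } else {
            print $Match;
        }
    }

    print "\n";
}

foreach my $File (@ARGV) {
    open(my $F, "<", $File) or die "Cannot open $File: $!\n";
    while (my $Line = <$F>) {
        chomp($Line);
        $Line =~ s/\\ / /g;    # turn "\ " back into a plain space
        encode($Line);
    }
    close($F);
}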


Like this:

$ ./PerlEncode.pl Test.txt


Where Test.txt contains:

/My\ Folders/My\ r\303\266m/
/My\ Folders/My\ r\303\266m/
/My\ Folders/My\ r\303\266m/


The s/\\ / /g substitution replaces each "\ " with a plain space, and encode then decodes the octal-escaped Unicode characters.

Hope this helps.

