Net::Telnet - puts or print a string in UTF-8

I am using an API where I have to send customer data as a JSON object over a telnet connection (very strange, I know ^^). I am German, so customer information very often contains umlauts or ß.

My procedure:

  • I generate a hash with all the information about the team.
  • I convert the hash to a JSON object.
  • I convert the JSON object to a string (with .to_s).
  • I send the string using the Net::Telnet#puts command.

My puts command looks like this (cmd is a JSON object):

host.puts(cmd.to_s.force_encoding('UTF-8'))

      

In the log files I see that the JSON object does not contain the umlauts themselves, but, for example, this: ü instead of ü.

I have verified that the string (with or without force_encoding()) is in UTF-8. So I think the puts command is not sending strings in UTF-8.

Is it possible to send a command in UTF-8? How can I do this?

Whole methods:

host = Net::Telnet::new(
    'Host' => host_string,
    'Port' => port_integer,
    'Output_log' => 'log/'+Time.now.strftime('%Y-%m-%d')+'.log',
    'Timeout' => false,
    'Telnetmode' => false,
    'Prompt' => /\z/n
)

def send_cmd_container(host, cmd, params=nil)
    cmd = JSON.generate({'*C'=>'se','Q'=>[get_cmd(cmd, params)]})
    host.puts(cmd.to_s.force_encoding('UTF-8'))
    add_request_to_logfile(cmd)
end

def get_cmd(cmd, params=nil)
    if params == nil
        return {'*C'=>'sq','CMD'=>cmd}
    else
        return {'*C'=>'sq','CMD'=>cmd,'PARAMS'=>params}
    end
end

      

Addition:

I also log my requests using this method:

def add_request_to_logfile(request_string)
    directory = 'log/'
    File.open(File.join(directory, Time.now.strftime('%Y-%m-%d')+'.log'), 'a+') do |f|
        f.puts ''
        f.puts '> '+request_string
    end
end

      

In the log file, my requests do not contain UTF-8 umlauts either, but for example this: ü



1 answer


TL;DR

Set 'Binmode' => true and use Encoding::BINARY.

The above should work for you. If you're wondering why, read on.


Telnet has no concept of "encoding". Telnet has just two modes: in normal mode, you send 7-bit ASCII characters, and in binary mode, you send 8-bit bytes. You can't tell Telnet "this is UTF-8" because Telnet doesn't know what that means. You can tell it "this is 7-bit ASCII" or "this is a sequence of 8-bit bytes", and that's it.

This might sound like bad news, but it's actually great news, because it just so happens that UTF-8 encodes text as sequences of 8-bit bytes. früh, for example, is the five bytes 66 72 c3 bc 68. This can be easily verified in Ruby:

puts str = "\x66\x72\xC3\xBC\x68"
# => früh
puts str.bytes.size
# => 5
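
To connect this to the two Telnet modes described above, here is a minimal sketch (the variable names are only illustrative) that checks whether a string is 7-bit clean and which of its bytes need the eighth bit. The umlaut is written with an escape so the example does not depend on the source file's own encoding.

```ruby
plain  = "fruh"
umlaut = "fr\u00FCh" # "früh" as a UTF-8 string

# String#ascii_only? is true when every byte is below 0x80 (7-bit clean).
plain_is_7bit  = plain.ascii_only?
umlaut_is_7bit = umlaut.ascii_only?

# The bytes that need the eighth bit are exactly the two that encode "ü".
high_bytes = umlaut.bytes.select { |b| b > 0x7F }

puts plain_is_7bit
# => true
puts umlaut_is_7bit
# => false
puts high_bytes.map { |b| format('%02x', b) }.join(' ')
# => c3 bc
```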

      

In Net::Telnet, we can enable binary mode by passing the option 'Binmode' => true to Net::Telnet::new. But there is one more thing we have to do: tell Ruby to treat the string as binary data, i.e. a sequence of 8-bit bytes.

You already tried to use String#force_encoding, but what you may not have realized is that String#force_encoding doesn't actually convert a string from one encoding to another. Its purpose is not to change the encoding of the data, but to tell Ruby what encoding the data is already in:

str = "früh"   # => "früh"
p str.encoding # => #<Encoding:UTF-8>
p str[2]       # => "ü"

p str.bytes    # => [ 102, 114, 195, 188, 104 ] # This is the decimal represent-
                                                # ation of the hexadecimal bytes
                                                # we saw before, `66 72 c3 bc 68`

str.force_encoding(Encoding::BINARY) # => "fr\xC3\xBCh"
p str[2]       # => "\xC3"

p str.bytes    # => [ 102, 114, 195, 188, 104 ] # Same bytes!

      

Now I'll let you in on a little secret: Encoding::BINARY is just an alias for Encoding::ASCII_8BIT. Since ASCII-8BIT has no multibyte characters, Ruby shows ü as two separate bytes, \xC3\xBC. These bytes are not printable characters in ASCII-8BIT, so Ruby displays the escape codes \x## instead, but the data doesn't change - only how Ruby prints it.
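
To make the difference between relabeling and converting concrete, here is a small sketch contrasting String#force_encoding with String#encode. ISO-8859-1 is used only as an illustrative target encoding; it is not something the original setup uses.

```ruby
utf8 = "fr\u00FCh" # "früh" as UTF-8

# force_encoding only relabels the bytes. dup first so utf8 itself stays UTF-8.
relabeled = utf8.dup.force_encoding(Encoding::BINARY)

# encode actually transcodes: in ISO-8859-1, "ü" is the single byte 0xFC.
transcoded = utf8.encode(Encoding::ISO_8859_1)

p utf8.bytes       # => [102, 114, 195, 188, 104]
p relabeled.bytes  # => [102, 114, 195, 188, 104]
p transcoded.bytes # => [102, 114, 252, 104]

# Relabeling is reversible because no bytes were touched.
round_trip = relabeled.dup.force_encoding(Encoding::UTF_8)
p round_trip == utf8 # => true
```

Only encode produces different bytes; force_encoding leaves them alone in both directions.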



So here's the thing: even though Ruby now calls the string BINARY or ASCII-8BIT instead of UTF-8, it's still the same bytes, which means it's still UTF-8. Changing the encoding it is "tagged" with, however, means that when Net::Telnet does (the equivalent of) data[n], it will always get one byte (instead of potentially getting a multibyte character, as in UTF-8), which is just what we want.

So...

host = Net::Telnet::new(
         # ...all of your other options...
         'Binmode' => true
       )

def send_cmd_container(host, cmd, params=nil)
  cmd = JSON.generate('*C' => 'se','Q' => [ get_cmd(cmd, params) ])
  cmd.force_encoding(Encoding::BINARY)
  host.puts(cmd)
  # ...
end

      

(Note: JSON.generate always returns a UTF-8 string, so you don't have to call, for example, cmd.to_s.)
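
As a possible variation, String#b returns a binary-tagged copy and leaves the original string alone, so anything else that uses the JSON string afterwards (a log helper, say) still sees it tagged as UTF-8. This is only a sketch; the 'name' key and its value are made-up sample data.

```ruby
require 'json'

json = JSON.generate({ 'name' => "M\u00FCller" }) # hypothetical customer record

p json.encoding # => #<Encoding:UTF-8>

# String#b returns a copy tagged as binary; json itself is not modified.
wire = json.b

p wire.bytes == json.bytes            # => true
p json.encoding == Encoding::UTF_8    # => true
```

In the method above, that would mean calling host.puts(cmd.b) instead of mutating cmd with force_encoding.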

Useful diagnostics

A quick way to check what data Net::Telnet is actually sending (and receiving) is to set the 'Dump_log' option (in the same way you set the 'Output_log' option). It will write both sent and received data to a log file in hexdump format, allowing you to see whether the bytes are being sent correctly. For example, I started a test server (nc -l 5555), sent the string früh (host.puts "früh".force_encoding(Encoding::BINARY)), and this is what was logged:

> 0x00000: 66 72 c3 bc  68 0a                                  fr..h.

      

You can see that it sent six bytes: the first two are f and r, the next two make up ü, and the last two are h and a newline. In the right column, bytes that are not printable characters are shown as ., ergo fr..h. (Likewise, I sent the string I❤NY and saw I...NY. in the right column, because ❤ is three bytes in UTF-8: e2 9d a4.)

So, if you set 'Dump_log' and send a ü, you should see c3 bc in the output. If so, congratulations, you are sending UTF-8!
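
If reading hex by eye is tedious, the bytes can also be turned back into a string and checked. The sketch below assumes you copy the hex column from a dump line by hand; bytes_from_hex is a hypothetical helper written for this example, not part of Net::Telnet.

```ruby
# Hypothetical helper: turn a hand-copied hex column into a binary string.
def bytes_from_hex(hex_column)
  # Strip spaces and tabs, then let pack('H*') decode the hex digit pairs.
  [hex_column.delete(" \t")].pack('H*')
end

raw  = bytes_from_hex('66 72 c3 bc  68 0a')

# Relabel a copy as UTF-8 and ask Ruby whether the bytes are well-formed.
text = raw.dup.force_encoding(Encoding::UTF_8)

p text.valid_encoding?     # => true
p text == "fr\u00FCh\n"    # => true
```

If valid_encoding? returns false, the bytes on the wire are not well-formed UTF-8.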

P.S. Read Yehuda Katz's "Ruby 1.9 Encodings: A Primer and Solution for Rails". In fact, read it annually. It's really, really helpful.
