Net::Telnet - puts or print a string in UTF-8
I am using an API where I have to send client data as a JSON object via a telnet connection (very strange, I know ^^). I am German, so customer information very often contains umlauts or ß.
My procedure:
- I am generating a hash with all the information about the team.
- I am converting the hash to a JSON object.
- I am converting the JSON object to a string (.to_s).
- I am sending the string using the Net::Telnet puts command.
My puts command looks like this (cmd is a JSON object):
host.puts(cmd.to_s.force_encoding('UTF-8'))
In the log files I see that the JSON object does not contain the umlauts, but, for example, this: Ã¼ instead of ü.
I have verified that the string (with or without the force_encoding() call) is in UTF-8. So I think the puts command is not sending strings in UTF-8.
Can I send a command in UTF-8? How can I do this?
Whole methods:
host = Net::Telnet::new(
  'Host' => host_string,
  'Port' => port_integer,
  'Output_log' => 'log/'+Time.now.strftime('%Y-%m-%d')+'.log',
  'Timeout' => false,
  'Telnetmode' => false,
  'Prompt' => /\z/n
)
def send_cmd_container(host, cmd, params=nil)
  cmd = JSON.generate({'*C'=>'se','Q'=>[get_cmd(cmd, params)]})
  host.puts(cmd.to_s.force_encoding('UTF-8'))
  add_request_to_logfile(cmd)
end
def get_cmd(cmd, params=nil)
  if params == nil
    return {'*C'=>'sq','CMD'=>cmd}
  else
    return {'*C'=>'sq','CMD'=>cmd,'PARAMS'=>params}
  end
end
Addition:
I also log my requests using this method:
def add_request_to_logfile(request_string)
  directory = 'log/'
  File.open(File.join(directory, Time.now.strftime('%Y-%m-%d')+'.log'), 'a+') do |f|
    f.puts ''
    f.puts '> '+request_string
  end
end
In the log file, my requests do not contain UTF-8 umlauts either, but, for example, this: Ã¼
TL;DR
Set 'Binmode' => true and use Encoding::BINARY.
The above should work for you. If you're wondering why, read on.
Telnet has no concept of "encoding". Telnet has two modes: in normal mode, you send 7-bit ASCII characters, and in binary mode, you send 8-bit bytes. You can't tell Telnet "this is UTF-8" because Telnet doesn't know what that means. You can say "it's 7-bit ASCII" or "it's a sequence of 8-bit bytes", and that's it.
This might sound like bad news, but it's actually great news, because it just so happens that UTF-8 encodes text as sequences of 8-bit bytes. früh, for example, is the five bytes 66 72 c3 bc 68. This can be easily verified in Ruby:
puts str = "\x66\x72\xC3\xBC\x68"
# => früh
puts str.bytes.size
# => 5
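To connect this to the two Telnet modes described above, here is a minimal sketch of a check for whether a string would fit in 7-bit mode. The helper name seven_bit_clean? is made up for illustration and is not part of Net::Telnet; the "\u00FC" escape is used only so the example does not depend on the source file's encoding.

```ruby
# Hypothetical helper (not part of Net::Telnet):
# true if every byte of the string fits in 7 bits.
def seven_bit_clean?(str)
  str.bytes.all? { |byte| byte < 0x80 }
end

p seven_bit_clean?("fruh")        # => true
p seven_bit_clean?("fr\u00FCh")   # => false ("früh" contains the bytes c3 bc)
```

Any string for which this returns false needs binary mode to reach the other side byte for byte.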
In Net::Telnet, we can enable binary mode by passing the option 'Binmode' => true to Net::Telnet::new. But there's one more thing we have to do: tell Ruby to treat the string as binary data, i.e. a sequence of 8-bit bytes.
You already tried to use String#force_encoding, but what you may not have realized is that String#force_encoding doesn't actually convert a string from one encoding to another. Its purpose is not to change the encoding of the data, but to tell Ruby which encoding the data is already in:
str = "früh" # => "früh"
p str.encoding # => #<Encoding:UTF-8>
p str[2] # => "ü"
p str.bytes # => [ 102, 114, 195, 188, 104 ] # This is the decimal represent-
# ation of the hexadecimal bytes
# we saw before, `66 72 c3 bc 68`
str.force_encoding(Encoding::BINARY) # => "fr\xC3\xBCh"
p str[2] # => "\xC3"
p str.bytes # => [ 102, 114, 195, 188, 104 ] # Same bytes!
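For contrast, the following sketch puts relabelling next to real transcoding with String#encode. The two helper names are hypothetical and exist only for this illustration; ISO-8859-1 is just an arbitrary example of a different target encoding.

```ruby
# Hypothetical helpers for illustration only.

# Relabels a copy of the string as binary; the bytes are untouched.
def relabel_as_binary(str)
  str.dup.force_encoding(Encoding::BINARY)
end

# Transcodes the string to ISO-8859-1; the bytes are rewritten.
def transcode_to_latin1(str)
  str.encode(Encoding::ISO_8859_1)
end

utf8 = "fr\u00FCh"   # "früh" as a UTF-8 string
p relabel_as_binary(utf8).bytes     # => [102, 114, 195, 188, 104]
p transcode_to_latin1(utf8).bytes   # => [102, 114, 252, 104]
```

Only the second call changes what would go over the wire, which is why force_encoding alone can never "convert" anything.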
Now I'll let you in on a little secret: Encoding::BINARY is just an alias for Encoding::ASCII_8BIT. Since ASCII-8BIT has no multibyte characters, Ruby shows ü as two separate bytes, \xC3\xBC. These bytes are not printable characters in ASCII-8BIT, so Ruby displays the \x## escape codes instead, but the data hasn't changed, only how Ruby prints it.
So here's the thing: even though Ruby now calls the string BINARY or ASCII-8BIT instead of UTF-8, it's still the same bytes, which means it's still UTF-8. However, because of the encoding the string is now "tagged" with, when Net::Telnet does (the equivalent of) data[n], it will always get one byte (instead of potentially getting a multibyte character, as with UTF-8), which is just what we want.
So...
host = Net::Telnet::new(
  # ...all of your other options...
  'Binmode' => true
)
def send_cmd_container(host, cmd, params=nil)
  cmd = JSON.generate('*C' => 'se','Q' => [ get_cmd(cmd, params) ])
  cmd.force_encoding(Encoding::BINARY)
  host.puts(cmd)
  # ...
end
(Note: JSON.generate always returns a UTF-8 string, so you don't have to call, for example, cmd.to_s on it.)
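If you would rather not mutate the JSON string (for instance because it is logged afterwards), one possible variation is to build the request in a small helper and return a binary copy via String#b. This is only a sketch: build_binary_request and the command name 'add_customer' are made up for illustration, and it does no network I/O, so it can be tried without a server.

```ruby
require 'json'

# Hypothetical helper: builds the request and returns it as a binary string,
# ready to hand to host.puts on a connection opened with 'Binmode' => true.
def build_binary_request(cmd, params = nil)
  query = { '*C' => 'sq', 'CMD' => cmd }
  query['PARAMS'] = params unless params.nil?
  json = JSON.generate({ '*C' => 'se', 'Q' => [query] })
  json.b   # String#b returns a binary-tagged copy; json itself stays UTF-8
end

# 'add_customer' is an invented command name, used only as sample input.
request = build_binary_request('add_customer', 'name' => "M\u00FCller")
p request.encoding == Encoding::BINARY                 # => true
p request.bytes.each_cons(2).include?([0xC3, 0xBC])    # => true (the ü is still c3 bc)
```

The returned string can then be passed to host.puts as in the method above.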
Useful diagnostics
A quick way to check what data Net::Telnet is actually sending (and receiving) is to set the 'Dump_log' option (in the same way you set the 'Output_log' option). It will write both sent and received data to a log file as a hex dump, allowing you to see whether the bytes were sent correctly. For example, I started a test server (nc -l 5555) and sent the string früh (host.puts "früh".force_encoding(Encoding::BINARY)), and this is what was logged:
> 0x00000: 66 72 c3 bc 68 0a fr..h.
You can see that it sent six bytes: the first two are f and r, the next two make up ü, and the last two are h and a newline. On the right, bytes that are not printable characters are shown as ., ergo fr..h. (Likewise, I sent the string I❤NY and saw I...NY. in the right column, because ❤ is three bytes in UTF-8: e2 9d a4.)
So, if you set 'Dump_log' and send a ü, you should see c3 bc in the output. If so, congratulations, you are sending UTF-8!
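To know in advance which bytes to look for in the dump, you can print a string's bytes in the same two-digit hex form. The helper below is a hypothetical convenience, not something Net::Telnet provides; the trailing "\n" stands in for the newline that puts appends.

```ruby
# Hypothetical helper: formats a string's bytes as space-separated hex pairs,
# matching the hex column of the dump log.
def hex_bytes(str)
  str.bytes.map { |byte| format('%02x', byte) }.join(' ')
end

puts hex_bytes("fr\u00FCh\n")   # => 66 72 c3 bc 68 0a
puts hex_bytes("I\u2764NY\n")   # => 49 e2 9d a4 4e 59 0a
```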
P.S. Read Yehuda Katz's "Ruby 1.9 Encodings: A Primer and Solution for Rails". In fact, read it annually. It's really, really helpful.