How to check if a string contains ASCII code

Given a string "A\xC3B", it can be converted to a UTF-8 string by doing this (ref):

"A\xC3B".force_encoding('iso-8859-1').encode('utf-8') #=> "AÃB"

      

However, I only want to perform this action if the string contains an ASCII code such as \xC3. How can I check this?

I tried "A\xC3B".include?("\x"), but it doesn't work.



2 answers


\x is just a hexadecimal escape sequence. It has nothing to do with encodings. US-ASCII goes from "\x00" to "\x7F" (for example, "\x41" is the same as "A", and "\x30" is "0"). The rest (from "\x80" to "\xFF"), however, are not US-ASCII characters, because US-ASCII is a 7-bit character set.
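To make the byte ranges concrete, here is a minimal sketch that looks at the raw bytes of a string. The helper name `high_bytes` is hypothetical and only used for illustration; it relies on nothing beyond `String#bytes`.

```ruby
# "\x41" is only another way of writing the byte 0x41, i.e. "A".
same_string = ("\x41" == "A")
p same_string # => true

# Hypothetical helper: returns every byte above the 7-bit US-ASCII range.
def high_bytes(str)
  str.bytes.select { |byte| byte > 0x7F }
end

p high_bytes("A\xC3B") # => [195]
p high_bytes("ABC")    # => []
```

This also suggests why searching for "\x" cannot work: the escape is resolved when the literal is parsed, so the string never contains a backslash or an "x", only the byte 0xC3.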

If you want to check if a string contains only US-ASCII characters, call String#ascii_only?:

p "A\xC3B".ascii_only? # => false
p "\x41BC".ascii_only? # => true

      



Another example based on your code:

str = "A\xC3B"
unless str.ascii_only?
  str.force_encoding(Encoding::ISO_8859_1).encode!(Encoding::UTF_8)
end
p str.encoding # => #<Encoding:UTF-8>

      



I think what you want is to find out whether your string is encoded correctly. The ascii_only? solution doesn't help much when dealing with non-ASCII strings.

I would use String#valid_encoding? to check whether a string is encoded correctly, even if it contains non-ASCII characters.



For example, what if someone else encoded "Françoise Paré" correctly, so that when I decode it I get the right string instead of "Fran\xE7oise Par\xE9" (which is what I would get if someone had encoded it as ISO-8859-1)?

[62] pry(main)> "Françoise Paré".encode("utf-8").valid_encoding?
=> true

[63] pry(main)> "Françoise Paré".encode("iso-8859-1")
=> "Fran\xE7oise Par\xE9"

# Note the encoding is still valid; it's just the way IRB displays
# ISO-8859-1

[64] pry(main)> "Françoise Paré".encode("iso-8859-1").valid_encoding?
=> true

# Now let's interpret our 8859 string as UTF-8. In the following
# line, the string bytes don't change; `force_encoding` just makes
# Ruby interpret those same bytes as UTF-8.

[65] pry(main)> "Françoise Paré".encode("iso-8859-1").force_encoding("utf-8")
=> "Fran\xE7oise Par\xE9"

# Is a lone \xE7 valid UTF-8? Nope.

[66] pry(main)> "Françoise Paré".encode("iso-8859-1").force_encoding("utf-8").valid_encoding?
=> false
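Putting the session above to work, one possible approach is to reinterpret the bytes only when they are not already valid UTF-8. This is a sketch under assumptions: the method name `normalize_to_utf8` is hypothetical, and treating every invalid string as ISO-8859-1 is a guess about where the data came from, not something Ruby can detect.

```ruby
# Hypothetical helper: returns a UTF-8 copy of str.
# Assumption: bytes that are not valid UTF-8 were originally ISO-8859-1.
def normalize_to_utf8(str, fallback = Encoding::ISO_8859_1)
  copy = str.dup.force_encoding(Encoding::UTF_8)
  return copy if copy.valid_encoding? # already fine, leave the bytes alone

  # Relabel the same bytes as the fallback encoding, then transcode.
  copy.force_encoding(fallback).encode(Encoding::UTF_8)
end

p normalize_to_utf8("A\xC3B")                 # => "AÃB"
p normalize_to_utf8("Françoise Paré")         # => "Françoise Paré"
p normalize_to_utf8("Fran\xE7oise Par\xE9")   # => "Françoise Paré"
```

Unlike a check based on ascii_only?, this leaves a string that is already valid UTF-8 untouched, so text such as "Françoise Paré" is not converted a second time.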

      
