How can I check if a string contains Latin characters like é in Ruby?

Given:

str1 = "é"   # Latin accent
str2 = "囧"  # Chinese character
str3 = "ジ"  # Japanese character
str4 = "e"   # English character

      

How can I distinguish str1 (a Latin accented character) from the other strings?

Update:

Now consider:

str1 = "\xE9" # Latin accent é, actually stored as the byte \xE9 when read from a file

How does the answer change?



3 answers


I would first strip out all plain ASCII characters with gsub, and then check with a regex whether any Latin characters are left. This should detect accented Latin characters.



def latin_accented?(str)
  str.gsub(/\p{Ascii}/, "") =~ /\p{Latin}/
end

latin_accented?("é")  #=> 0 (truthy)
latin_accented?("囧") #=> nil (falsy)
latin_accented?("ジ") #=> nil (falsy)
latin_accented?("e")  #=> nil (falsy)
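For the "\xE9" case from the question's update, the byte on its own is not valid UTF-8, so it probably needs to be transcoded before any \p{...} regex is applied. A minimal sketch, assuming the file really is ISO-8859-1 encoded (latin1_to_utf8 is a hypothetical helper name, not something from the question):

```ruby
# Hypothetical helper: relabels the raw bytes as ISO-8859-1, then transcodes to UTF-8.
# Assumes the data really is Latin-1; other encodings need a different label.
def latin1_to_utf8(raw)
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

raw = "\xE9"                 # a single byte, as it might come from a Latin-1 file
puts raw.valid_encoding?     # false when the literal is tagged as UTF-8
fixed = latin1_to_utf8(raw)  # now the two-byte UTF-8 form of é
puts fixed.valid_encoding?
puts fixed =~ /\p{Latin}/ ? "Latin" : "not Latin"
```

After this conversion, latin_accented? above can be called on the result as usual. When reading from a file, passing the encoding to the read call (for example File.read(path, encoding: "ISO-8859-1:UTF-8")) should achieve the same thing in one step.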

      



Try using /\p{Latin}/.match(strX) or /[\p{Latin}&&[^a-zA-Z]]/ (if you only want to detect special Latin characters). Note that the && intersection only works inside a character class.

By the way, "e" (str4) is also a Latin character.
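
As a quick illustration of the difference between the two patterns, here is a small sketch run against the four strings from the question. The constant names LATIN and SPECIAL_LATIN are made up for this example, and match? needs Ruby 2.4 or later:

```ruby
# Any character in the Latin script, including plain a-z and A-Z.
LATIN = /\p{Latin}/
# Latin script minus the unaccented English letters (class intersection with &&).
SPECIAL_LATIN = /[\p{Latin}&&[^a-zA-Z]]/

["é", "囧", "ジ", "e"].each do |s|
  puts "#{s}: latin=#{LATIN.match?(s)} special=#{SPECIAL_LATIN.match?(s)}"
end
# é: latin=true special=true
# 囧: latin=false special=false
# ジ: latin=false special=false
# e: latin=true special=false
```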



Hope it helps.



I would use a two step approach:

  • Rule out strings containing non-Latin characters by attempting to encode the string as Latin-1 (ISO-8859-1).
  • Test for accented characters with a regular expression.

Example:

def is_accented_latin?(test_string)
  test_string.encode("ISO-8859-1")   # just to see if it raises an exception

  test_string.match(/[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ]/)
rescue Encoding::UndefinedConversionError
  false
end

      

I highly recommend that you choose the accented characters you want to screen for yourself rather than just copying what I wrote; I may have missed a few. Also note that this will always return false for strings containing non-Latin characters, even if the string also contains an accented Latin character.
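
If maintaining a hand-written list is a concern, one possible alternative (a different technique from the one above, shown only as a rough sketch) is to decompose the string with Unicode normalization and look for a Latin letter followed by a combining mark. The method name below is hypothetical, and unicode_normalize needs Ruby 2.2 or later (match? needs 2.4):

```ruby
# Hypothetical alternative: NFD splits "é" into "e" + U+0301 (combining acute),
# so a Latin letter followed by a nonspacing mark signals an accented letter.
# Limitation: letters with no decomposition (ß, æ, ø, ð, þ) are NOT detected.
def decomposable_latin_accent?(str)
  str.unicode_normalize(:nfd).match?(/\p{Latin}\p{Mn}/)
end

puts decomposable_latin_accent?("é")    # true
puts decomposable_latin_accent?("e")    # false
puts decomposable_latin_accent?("囧")   # false
puts decomposable_latin_accent?("囧é")  # true, mixed strings are still detected
```

Unlike the encode-based check, this does not reject strings that also contain non-Latin characters, so which behaviour is preferable depends on what you are screening for.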
