How to work with special characters in Ruby

I would like to convert a string like "HELLO" followed by special characters into plain "HELLO", removing all the special characters that cause problems when inserting into the database. They don't seem to be part of UTF-8.

I'm trying to figure out Iconv, but I'm kind of stuck here:

str = "A string with "  # should become "A string with " (special characters removed)
some_format = "I have no clue what format this is"
Iconv.conv(some_format, 'UTF-8//IGNORE', str)

      

Running this:

Iconv.conv('UTF-16', 'UTF-8//IGNORE', str)

      

... returns ...

\376\377\000H\000E\000L\000L\000O?G?`?`?`?`?`?`?`?`?`?`?`?`?`?`?`?`?`?`?`?`?`?`?`?`?`?`?????\342

      

I don't want to convert to anything other than UTF-8, because I have to deal with Arabic, Chinese, Japanese and Korean characters ...

Any help or pointers would be appreciated. I am using Ruby 1.8.7, but I need to upgrade to 1.9.3 very soon. A solution that works on both would be best, but one that works only on 1.9.3 is fine too.

1 answer


Here is a way (Ruby 1.9+) to remove characters that do not exist in a particular encoding when converting a string to that encoding:

# -*- coding: utf-8 -*-
a = "⚒og"
p a  # => "⚒og"
# Characters with no ISO-8859-1 equivalent are replaced with an empty string
p a.encode('iso-8859-1', :undef => :replace, :replace => '')  # => "og"
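
Transcoding to ISO-8859-1 would also drop Arabic or CJK text, which the question needs to keep. If the troublesome characters are actually invalid byte sequences rather than real characters, a rough sketch like the following might keep the string in UTF-8 and drop only the broken bytes. The helper name strip_invalid_utf8 is hypothetical, the input is assumed to be meant as UTF-8, and String#scrub only exists from Ruby 2.1 on, so the UTF-16LE round trip is an assumed fallback for 1.9.3:

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper: returns a copy of str, tagged as UTF-8, with
# invalid byte sequences removed. Assumes the data is meant to be UTF-8.
def strip_invalid_utf8(str)
  utf8 = str.dup.force_encoding('UTF-8')
  if utf8.respond_to?(:scrub)
    # Ruby 2.1+: replace each invalid sequence with an empty string
    utf8.scrub('')
  else
    # Assumed fallback for Ruby 1.9.3/2.0: a UTF-8 to UTF-8 encode is a
    # no-op there, so round-trip through UTF-16LE to force validation
    utf8.encode('UTF-16LE', :invalid => :replace, :undef => :replace, :replace => '').encode('UTF-8')
  end
end

dirty = "HELLO\xE2 world"   # \xE2 alone is an incomplete UTF-8 sequence
clean = strip_invalid_utf8(dirty)
p clean                     # => "HELLO world"
p clean.valid_encoding?     # => true
```

Valid non-Latin text passes through unchanged, because only byte sequences that are not well-formed UTF-8 are removed.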

      



However, your problem may be a different one, because it is very unlikely that these characters are not part of UTF-8, which can encode every Unicode character. Possible causes:

  • Perhaps the font you are using simply doesn't know how to display those characters. Very few fonts cover the full Unicode character set. I don't know how you are trying to display these strings, but make sure to use a font with good character coverage, for example DejaVu: http://dejavu-fonts.org/wiki/Main_Page

  • Are you sure your database is configured correctly to use UTF-8?

  • Also be careful, because your string might be fine but not show up correctly in your terminal or database application due to incomplete UTF-8 support (this has happened to me before). So it can be tricky to debug when the debugging output itself is misleading.
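
Since the terminal or font may misrepresent the string, one option is to inspect the raw data rather than the rendered text. The following is only a sketch for Ruby 1.9+; describe_string is a hypothetical helper name, not part of any library:

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper: summarizes a string without relying on how the
# terminal or font renders it.
def describe_string(str)
  {
    :encoding => str.encoding.name,      # encoding Ruby has tagged the string with
    :valid    => str.valid_encoding?,    # false if the bytes are malformed for that encoding
    :bytes    => str.bytes.map { |b| format('%02X', b) }.join(' ')
  }
end

info = describe_string("\u2692og")   # the "⚒og" string from the example above
puts info[:encoding]                 # => UTF-8
puts info[:valid]                    # => true
puts info[:bytes]                    # => E2 9A 92 6F 67
```

If :valid comes back true and the bytes look like well-formed UTF-8, the string itself is probably fine and the font, terminal or database configuration is the more likely culprit.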
