How to determine wide chars in Ruby? (Chinese, Japanese, Korean)


I'm trying to determine the physical pixel width of a string.

For example:

FONT_SIZE = 10
str="123456789"
width = str.length * FONT_SIZE  # which will be 9 * 10 = 90px

PROBLEM: But for Chinese, Japanese, or Korean:

FONT_SIZE = 10
str="一二三四五六七八九"
width = str.length * FONT_SIZE  # this still results in 90 (9 * 10)

But it really should be 180, since each of these characters is as wide as two ordinary characters.

How do I write this function (it should return true/false)?

def is_wide_char char
  #how to?
end

class String
  def wlength
    l = 0
    # String#each no longer exists in Ruby 1.9+; iterate character by character
    each_char { |c| l += is_wide_char(c) ? 2 : 1 }
    l
  end
end

There are 2 best solutions below


The question "How can I detect CJK characters in a string in Ruby?" gives the answer:

class String
  def contains_cjk?
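    # Each script is a separate alternative; without the "|" before \p{Hangul},
    # Korean text would only match when it directly follows a Hiragana character.
    !!(self =~ /\p{Han}|\p{Katakana}|\p{Hiragana}|\p{Hangul}/)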
  end
end

strings= ['日本', '광고 프로그램', '艾弗森将退出篮坛', 'Watashi ha bakana gaijin desu.']
strings.each{|s| puts s.contains_cjk?}

#true
#true
#true
#false
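
To tie this back to the question, the same script properties could be tested one character at a time to approximate a display width. This is only a sketch: `wide_char?` and `display_width` are hypothetical names that do not appear in the answer above, and the script properties miss fullwidth punctuation and other wide characters, so the result is an approximation.

```ruby
# Hypothetical helpers for illustration only.
CJK_SCRIPTS = /\p{Han}|\p{Katakana}|\p{Hiragana}|\p{Hangul}/

# True when the single character belongs to one of the listed scripts.
def wide_char?(char)
  char.match?(CJK_SCRIPTS)
end

# Counts 2 columns for each CJK character and 1 for everything else.
def display_width(str)
  str.each_char.sum { |c| wide_char?(c) ? 2 : 1 }
end

FONT_SIZE = 10
puts display_width("123456789") * FONT_SIZE          # 90
puts display_width("一二三四五六七八九") * FONT_SIZE  # 180
```

This needs Ruby 2.4 or later for `String#match?` and `Enumerable#sum`.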

Experts at unicode.org have already compiled a table that classifies the width of every character. Refer to UAX #11 and its data file.

A look at the data file shows that it is easy to parse. If you prefer to use a gem, however, there is east_asian_width_simple. There are other gems too, but east_asian_width_simple is faster and more flexible.
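
For readers who want to parse the data file themselves, here is a minimal sketch. It assumes the usual line layout of `EastAsianWidth.txt`: a code point or a `first..last` range, a semicolon, the property value, and an optional `#` comment. The sample lines are a tiny hand-picked subset written in that layout, not the real file, and `parse_east_asian_width` and `wide?` are hypothetical names.

```ruby
# A few sample lines in the data file's layout (illustrative subset only).
SAMPLE = <<~DATA
  # comment lines and blank lines are skipped
  0020;Na # SPACE
  0041..005A;Na # LATIN CAPITAL LETTER A..Z
  3000;F # IDEOGRAPHIC SPACE
  3041..3096;W # HIRAGANA
  4E00..9FFF;W # CJK UNIFIED IDEOGRAPHS
  AC00..D7A3;W # HANGUL SYLLABLES
  FF01..FF60;F # FULLWIDTH FORMS
DATA

# Returns an array of [code_point_range, property_symbol] pairs.
def parse_east_asian_width(text)
  text.each_line.map do |line|
    data = line.sub(/#.*/, '').strip      # drop the trailing comment
    next if data.empty?
    codes, prop = data.split(';').map(&:strip)
    first, last = codes.split('..').map { |hex| hex.to_i(16) }
    [(first..(last || first)), prop.to_sym]
  end.compact
end

TABLE = parse_east_asian_width(SAMPLE)

# Treats F (fullwidth) and W (wide) as wide. Code points missing from the
# table are treated as not wide, which is a simplification in this sketch.
def wide?(char, table = TABLE)
  entry = table.find { |range, _| range.cover?(char.ord) }
  entry ? %i[F W].include?(entry[1]) : false
end

puts wide?("一")  # true
puts wide?("A")   # false
```

A linear scan with `find` is fine for a handful of ranges; with the full file, a binary search over sorted ranges would likely be a better fit.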

Usage

require 'east_asian_width_simple'
eaw = EastAsianWidthSimple.new(File.open('EastAsianWidth.txt'))
eaw.string_width('台灣 No.1') # => 9
eaw.string_width('No code, no ') # => 14

Wide characters and full-width characters are defined differently in UAX #11, but based on your description, I think the following code is the closest implementation of what you want to achieve:

require 'east_asian_width_simple'
$eaw = EastAsianWidthSimple.new(File.open('EastAsianWidth.txt'))

def is_wide_char(char)
  case $eaw.lookup(char.ord)
  when :F, :W then true
  else false
  end
end