Determine width in terminal of Asian/Japanese characters?

In my terminal these are equally wide:

ヌー平行
parallel
æøåüäöûß

(Screenshots omitted: they showed "ヌー平行" and "parallel" taking up the same width in the terminal.)

I have managed to get Perl's length to return 8 for the last two lines, but it returns 4 for the first line. Is there a way to determine that ヌ is twice as wide as ø?

Accepted answer by ikegami:

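You can use the mbswidth function from Text::CharWidth. It is built on POSIX's wcwidth, so it returns the number of terminal columns a string occupies rather than the number of characters.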

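```perl
use v5.14;
use warnings;

use utf8;                               # this source file is saved as UTF-8
use open ':std', ':encoding(UTF-8)';    # assume the terminal expects UTF-8

use Encode             qw( encode_utf8 );
use Text::CharWidth    qw( mbswidth );
use Unicode::Normalize qw( NFC NFD );

# Each test: [ label, string, expected number of terminal columns ]
my @tests = (
   [ "ASCII",     "parallel",      8 ],
   [ "NFC",       NFC("æøåüäöûß"), 8 ],   # precomposed letters
   [ "NFD",       NFD("æøåüäöûß"), 8 ],   # base letters + combining marks
   [ "EastAsian", "ヌー平行",      8 ],   # four double-width characters
);

for ( @tests ) {
   my ( $name, $s, $expect ) = @$_;
   my $length = length( $s );                 # characters (code points)
   my $got = mbswidth( encode_utf8( $s ) );   # columns; assumes a UTF-8 locale
   printf "%-9s length=%2d expect=%d got=%d\n",
      $name, $length, $expect, $got;
}
```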
Output:

```
ASCII     length= 8 expect=8 got=8
NFC       length= 8 expect=8 got=8
NFD       length=13 expect=8 got=8
EastAsian length= 4 expect=8 got=8
```

Note that mbswidth expects a string encoded in the locale's encoding. The program above assumes that encoding is UTF-8 in two places: the use open line and the call to encode_utf8.
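If you would rather not hard-code UTF-8, a sketch like the following asks the C library for the locale's encoding through the core I18N::Langinfo module and encodes with Encode's find_encoding before calling mbswidth. It assumes Encode recognizes the encoding name that langinfo returns; locale_width is a hypothetical helper, not part of Text::CharWidth.

```perl
use v5.14;
use warnings;
use utf8;
use open ':std', ':locale';             # terminal I/O in the locale's encoding

use Encode          qw( find_encoding );
use I18N::Langinfo  qw( langinfo CODESET );
use Text::CharWidth qw( mbswidth );

# The locale's character encoding as reported by the C library
# (for example "UTF-8" when LANG=ja_JP.UTF-8).
my $codeset = langinfo(CODESET);
my $enc     = find_encoding($codeset)
    or die "Encode does not recognize locale encoding '$codeset'\n";

# Hypothetical helper: encode a character string into the locale's
# encoding (unmappable characters become a substitution character),
# then let mbswidth count the terminal columns.
sub locale_width {
    my ($s) = @_;
    return mbswidth( $enc->encode($s) );
}

printf "%-10s %d\n", $_, locale_width($_) for "parallel", "ヌー平行";
```

Using the :locale layer for output keeps the terminal encoding and the encoding handed to mbswidth in agreement, which is the same assumption the original program makes twice, just without fixing it to UTF-8.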


If you want to know how many columns a string should take according to Unicode, this is covered by Unicode Standard Annex #11 (East Asian Width). Note that the answer can depend on whether you are in an East Asian context. For example, U+03A6 GREEK CAPITAL LETTER PHI ("Φ") has East_Asian_Width "Ambiguous": it takes up two columns in an East Asian context, but only one otherwise.
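A rough, core-Perl-only sketch of the UAX #11 approach uses Perl's built-in East_Asian_Width regex properties instead of the C library's wcwidth. The helper uax11_width and its $east_asian flag are hypothetical, and the rules are a simplification: Wide and Fullwidth count as 2, Ambiguous counts as 2 only in an East Asian context, nonspacing and enclosing marks count as 0, and everything else counts as 1. Control characters, emoji presentation, and other cases a real terminal may treat differently are not handled.

```perl
use v5.14;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

use Unicode::Normalize qw( NFD );

# Hypothetical helper: approximate column width of a (decoded) string
# based on the East_Asian_Width property from UAX #11.
# Simplifications (assumptions, not the full algorithm):
#   - Wide (W) and Fullwidth (F) characters take 2 columns.
#   - Ambiguous (A) characters take 2 columns only when $east_asian is true.
#   - Nonspacing (Mn) and enclosing (Me) marks take 0 columns.
#   - Everything else, including control characters, is counted as 1.
sub uax11_width {
    my ( $s, $east_asian ) = @_;
    my $width = 0;
    for my $ch ( split //, $s ) {
        # Combining marks render on top of the preceding character.
        next if $ch =~ /[\p{Mn}\p{Me}]/;

        if ( $ch =~ /[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]/ ) {
            $width += 2;
        }
        elsif ( $east_asian && $ch =~ /\p{East_Asian_Width=Ambiguous}/ ) {
            $width += 2;
        }
        else {
            $width += 1;
        }
    }
    return $width;
}

printf "%-9s %d\n", "ASCII",     uax11_width( "parallel", 0 );          # 8
printf "%-9s %d\n", "NFD",       uax11_width( NFD("æøåüäöûß"), 0 );     # 8
printf "%-9s %d\n", "EastAsian", uax11_width( "ヌー平行", 0 );          # 8
printf "ヌ=%d ø=%d\n", uax11_width( "ヌ", 0 ), uax11_width( "ø", 0 );  # ヌ=2 ø=1
printf "Φ: %d otherwise, %d in an East Asian context\n",
    uax11_width( "Φ", 0 ), uax11_width( "Φ", 1 );                      # 1 and 2
```

Because this works on characters rather than locale-encoded bytes, it needs no encode step, but it reflects Perl's Unicode tables rather than what your terminal actually does, so the two approaches can disagree for Ambiguous characters.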