Ruby: how to check if a UTF-8 string contains only letters and numbers?

3.5k Views Asked by krn At 31 January 2011 at 22:20

I have a UTF-8 string, which might be in any language.

How do I check that it does not contain any non-alphanumeric characters?

I could not find such a method in the UnicodeUtils Ruby gem.

Examples:

ėččę91 - valid
$120D - invalid


There are 3 best solutions below

the Tin Man On 31 January 2011 at 23:46 BEST ANSWER

You can use the POSIX notation for alpha-numerics:

#!/usr/bin/env ruby -w
# encoding: UTF-8

puts RUBY_VERSION

valid = "ėččę91"
invalid = "$120D"

puts valid[/[[:alnum:]]+/]
puts invalid[/[^[:alnum:]]+/]

Which outputs:

1.9.2
ėččę91
$
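The snippet above prints the part of each string that matched, rather than giving a yes/no answer. A minimal sketch of a yes/no check with the same POSIX class follows. The helper name alnum_only? is made up for illustration, and String#match? assumes Ruby 2.4 or later (on older versions, !!(str =~ pattern) should behave the same).

```ruby
# encoding: UTF-8

# Hypothetical helper (not part of the answer above): true only when
# every character in the string is alphanumeric.
def alnum_only?(str)
  # \A and \z anchor the pattern to the whole string;
  # + means the empty string is rejected as well.
  str.match?(/\A[[:alnum:]]+\z/)
end

puts alnum_only?("ėččę91")  # "valid" example from the question
puts alnum_only?("$120D")   # "invalid" example from the question
```

With the two examples from the question, this should print true and then false.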

Michael Papile On 31 January 2011 at 23:47

In Ruby regular expressions, \p{L} matches any letter (in any script), and \p{N} matches any number.

so if s represents your string:

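 s.match(/\A[\p{L}\p{N}]+\z/)

This returns nil when the string contains anything other than letters and numbers. (\A and \z anchor to the whole string; in Ruby, ^ and $ match at line boundaries, so they would accept a multi-line string with one valid line.)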
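One thing to keep in mind is that \p{N} covers every Unicode number category, not only decimal digits. The sketch below is an illustration rather than part of the answer; it compares \p{N} with the narrower \p{Nd} on two sample characters chosen here as assumptions.

```ruby
# encoding: UTF-8

half  = "\u00BD"  # VULGAR FRACTION ONE HALF, category No (other number)
three = "\u0663"  # ARABIC-INDIC DIGIT THREE, category Nd (decimal digit)

puts half.match?(/\p{N}/)    # true:  \p{N} includes fractions and the like
puts half.match?(/\p{Nd}/)   # false: not a decimal digit
puts three.match?(/\p{Nd}/)  # true:  a decimal digit in another script
```

If only decimal digits should count as "numbers", swapping \p{N} for \p{Nd} in the pattern is one option.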

tchrist On 01 February 2011 at 00:19

The pattern for one alphanumeric code point is

/[\p{Alphabetic}\p{Number}]/

From there it’s easy to extrapolate something like this to test whether the string contains a non-alphanumeric:

/[^\p{Alphabetic}\p{Number}]/

or this to test whether everything is alphanumeric (in Ruby, ^ and $ anchor at line boundaries):

 /^[\p{Alphabetic}\p{Number}]+$/

or this, which anchors to the whole string rather than to each line:

/\A[\p{Alphabetic}\p{Number}]+\z/

Pick the one that best suits your needs.
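The difference between the last two patterns only shows up with multi-line input. The following is a small sketch, not the answerer's code: the constant names are invented, it uses the short forms \p{L} and \p{N} instead of \p{Alphabetic} and \p{Number}, and it assumes Ruby 2.4+ for match?.

```ruby
# encoding: UTF-8

# Hypothetical names, for illustration only.
LINE_ANCHORED   = /^[\p{L}\p{N}]+$/     # ^ and $ match at each line boundary
STRING_ANCHORED = /\A[\p{L}\p{N}]+\z/   # \A and \z match only at the string ends

# First line is alphanumeric, second line is not.
multi = "ėččę91\n$120D"

puts multi.match?(LINE_ANCHORED)    # true:  the first line alone satisfies it
puts multi.match?(STRING_ANCHORED)  # false: the whole string must qualify
```

For validating user input, the string-anchored form is usually the safer choice, since a single valid line is enough to satisfy the line-anchored one.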
