Validating an emoji sequence with ICU4C


Is there any way to use ICU4C to check whether a string is definitely a valid emoji sequence? Specifically, valid according to this specification:

http://www.unicode.org/reports/tr51/tr51-15.html

I can use the regexes or EBNF patterns given there to scan for possible emoji, but validating the matches against the rules of the specification remains a problem.

From these EBNF rules a regex can be generated, as below. While this regex may seem complex, it is far simpler than what would result from the definitions. Using the definitions directly would produce regexes that are many times more complicated and that would still require verification with validity tests.

  \p{RI} \p{RI}
| \p{Emoji} 
  ( \p{EMod} 
    | \x{FE0F} \x{20E3}? 
    | [\x{E0020}-\x{E007E}]+ \x{E007F} )?
  (\x{200D} \p{Emoji} 
    ( \p{EMod}
      | \x{FE0F} \x{20E3}? )?)+
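
For reference, the shape that this regex accepts can be sketched as a small hand-written matcher over code points. This is only a sketch under assumptions, not an ICU API: `EmojiProps`, `skipSuffix` and `isPossibleEmoji` are hypothetical names, and the three predicates stand in for `\p{Emoji}`, `\p{EMod}` and `\p{RI}`. With ICU4C they could plausibly wrap `u_hasBinaryProperty(c, UCHAR_EMOJI)`, `UCHAR_EMOJI_MODIFIER` and `UCHAR_REGIONAL_INDICATOR` from `unicode/uchar.h`. The `requireJoiner` flag mirrors the trailing `+` exactly as quoted above, which means a lone emoji does not match unless the flag is set to false.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical property predicates (not part of ICU). With ICU4C each one
// could wrap u_hasBinaryProperty(c, <property>) from unicode/uchar.h.
struct EmojiProps {
    std::function<bool(char32_t)> isEmoji;  // \p{Emoji}
    std::function<bool(char32_t)> isEMod;   // \p{EMod}
    std::function<bool(char32_t)> isRI;     // \p{RI}
};

namespace detail {
// Consumes one optional suffix starting at s[i]:
//   \p{EMod} | \x{FE0F} \x{20E3}? | [\x{E0020}-\x{E007E}]+ \x{E007F}
// The tag alternative is tried only when allowTags is true, because the
// quoted regex offers it for the first element but not after a ZWJ.
inline void skipSuffix(const std::u32string& s, std::size_t& i,
                       const EmojiProps& p, bool allowTags) {
    if (i >= s.size()) return;
    if (p.isEMod(s[i])) { ++i; return; }
    if (s[i] == 0xFE0F) {
        ++i;
        if (i < s.size() && s[i] == 0x20E3) ++i;  // optional keycap mark
        return;
    }
    if (allowTags) {
        std::size_t j = i;
        while (j < s.size() && s[j] >= 0xE0020 && s[j] <= 0xE007E) ++j;
        // Accept the tag run only if it is non-empty and properly terminated.
        if (j > i && j < s.size() && s[j] == 0xE007F) i = j + 1;
    }
}
}  // namespace detail

// Returns true when the whole string has the shape of the quoted regex.
// requireJoiner = true follows the trailing `+` as quoted; false behaves
// like `*` and also accepts a single element with no ZWJ.
inline bool isPossibleEmoji(const std::u32string& s, const EmojiProps& p,
                            bool requireJoiner = true) {
    assert(p.isEmoji && p.isEMod && p.isRI);  // all predicates must be set

    // First alternative: \p{RI} \p{RI}
    if (s.size() == 2 && p.isRI(s[0]) && p.isRI(s[1])) return true;

    // Second alternative: \p{Emoji}, optional suffix, then ZWJ-joined elements.
    if (s.empty() || !p.isEmoji(s[0])) return false;
    std::size_t i = 1;
    detail::skipSuffix(s, i, p, /*allowTags=*/true);

    std::size_t joined = 0;
    while (i < s.size() && s[i] == 0x200D) {
        ++i;
        if (i >= s.size() || !p.isEmoji(s[i])) return false;
        ++i;
        detail::skipSuffix(s, i, p, /*allowTags=*/false);
        ++joined;
    }
    return i == s.size() && (joined > 0 || !requireJoiner);
}
```

A match here only says that the string is a possible emoji in the sense of the regex, not that it is a valid or recommended sequence. For the validity part, recent ICU releases (70 and later, as far as I know) declare `u_stringHasBinaryProperty()` together with string properties such as `UCHAR_RGI_EMOJI` in `unicode/uchar.h`. That looks closer to what the question asks for, but it is worth checking against the headers of the ICU version in use.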