I have a Java 21 app where I want to determine whether a string contains an emoji. I am using the new emoji methods added to Character in Java 21, but whenever the input String contains digits, such as "123", Character::isEmoji returns true. I have been using this as a resource: https://inside.java/2023/11/20/sip089/
This is the code I have been using:
    private boolean containsEmoji(String s) {
        return s.codePoints().anyMatch(Character::isEmoji);
    }
For example:
    System.out.println(
        "123".codePoints().anyMatch(Character::isEmoji)
    );
This prints true.
And also:
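    private boolean containsEmoji(String s) {
        // Advance by the width of each code point so that surrogate pairs are read once.
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            if (Character.isEmoji(codePoint)) {
                return true;
            }
            i += Character.charCount(codePoint);
        }
        return false;
    }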
Those digits are emoji, technically
Yes, digits 0-9 in the Basic Latin (US-ASCII) block of Unicode are considered to be Emoji, for reasons that escape me.
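To see this for yourself, here is a minimal sketch that scans the Basic Latin block and collects every character for which Character.isEmoji returns true. The class name AsciiEmojiCheck and the helper asciiEmoji are made up for illustration; only Character.isEmoji comes from the JDK, and the result assumes the Unicode data shipped with Java 21.

```java
public class AsciiEmojiCheck {

    // Collects every Basic Latin code point (U+0000..U+007F) that has the Emoji property.
    static String asciiEmoji() {
        StringBuilder sb = new StringBuilder();
        for (int cp = 0; cp <= 0x7F; cp++) {
            if (Character.isEmoji(cp)) {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // On Java 21 this should list the digits along with '#' and '*'.
        System.out.println(asciiEmoji());
    }
}
```

So the digits are not alone: the hash sign and the asterisk are flagged as well, while letters are not.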
Follow the trail of documentation:
Character.isEmoji … lists:
Section 1.5.2, Versioning, of the Unicode page explains the comment E0.0 as: … which confounds me.
But it seems to me that Character.isEmoji reporting plain digits as emoji is a feature, not a bug.
Use Character.isEmojiPresentation
To determine whether a character is what we more commonly think of as an emoji, use another method on the Character class: Character.isEmojiPresentation. That method returns false for the code points of the Basic Latin digits.
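Applied to the method from the question, that could look like the sketch below. The class name EmojiPresentationCheck is made up for illustration; the only change to the original logic is swapping Character::isEmoji for Character::isEmojiPresentation.

```java
public class EmojiPresentationCheck {

    // True if any code point in s is rendered as an emoji by default.
    static boolean containsEmoji(String s) {
        return s.codePoints().anyMatch(Character::isEmojiPresentation);
    }

    public static void main(String[] args) {
        System.out.println(containsEmoji("123"));               // false: digits default to text
        System.out.println(containsEmoji("abc \uD83D\uDE00"));  // true: U+1F600 GRINNING FACE
        System.out.println(containsEmoji("\u2764"));            // false: U+2764 defaults to text
    }
}
```

One caveat to keep in mind: characters such as U+2764 HEAVY BLACK HEART have the Emoji property but default to text presentation, so this check returns false for them too. If those should count, you would probably need extra handling, for example looking for a following variation selector U+FE0F.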