I'm trying to use a regex to capture tweets containing a particular emoji at least twice, so I'm using an unsophisticated
^.+ .+ .+$
. However, this doesn't match strings that instead contain, for example, a skin-tone variation of the emoji.
Is there a smart way to capture an emoji with any skin-tone variation or none, without just listing each one in a row (like []
)?
Thanks to the comments above, I've found that the emojis I've encountered on Twitter are Unicode characters, and skin-tone variations are emoji modifier characters in the range U+1F3FB–U+1F3FF (http://unicode.org/reports/tr51/#Emoji_Modifiers_Table).
So what I wanted was the base emoji followed by
[\x{1f3fb}-\x{1f3ff}]?
, with [\x{1f3fb}-\x{1f3ff}]?
being something I can then drop next to any unmodified emoji to include its skin-tone variations.
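
For anyone doing the same thing in Python, here is a minimal sketch of the idea. The base emoji (U+1F44D, thumbs up) and the names `BASE`, `SKIN_TONE`, `EMOJI` and `has_at_least_twice` are my own assumptions for illustration, not part of the original pattern. Python's `re` module doesn't accept the `\x{...}` syntax, so the range is written with `\U` string escapes instead.

```python
import re

# Hypothetical base emoji for illustration (U+1F44D THUMBS UP SIGN);
# swap in whichever unmodified emoji you are actually searching for.
BASE = "\U0001F44D"

# Optional emoji modifier (skin tone), U+1F3FB through U+1F3FF.
SKIN_TONE = "[\U0001F3FB-\U0001F3FF]?"

# Base emoji followed by zero or one skin-tone modifier.
EMOJI = re.compile(re.escape(BASE) + SKIN_TONE)


def has_at_least_twice(text: str) -> bool:
    """Return True if the base emoji, with or without a skin tone, occurs at least twice."""
    return len(EMOJI.findall(text)) >= 2
```

Counting matches with `findall` avoids the `^.+ ... .+$` wrapper, so it also catches tweets where the emoji sits at the very start or end of the text.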