I have a regular expression to get the initials of a name like below:
/\b\p{L}\./gu
It works fine with English and other languages until graphemes made of combined characters occur.
For example, क in Hindi and ಕ in Kannada are matched.
But के in Hindi and ಕೆ in Kannada are not matched with this regex.
I am trying to get the initials from a name like J.P.Morgan, etc.
Any help would be greatly appreciated.
You need to match diacritic marks after base letters using \p{M}*:
\b(?<!\p{M})\p{L}\p{M}*\.
The pattern matches
\b - a word boundary
(?<!\p{M}) - the char before the current position must not be a diacritic mark (without it, a match can occur within a single word)
\p{L} - any base Unicode letter
\p{M}* - zero or more diacritic marks
\. - a dot.
See the PHP demo online:
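As a minimal sketch of the pattern described above, assuming PHP with UTF-8 source and the `u` modifier: the helper name `extractInitials` and the Hindi sample name are made up for illustration and are not from the original demo.

```php
<?php
// Hypothetical helper for illustration only.
function extractInitials(string $name): array
{
    // \b           word boundary
    // (?<!\p{M})   the previous char must not be a combining mark
    // \p{L}\p{M}*  one base letter plus zero or more combining marks
    // \.           a literal dot
    // The u modifier makes PCRE treat the pattern and subject as UTF-8.
    preg_match_all('/\b(?<!\p{M})\p{L}\p{M}*\./u', $name, $matches);
    return $matches[0];
}

echo implode(' ', extractInitials('J.P.Morgan')), PHP_EOL;   // J. P.
echo implode(' ', extractInitials('के.पी.शर्मा')), PHP_EOL;    // के. पी.
```

Here के is two code points (the letter क followed by the vowel sign े), which is why \p{L}\. alone stops short of the dot while \p{L}\p{M}*\. consumes the whole grapheme.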