Any way to use a regular expression with two different pairs of delimiters?

I've added emojis to my Android application, and I've been using a regex in Java so that the codes assigned to them match a regular expression (which includes a pair of delimiters), making the codes show up as images.

Some emoji codes are, for example, sad, happy, smile.

So far, it's been like this:

  • Delimiters: ( and )

  • Regular expression: \\(([.[^\\(\\)]]+)\\)

  • Example of emoji codes matched: (sad), (happy), (smile).

I've noticed, though, that for some new emojis I added, it would be more practical for the user to type their codes using another pair of delimiters, such as the letter z and a comma (,). The second case would then look like this:

  • Delimiters: z and ,

  • Regular expression: z([.[^z\\,]]+)\\,

  • Example of emoji codes matched: zsad,, zhappy,, zsmile,.

What I want, then, is to merge these two regular expressions so that the user can type the emoji code using either pair of delimiters, whichever he or she prefers, and it will be matched. For example, the sad emoji would be matched and shown as an image every time it is written as either (sad) or zsad,, as in:

Hi. (sad) I've got bad news. zsad,

Hey... (sad)

Okay. Bye. zsad,

I've tried using the alternation operator and lookarounds with no success. With the following two regular expressions, I only got matches for what is to the left of the | alternation operator (and I want matches for both the left and right sides, of course):

\\(([.[^\\(\\)]]+)\\)|z([.[^z\\,]]+)\\,

z([.[^z\\,]]+)\\,|\\(([.[^\\(\\)]]+)\\)

And in the following regular expressions, I had no matches at all:

(\\(([.[^\\(\\)]]+)\\)|z([.[^z\\,]]+)\\,)

(\\(([.[^\\(\\)]]+)\\))|(z([.[^z\\,]]+)\\,)

(z([.[^z\\,]]+)\\,|\\(([.[^\\(\\)]]+)\\))

(z([.[^z\\,]]+)\\,)|(\\(([.[^\\(\\)]]+)\\))

\\(|z([.[^\\(\\z\\,)]]+)\\)|\\,

(\\(|z)([.[^\\(\\z\\,)]]+)(\\)|\\,)

(\\()|(z)([.[^\\(\\z\\,)]]+)(\\))|(\\,)

(?=\\(([.[^\\(\\)]]+)\\))(?=z([.[^z\\,]]+)\\,)

(?=.*\\(([.[^\\(\\)]]+)\\))(?=.*z([.[^z\\,]]+)\\,)

Sorry for the gigantic text; I only wanted to give as many details as possible. Does anyone know what I am doing or writing wrong, and what regular expression I can use so that it matches both zemojicode, and (emojicode)? Your help will be very much appreciated.

3

There are 3 best solutions below

0
On

Java does not let you use duplicate names for capture groups, nor does it support branch reset groups or conditional expressions. You need to use alternation and then handle the matches in code, depending on how you need to process them.

So, use this regex:

\(([.[^()]]+)\)|z([.[^z,]]+),

Do not forget to double the backslashes in Java code.

Check this demo that only handles the match values:

String s = "Hi. (sad) I've got bad news. zsad,\nHey... (sad)\nOkay. Bye. zsad,";
System.out.println(s.replaceAll("\\(([.[^()]]+)\\)|z([.[^z,]]+),", "<<$0>>")); 

Output:

Hi. <<(sad)>> I've got bad news. <<zsad,>>
Hey... <<(sad)>>
Okay. Bye. <<zsad,>>
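
If you need the bare emoji code rather than the whole match, one possible approach is to check which capture group is non-null after each find(). The sketch below is only an illustration of that idea: the class name EmojiCodeFinder and the method findCodes are hypothetical, and the pattern is the one from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class EmojiCodeFinder {
    // Group 1 is filled by the "(code)" branch, group 2 by the "zcode," branch.
    private static final Pattern EMOJI =
            Pattern.compile("\\(([.[^()]]+)\\)|z([.[^z,]]+),");

    // Returns the emoji codes in order of appearance, with delimiters removed.
    static List<String> findCodes(String text) {
        List<String> codes = new ArrayList<>();
        Matcher m = EMOJI.matcher(text);
        while (m.find()) {
            // Only one alternative matches at a time; the other group is null.
            String code = m.group(1) != null ? m.group(1) : m.group(2);
            codes.add(code);
        }
        return codes;
    }

    public static void main(String[] args) {
        String s = "Hi. (sad) I've got bad news. zsad,\nHey... (sad)\nOkay. Bye. zsad,";
        System.out.println(findCodes(s)); // prints [sad, sad, sad, sad]
    }
}
```

Reading only group(1) would return null for every zcode, match, which may be why an alternation can look as if it only matches one side.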
0
On

You could use something like this:

(z[a-zA-Z]*,|\([a-zA-Z]*\))

Here's the example

It will capture z<anylettershere>, or (<anylettershere>)

To match more than one code in a message, use the global flag, which will probably be needed and is included in the example link. The pattern matches the sentences you provided on 3 separate Java regex testers that I found.

Edit

Just a note: any of the \ characters may need to be doubled. I primarily use PHP rather than Java, so I am not as knowledgeable about that, but the example given would then become:

(z[a-zA-Z]*,|\\([a-zA-Z]*\\))
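
In Java itself there is no global flag; the usual equivalent is to call Matcher.find() in a loop. Here is a minimal sketch of that, assuming the doubled-backslash pattern above. The class name TokenScanner and the method findTokens are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class TokenScanner {
    // Group 1 wraps the whole alternation, so it holds the token with its delimiters.
    private static final Pattern TOKEN =
            Pattern.compile("(z[a-zA-Z]*,|\\([a-zA-Z]*\\))");

    // Collects every token found in the message, delimiters included.
    static List<String> findTokens(String message) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(message);
        // Each call to find() resumes after the previous match, like a global search.
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(findTokens("Hi. (sad) I've got bad news. zsad,"));
        // prints [(sad), zsad,]
    }
}
```

Because the pattern uses *, it also accepts an empty code such as (); replacing * with + is one way to rule that out, if it matters for your codes.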
1
On

I'd probably go with

\((\w+)\)|z(\w+),

which I find simpler and which, like your own attempts, captures just the actual token. The \w also allows digits and underscores in the token, which I don't know if you consider a plus, but it should hardly be a drawback.

So, as a Java string:

 \\((\\w+)\\)|z(\\w+),

Check it out here, at regex101.

As an alternative, I'd like to mention this one:

[(z](\w+)[),]

It's even simpler, but doesn't have the built-in syntax check. In other words, it would allow a mix of the delimiters, e.g. (sad, and zhappy), which may be considered a drawback.
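
To make the difference concrete, here is a small sketch that runs both patterns against well-formed and mixed-delimiter input. The class name DelimiterCheck and its methods are hypothetical and only serve the comparison.

```java
import java.util.regex.Pattern;

// Hypothetical comparison class; the names are illustrative only.
class DelimiterCheck {
    // Alternation: each opening delimiter must be paired with its own closing one.
    static final Pattern STRICT = Pattern.compile("\\((\\w+)\\)|z(\\w+),");
    // Character classes: any opening delimiter may be paired with any closing one.
    static final Pattern LOOSE = Pattern.compile("[(z](\\w+)[),]");

    static boolean strictFinds(String text) {
        return STRICT.matcher(text).find();
    }

    static boolean looseFinds(String text) {
        return LOOSE.matcher(text).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"(sad)", "zsad,", "(sad,", "zhappy)"}) {
            System.out.println(s + " strict=" + strictFinds(s) + " loose=" + looseFinds(s));
        }
    }
}
```

Both patterns find (sad) and zsad,, but only the second one finds the mixed forms (sad, and zhappy).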

Regards