Transform ABNF rules to REGEX

771 Views Asked by At

I need to transform undermentioned ABNF rules (mlaer) to REGEX

   mlaer       =  1*( lebal "." ) lebal
   lebal       =  gid-tel *(rts-hdl)

   rts-hdl    =  *( alpha / digit / "-" ) gid-tel
   gid-tel    =  alpha / digit
   alpha       =  %x41-5A  ; 'A'-'Z'
   alpha       =/ %x61-7A  ; 'a'-'z'
   digit       =  %x30-39  ; '0'-'9'

Is any tool or sth to do it automatically?

3

There are 3 best solutions below

0
On

Not sure if there is any tool to do this automatically, but it is not too hard.

gid-tel

[A-Za-z0-9]

rts-hdl

[A-Za-z0-9-]*[A-Za-z0-9]

lebal

[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])*

Note that lebal written in this form is going to cause NFA engine to run very long on certain type of input. It should be re-written as:

[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?

mlaer

([A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?

You can construct a complicated regex by using string concatenation. This will allow you to write clean code. Though the case with lebal needs modification on the grammar so that it works well on an NFA engine.

0
On

One should be aware that in the general sense it is not possible to translate ABNF to REGEX.

This is because regexes create a regular language, while the ABNF specifications create a context-free language.

A regular language can be parsed by a finite state machine (which is also used for regex-matching), while a context free language is parsed by a pushdown automata which is superset of the finite state machine (A pushdown automata can be implemented with the bison/yacc tool).

Side note: a regex string itself is not validatable by a regex expression. This is because there are brackets/parentheses allowed, while bracket/parenthese matching is not enforable with a regex, but with a context free grammar.

Translation from ABNF to regex is hence only possible for a subset of cases. I guess this is the case if the ABNF is not recursive or does not contain any cyclic definitions. This is the (implicit) limitation of the above mentioned automatic translation tools.

0
On

For smaller ABNFs this online tool written in PHP worked for me. In your case it returns:

gid-tel: ^([A-Z][a-z0-9])$
rts-hdl: ^(([A-Z][-a-z0-9])*([A-Z][a-z0-9]))$
lebal: ^([A-Z][a-z0-9])((([A-Z][-a-z0-9])*([A-Z][a-z0-9])))*$
mlaer: ^(([A-Z][a-z0-9])((([A-Z][-a-z0-9])*([A-Z][a-z0-9])))*\.)+([A-Z][a-z0-9])((([A-Z][-a-z0-9])*([A-Z][a-z0-9])))*$

But for larger ABNF like the one for an E-Mail address, it only outputs blank. So I'm currently looking for other tools and found a weird, small script in Perl and one written in Ruby almost 17 years ago and last committed to 7 years ago. The latter looks promising as it actually gives a RegEx for the URI ABNF but I still need to get it to work.