I'm creating a relational database with SQL syntax which will have a command line interface using Java. To parse the user commands, I'm using regular expressions (I know it's a terrible idea - It's more to learn than anything else)
I'm using the scanner class with the semicolon as the delimiter, so I don't look for semicolons in my regex. This way it can handle multiline input. The input is trimmed before being matched against the regular expression, so whitespace at the beginning and end of the string are not part of the expression.

I've got this regex here which works the way I want it to, until I add the optional flag for the where clause-

select\s+(.*?)(?:\s+where(.*))

That would match (groups are bolded) -

select a from * where x = 3

But if I change the regular expression to this -

select\s+(.*?)(?:\s+where(.*))?

It only matches the quoted part -

"select "a from * where x = 3

My goal is to have it match everything until the end of the string after the word 'select' (including whitespace characters) unless there is a string 'where' preceded and followed by any whitespace characters. If that is present, then group every character between the word 'select' and 'where' and also group everything after the word 'where'.

For example: If this text was input:

select a from * where b = 3

It should group " a from * " and " b = 3 ".

But if this was the input:

select a where x = 3

The table name "a" should be a group and the where clause "x = 3" should be a group.

It is important to note I'm using java.util.regex - It does not have if/else clauses as in Perl regex, but lookaheads with or statements inside a group could be used for the same effect. I could use another library which has support for if/then/else statements, but I can't figure out I could use that to achieve the result I'm looking for.

1

There are 1 best solutions below

1
On

Parsing SQL with regex is not much different from parsing HTML with regex. In other words, it won't work. It is hopeless task, stop right now.

Instead, use some SQL parser. For example, SQL::Statement::Structure for Perl or ANTLR for Java.

Also, since you are creating your own database, it is worth taking a look at how other SQL implementations do it. I would recommend reading source code for PostgreSQL or MySQL and see how they implement advanced SQL parsing.