QRegularExpression: find and capture all quoted and non-quoted parts in a string


I am fairly new to using regexes.

I have a string that can contain both quoted and unquoted substrings.

Here are examples of how they could look:

"path/to/program.exe" -a -b -c
path/to/program.exe "-a" "-b" "-c"
path/to/program.exe "-a" -b -c

My regex looks like this: (("[^"]*")|([^"\t ]+))+

With ("[^"]+") I attempt to find every quoted substring and capture it.

With ([^"\t ]+) I attempt to find every substring without quotes.

My code to test this behaviour looks like this:

QString toMatch = R"del(     "path/to/program.exe" -a -b -c)del";
qDebug() << "String to Match against: " << toMatch << "\n";
QRegularExpression re(R"del((("[^"]+")|([^"\t ]+))+)del");
QRegularExpressionMatchIterator it = re.globalMatch(toMatch);
int i = 0;
while (it.hasNext())
{
   QRegularExpressionMatch match = it.next();
   qDebug() << "iteration: " << i << "  captured: " << match.captured(i) << "\n";
   i++;
}

Output:

String to Match against:  "     \"path/to/program.exe\" -a -b -c"

iteration:  0   captured:  "\"path/to/program.exe\""

iteration:  1   captured:  "-a"

iteration:  2   captured:  ""

iteration:  3   captured:  "-c"

Testing it on Regex101 shows the result I want. I also tested it on some other websites, e.g. this one.

I guess I am doing something wrong; could anyone point me in the right direction?

Thanks in advance.

Best answer:

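You assume that the IDs of the groups you need to get values from change with each new match, while in fact all group IDs are fixed by the pattern itself: group 1 is always the outer group, group 2 is always the quoted alternative and group 3 is always the unquoted one. Your loop calls match.captured(i), so on the third match (-b) it asks for group 2, which did not take part in that match, hence the empty string.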
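To see the fixed numbering in action, here is a minimal sketch that runs the question's pattern over the same input and records, for every match, which groups took part. It uses std::regex (ECMAScript grammar) as a stand-in for QRegularExpression so that it builds without Qt; both engines number groups by the position of their opening parenthesis, so the numbering carries over. The names reportGroups and GroupReport are hypothetical, made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// What one match of the question's pattern looks like, group by group.
// std::regex stands in for QRegularExpression here (assumption: no Qt needed).
struct GroupReport {
    std::string whole;    // group 0: the entire match
    std::string group1;   // group 1: the outer ( ... | ... ) group
    bool quotedMatched;   // group 2: did the "[^"]+" alternative take part?
    bool unquotedMatched; // group 3: did the [^"\t ]+ alternative take part?
};

// Hypothetical helper: iterate over all matches, like globalMatch() does,
// and record the same group numbers for every match.
std::vector<GroupReport> reportGroups(const std::string& input) {
    const std::regex re(R"del((("[^"]+")|([^"\t ]+))+)del");
    std::vector<GroupReport> out;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        // Group numbers never shift: 2 is always the quoted branch, 3 the unquoted one.
        out.push_back({m.str(0), m.str(1), m[2].matched, m[3].matched});
    }
    return out;
}
```

For the question's input, group 2 is set only on the match for the quoted path, and group 3 only on the matches for -a, -b and -c, which is why reading group 2 on the -b match yields an empty string.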

I suggest removing all the groups and just extracting the whole match value:

QString toMatch = R"del(     "path/to/program.exe" -a -b -c)del";
qDebug() << "String to Match against: " << toMatch << "\n";
QRegularExpression re(R"del("[^"]+"|[^"\s]+)del");
QRegularExpressionMatchIterator it = re.globalMatch(toMatch);
while (it.hasNext())
{
   QRegularExpressionMatch match = it.next();
   qDebug() << "  matched: " << match.captured(0) << "\n";
}
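The same whole-match approach can be sketched in standard C++, with std::regex standing in for QRegularExpression (an assumption made so the example builds without Qt). tokenize is a hypothetical helper name; it reads only group 0, mirroring match.captured(0) in the code above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split a command line into tokens with the answer's
// pattern, keeping only the whole match (group 0) of each hit.
std::vector<std::string> tokenize(const std::string& input) {
    // Either a double-quoted run ("...") or a run of characters that are
    // neither double quotes nor whitespace.
    static const std::regex re(R"del("[^"]+"|[^"\s]+)del");
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        tokens.push_back(it->str(0)); // whole match, quotes included
    }
    return tokens;
}
```

On the question's first example this should yield "path/to/program.exe" (quotes kept), -a, -b and -c; on the other examples the quoted arguments such as "-a" likewise keep their quotes.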

Note that the "[^"]+"|[^"\s]+ pattern matches either:

  • "[^"]+" - ", then one or more chars other than " and then a "
  • | - or
  • [^"\s]+ - one or more chars other than " and whitespace.

See the updated pattern demo.
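If you also need to know which tokens were quoted, or want the text without the surrounding quotes, groups can come back, provided each one is read by its fixed number on every match. Below is a minimal sketch, again with std::regex in place of QRegularExpression, using a variant of the answer's pattern in which each alternative has its own group; splitArgs and Token are hypothetical names. With QRegularExpression the idea would be the same: inspect groups 1 and 2 on every match rather than an index that changes per iteration.

```cpp
#include <regex>
#include <string>
#include <vector>

// One token, plus whether it was written inside double quotes.
struct Token {
    std::string text; // token text with the surrounding quotes removed
    bool quoted;      // true if the token came from the "..." alternative
};

// Hypothetical helper: same alternatives as the answer's pattern, but each
// alternative gets its own group. Group 1 is always the quoted content and
// group 2 is always the unquoted run, on every match.
std::vector<Token> splitArgs(const std::string& input) {
    static const std::regex re(R"del("([^"]+)"|([^"\s]+))del");
    std::vector<Token> tokens;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        if (m[1].matched) {
            tokens.push_back({m.str(1), true});  // quoted alternative matched
        } else {
            tokens.push_back({m.str(2), false}); // unquoted alternative matched
        }
    }
    return tokens;
}
```

Checking m[1].matched instead of testing for an empty string keeps the two alternatives distinguishable even if the pattern is later relaxed to accept empty quotes ("").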