I have a problem when I use different regexes to highlight words and comments in a document (RichEditControl), as in SQL syntax highlighting.
This is my first regex:
(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(--.*)
This works well for: /*blahblah*/
and --blahblah
And I have another regex:
((""(.|/[[:blank:]]/)*?"")|('(.|/[[:blank:]]/)*?'))
This works well for: 'blahblah'
(like a SQL string)
But if I type this:
'/*blahblah*/'
Before I type the last ', the program throws an exception:
An unhandled exception of type 'System.ArgumentException' occurred in DevExpress.Office.v15.2.Core.dll
Thanks in advance for the help.
This is the full code:
private List<SyntaxHighlightToken> ParseTokens()
{
List<SyntaxHighlightToken> tokens = new List<SyntaxHighlightToken>();
DocumentRange[] ranges = null;
#region SearchSimpleCommas
Regex quotations = new Regex(@"((""(.|/[[:blank:]]/)*?"")|('(.|/[[:blank:]]/)*?'))");
ranges = document.FindAll(quotations);
foreach (var range in ranges)
{
if (!IsRangeInTokens(range, tokens))
tokens.Add(new SyntaxHighlightToken(range.Start.ToInt(), range.Length, StringSettings));
}
#endregion
#region SearchComment--/**/
Regex comment = new Regex(@"(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(--.*)", RegexOptions.IgnoreCase | RegexOptions.Multiline);
ranges = document.FindAll(comment);
for (int i = 0; i < ranges.Length; i++)
{
tokens.Add(new SyntaxHighlightToken(ranges[i].Start.ToInt(), ranges[i].Length, CommentsSettings));
}
#endregion
tokens.Sort(new SyntaxHighlightTokenComparer());
// fill in gaps in document coverage
AddPlainTextTokens(tokens);
return tokens;
}
private void AddPlainTextTokens(List<SyntaxHighlightToken> tokens)
{
int count = tokens.Count;
if (count == 0)
{
tokens.Add(new SyntaxHighlightToken(0, document.Range.End.ToInt(), defaultSettings));
return;
}
tokens.Insert(0, new SyntaxHighlightToken(0, tokens[0].Start, defaultSettings));
for (int i = 1; i < count; i++)
{
tokens.Insert(i * 2, new SyntaxHighlightToken(tokens[i * 2 - 1].End, tokens[i * 2].Start - tokens[i * 2 - 1].End, defaultSettings));
}
tokens.Add(new SyntaxHighlightToken(tokens[count * 2 - 1].End, document.Range.End.ToInt() - tokens[count * 2 - 1].End, defaultSettings));
}
private bool IsRangeInTokens(DocumentRange range, List<SyntaxHighlightToken> tokens)
{
return tokens.Any(t => IsIntersect(range, t));
}
bool IsIntersect(DocumentRange range, SyntaxHighlightToken token)
{
int start = range.Start.ToInt();
if (start >= token.Start && start < token.End)
return true;
int end = range.End.ToInt() - 1;
if (end >= token.Start && end < token.End)
return true;
return false;
}
#region ISyntaxHighlightServiceMembers
public void ForceExecute()
{
Execute();
}
public void Execute()
{ // The exception is thrown here
document.ApplySyntaxHighlight(ParseTokens());
}
#endregion
EDIT: Thanks, Harrison Mc.
Here is the code I used, in case anyone needs it. It shows only what I modified (inside the ParseTokens method):
#region SearchComments&Strings
Regex definitiveRegex = new Regex(@"(?<string>'[^\\']*(?>\\.[^\\']*)*')|(?<comment>(?>/\*(?>[^*]|[\r\n]|(?>\*+(?>[^*/]|[\r\n])))*\*+/)|(?>--.*))");
MatchCollection matches = definitiveRegex.Matches(document.Text);
foreach (System.Text.RegularExpressions.Match match in matches)
{
try
{
System.Text.RegularExpressions.GroupCollection groups = match.Groups;
if (groups["string"].Value.Length > 0)
{
ranges = null;
for (int s = 0; s < groups.Count; s++)
{
if (groups[s].Value != string.Empty)
{
ranges = document.FindAll(groups[s].Value, SearchOptions.None);
for (int z = 0; z < ranges.Length; z++)
{
if(!IsRangeInTokens(ranges[z], tokens))
tokens.Add(new SyntaxHighlightToken(ranges[z].Start.ToInt(), ranges[z].Length, StringSettings));
}
}
}
}
else if (groups["comment"].Value.Length > 0)
{
ranges = null;
for (int c = 0; c < groups.Count; c++)
{
if (groups[c].Value != string.Empty)
{
ranges = document.FindAll(groups[c].Value.Trim(), SearchOptions.None);
for (int k = 0; k < ranges.Length; k++)
{
if (!IsRangeInTokens(ranges[k], tokens))
tokens.Add(new SyntaxHighlightToken(ranges[k].Start.ToInt(), ranges[k].Length, CommentsSettings));
}
}
}
}
}
catch(Exception ex){ }
}
#endregion
In order to avoid highlighting comments in strings and strings in comments, you need some sort of "state", which regular expressions can't easily give you. These situations would be difficult for individual string and comment regular expressions to deal with, because it would require keeping track of whether or not you're in a comment when looking for a string and vice versa.
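To make the problem concrete, here is a minimal sketch of how two independent patterns can both claim the same text. The patterns are simplified stand-ins rather than the exact regexes from the question, and the OverlapDemo class and its Overlaps method are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OverlapDemo
{
    // Simplified stand-ins for the two separate patterns (assumptions, not the originals).
    static readonly Regex StringPattern = new Regex(@"'[^']*'");
    static readonly Regex CommentPattern = new Regex(@"/\*[\s\S]*?\*/|--.*");

    // Returns true if any string match and any comment match cover a common character.
    public static bool Overlaps(string text)
    {
        foreach (Match s in StringPattern.Matches(text))
        {
            foreach (Match c in CommentPattern.Matches(text))
            {
                bool intersect = c.Index < s.Index + s.Length
                              && s.Index < c.Index + c.Length;
                if (intersect)
                    return true;
            }
        }
        return false;
    }
}
```

With these stand-in patterns, OverlapDemo.Overlaps("SELECT '/*blahblah*/'") returns true because the comment pattern matches inside the quoted string, while OverlapDemo.Overlaps("SELECT 'a' /* b */") returns false. Each regex runs without knowing what the other has already matched, so the resulting ranges can overlap.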
However, if you use one regular expression that has different groups for matching strings versus comments, the greedy consuming of characters would prevent a "comment" in a string or a "string" in a comment from messing things up.
I tested this regular expression, and it seemed to work for both "comments" in strings and "strings" in comments (both with multiple lines).
The key here is that the regular expression is keeping track of the "state" that determines if we're in the middle of a string or in the middle of a comment.
To use this, you'll need to grab the individual groups out of the overall match. The (?<name>group) syntax creates a named group, which you can extract later. If the <string> group has a match, then it's a string, and if the <comment> group has a match, then it's a comment. Since I'm not familiar with the document.FindAll method, I adapted an example from the .NET documentation using the regex.Matches method.
Hopefully this helps!
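As a rough sketch of the group-extraction approach described above, the following uses a single pattern with two named alternatives and reads the kind of each match from its groups. The pattern is a simplified assumption (single-quoted strings without escape handling, /* */ and -- comments), not necessarily the exact regex tested above, and SqlTokenScanner and Scan are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SqlTokenScanner
{
    // One pattern, two named alternatives. Whichever alternative matches first
    // at a position consumes the text, so a comment marker inside a string
    // (or a quote inside a comment) is never matched on its own.
    static readonly Regex Pattern = new Regex(
        @"(?<string>'[^']*')|(?<comment>/\*[\s\S]*?\*/|--[^\r\n]*)");

    // Returns (start offset, length, kind) for every string or comment in the text.
    public static List<(int Start, int Length, string Kind)> Scan(string text)
    {
        var tokens = new List<(int Start, int Length, string Kind)>();
        foreach (Match m in Pattern.Matches(text))
        {
            // Exactly one of the two named groups succeeds for each match.
            string kind = m.Groups["string"].Success ? "string" : "comment";
            tokens.Add((m.Index, m.Length, kind));
        }
        return tokens;
    }
}
```

For the input '/*x*/' -- 'y', Scan yields two non-overlapping tokens: a string at offset 0 with length 7, and a comment at offset 8 with length 6. The offsets refer to positions in the plain string; whether they map directly onto document positions in RichEditControl is not something this sketch verifies.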
P.S. I used regex101.com to test the regex, but to do so I had to escape the forward slashes and not escape the double quotes. I tried my best to add them back in, but I may have missed one or two.