I have some MS Word documents whose entire contents I have transferred into a SQL table.
The contents contain a number of square brackets and curly brackets e.g.
[{a} as at [b],] {c,} {d,} etc
and I need to check that the brackets are balanced/matching, e.g. each of the following should return false:
- [{a} as at [b], {c,} {d,}
- ][{a} as at [b], {c,} {d,}
- [{a} as at [b],] {c,} }{d,
So far I have extracted all the brackets and stored their details in a SQL table like the one below (paragraph number, bracket type, bracket position, bracket level):
3 [ 8 1
3 ] 18 0
3 [ 23 1
3 ] 35 0
7 [ 97 1
7 ] 109 0
7 [ 128 1
7 { 129 2
7 } 165 1
7 [ 173 2
7 ] 187 1
7 ] 189 0
7 { 192 1
7 } 214 0
7 { 216 1
7 } 255 0
7 { 257 1
7 } 285 0
7 { 291 1
7 } 326 0
7 { 489 1
7 } 654 0
I am unsure what algorithm to use to check whether the brackets in each paragraph are balanced, and to give an error message when they are not.
Any advice would be appreciated!
EDIT:
The code will also need to work for the following scenario:
(paragraph number, bracket type, bracket position, bracket level)
15 [ 543 1
15 { 544 2
15 } 556 1
15 [ 560 2
15 ] 580 1
15 ] 581 0
15 [ 610 1
15 ] 624 0
15 [ 817 1
15 ] 829 0
I'm not sure which tools you have available, but here is a tested JavaScript function that validates that all (possibly nested) square brackets and curly braces are properly matched:
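A minimal sketch of such a function, assuming the paragraph text is available as a plain string. The name `isBalanced` and the exact regex are illustrative choices, not necessarily the original code:

```javascript
// Hypothetical helper: returns true when every [ ] and { } in `text` is matched.
function isBalanced(text) {
    // One innermost pair: an opener, then no brackets of either kind, then its closer.
    var innermost = /\[[^\[\]{}]*\]|\{[^\[\]{}]*\}/g;
    var previous;
    // Keep stripping innermost pairs until a pass removes nothing.
    do {
        previous = text;
        text = text.replace(innermost, '');
    } while (text !== previous);
    // Any bracket still present has no partner.
    return !/[\[\]{}]/.test(text);
}
```

With the examples from the question, `isBalanced('[{a} as at [b],] {c,} {d,} etc')` should give true, while the three broken variants should each give false.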
It works by iteratively matching and removing the innermost balanced pairs until no matching pairs are left. Once this is complete, a test is made to see whether any square brackets or curly braces remain. If any remain, the function returns false; otherwise it returns true. You should be able to implement this function in just about any language.
Note that this assumes that the square and curly brace pairs are not interleaved like so:
[..{..]..}
Hope this helps.
Addendum: Extended version for: (), {}, [] and <>
The above method can be easily extended to handle testing all four matching bracket types: (), {}, [] and <>, like so:
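A sketch of the four-type variant under the same assumptions. The name `isBalancedAll` is hypothetical, and the regex is an illustrative reconstruction of the idea rather than the original code:

```javascript
// Hypothetical helper: returns true when all (), {}, [] and <> in `text` are matched.
function isBalancedAll(text) {
    /* Innermost pair, one alternative per bracket type:
         \( [^(){}\[\]<>]* \)     # ( ... ) with no brackets inside
       | \{ [^(){}\[\]<>]* \}     # { ... }
       | \[ [^(){}\[\]<>]* \]     # [ ... ]
       |  < [^(){}\[\]<>]*  >     # < ... >
    */
    var innermost = /\([^(){}\[\]<>]*\)|\{[^(){}\[\]<>]*\}|\[[^(){}\[\]<>]*\]|<[^(){}\[\]<>]*>/g;
    var previous;
    // Strip innermost pairs of any type until a pass removes nothing.
    do {
        previous = text;
        text = text.replace(innermost, '');
    } while (text !== previous);
    // Any leftover bracket character is unmatched.
    return !/[(){}\[\]<>]/.test(text);
}
```

One caveat with this sketch: every `<` and `>` is treated as a bracket, so text that uses them as comparison signs (for example `x < 3`) would be reported as unbalanced.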
Note that the regex is documented in extended (free-spacing) form inside a C-style comment.
Edit 20150530: Extended to handle a mix of all four matching bracket types: (), {}, [] and <>.