Matching between breaks in UIMA Ruta based on a condition

I have the following sample text:

zip 20193
New York
USA

I would like to match only "New York", i.e., the line after the zip code.

I tried the following code, but it does not work:

DECLARE heading;
pin BREAK #{-> MARK(heading)} BREAK;

(I have declared pin before this).

Please let me know how to go about this.

Thanks!

There is 1 best solution below

The problem is probably the filtering setting. BREAK is not visible by default, so a rule element that matches on BREAK can never succeed: Ruta automatically skips the line breaks.

Try adding a rule that changes the filtering setting before your rule:

RETAINTYPE(BREAK);
pin BREAK #{-> MARK(heading)} BREAK;
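
Put together, a minimal sketch of the whole script might look like the following. It assumes that pin is declared and annotated earlier in the script, as stated in the question. The final parameterless RETAINTYPE is my own addition: it resets the filter to its default, so that later rules skip line breaks again.

```
// Sketch only: pin is assumed to be declared and annotated before this point.
DECLARE heading;

// Make line breaks visible to the rules that follow.
RETAINTYPE(BREAK);
// Mark everything between the break after pin and the next break.
pin BREAK #{-> MARK(heading)} BREAK;
// Reset the filter to the default, so BREAK is skipped again.
RETAINTYPE;
```

If later rules also need to see line breaks, leave out the reset.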

There could be another problem: BREAK represents both \n and \r, so a Windows line ending (\r\n) can produce two consecutive BREAK annotations and the rule would not match. You would need something like:

pin BREAK[1,2] #{-> MARK(heading)} BREAK;

There is also a utils analysis engine in Ruta for annotating lines: PlainTextAnnotator. If you include it, you can write something like:

pin Line{-> heading};

(You may need to trim the Line annotations, e.g., with the TRIM action, if the lines start or end with whitespace.)
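
For the PlainTextAnnotator variant, a rough sketch of the setup could look like this. It assumes the utils descriptors that ship with Ruta (utils.PlainTextAnnotator and utils.PlainTextTypeSystem) are available on your descriptor path, and that pin ends at the end of the zip line so that the next Line directly follows it. Whether the trimming step is needed depends on your text, so treat it as optional.

```
// Assumption: the utils descriptors shipped with Ruta can be resolved.
ENGINE utils.PlainTextAnnotator;
TYPESYSTEM utils.PlainTextTypeSystem;

DECLARE heading;

// Run the PlainTextAnnotator so that Line annotations are created.
Document{-> EXEC(PlainTextAnnotator, {Line})};

// Optional: remove leading and trailing whitespace from each Line.
// Whitespace must be visible for Lines that start with it to be matched.
RETAINTYPE(WS);
Line{-> TRIM(WS)};
RETAINTYPE;

// Mark the Line that directly follows a pin annotation.
pin Line{-> heading};
```

With the sample text, this should mark "New York" as heading, provided pin covers the end of the first line.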

DISCLAIMER: I am a developer of UIMA Ruta