I have an annotated data set in XML format; see the example below:
Treatment of <annotation cui="C0267055">Erosive Esophagitis</annotation> in patients
where the tagged words are in XML tags as shown. I need to get it into BRAT format, such as:
T1 annotation 14 33 Erosive Esophagitis
More examples can be found at http://brat.nlplab.org/standoff.html
I can extract the annotations using regular expressions in Python, but I am unsure how to get them into the proper BRAT format. Is there a tool for this?
In case someone still needs an answer to this question, here is a solution.
Let's say an XML file sample.xml has the following structure:
Here is a Python solution:
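One way this could look, as a minimal sketch under a few assumptions: the input is well-formed XML with a single root element, it is parsed with the standard library's xml.etree.ElementTree, and no annotation spans a line break. The function name xml_to_brat and the tag_map argument (XML tag name to BRAT type) are made up for illustration.

```python
import xml.etree.ElementTree as ET


def xml_to_brat(xml_string, tag_map):
    """Return (plain_text, ann_lines) for XML with inline annotation tags.

    tag_map is a hypothetical mapping from XML tag names to BRAT entity
    types, e.g. {"annotation": "annotation"}.
    """
    root = ET.fromstring(xml_string)
    chunks = []  # pieces of plain text, in document order
    spans = []   # (start, end, brat_type) for every mapped element
    pos = 0      # number of characters emitted so far

    def emit(text):
        nonlocal pos
        if text:
            chunks.append(text)
            pos += len(text)

    def walk(elem):
        start = pos
        emit(elem.text)           # text before the first child
        for child in elem:
            walk(child)
            emit(child.tail)      # text between this child and the next
        if elem.tag in tag_map:
            spans.append((start, pos, tag_map[elem.tag]))

    walk(root)
    text = "".join(chunks)
    spans.sort(key=lambda s: (s[0], s[1]))

    # BRAT text-bound line: ID <TAB> TYPE START END <TAB> covered text
    ann_lines = [
        f"T{i}\t{brat_type} {start} {end}\t{text[start:end]}"
        for i, (start, end, brat_type) in enumerate(spans, 1)
    ]
    return text, ann_lines
```

Writing text to sample.txt and the joined ann_lines to sample.ann gives BRAT a document pair. BRAT offsets count from zero and the end offset points just past the last character, so for the sentence in the question (wrapped in a root element) this sketch yields 13 32 for "Erosive Esophagitis".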
The output is a pair of files, sample.txt (the plain text) and sample.ann (the standoff annotations), which can then be displayed in BRAT.
A minor tweak is needed to handle attributes: I added another key, 'att', to the replacetags dictionary, so that an entry becomes "fname": {"tag": "PERS", "att": "value of attribute"}, and an additional line is then written for every tag that has an attribute.
Hope someone will find this helpful!
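For the attribute tweak described above, the extra lines could be produced roughly as follows. This is only a sketch: the helper attribute_lines, the entities list of (BRAT ID, XML tag) pairs, and the attribute name "Kind" are assumptions for illustration, and it further assumes each 'att' value is a single token that is declared in BRAT's annotation.conf.

```python
def attribute_lines(entities, replacetags, att_name="Kind"):
    """Build BRAT attribute lines for entities whose tag has an 'att' key.

    entities: list of (brat_id, xml_tag) pairs, e.g. [("T1", "fname")].
    replacetags: dict such as {"fname": {"tag": "PERS", "att": "given"}}.
    att_name: hypothetical attribute name used in the .ann file.
    """
    lines = []
    for brat_id, xml_tag in entities:
        value = replacetags.get(xml_tag, {}).get("att")
        if value is not None:
            # BRAT attribute line: ID <TAB> NAME TARGET VALUE
            lines.append(f"A{len(lines) + 1}\t{att_name} {brat_id} {value}")
    return lines
```

The returned lines would be appended to sample.ann after the text-bound (T) lines they refer to.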