What is the Python way to express a GREL line that creates as many tags as needed in an XML document?

I'm using OpenRefine to do something that I KNOW Python can do: converting a CSV into an XML metadata document. I can figure out most of it, but the one thing that trips me up is this GREL line:

{{forEach(cells["subjectTopicsLocal"].value.split('; '), v, '<subject authority="local"><topic>'+v.escape("xml")+'</topic></subject>')}}

What this does is beautiful for me. I've got a "subject" field in my Excel spreadsheet. My volunteers enter keywords, separated with a "; ". I don't know how many keywords they'll come up with, and sometimes there is only one. That GREL line creates a new <subject authority="local"><topic></topic></subject> for each term entered and, of course, slides the term into the <topic> element.

I know there has to be a Python expression that can do this. Could someone recommend best practice for this? I'd appreciate it!

Best answer:

Basically you want to use 'split' in Python to convert the string from your subject field into a Python list, and then you can iterate over the list.

So, assuming you've already read the content of the 'subject' field from a row of your CSV/Excel document and assigned it to a string variable 'subj', you could do something like:

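from xml.sax.saxutils import escape

subjList = subj.split("; ")  # the keywords are separated by "; " (semicolon plus space)
for subject in subjList:
  # output one XML element per keyword, escaping &, < and >
  print('<subject authority="local"><topic>' + escape(subject) + '</topic></subject>')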
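
To see the split-and-iterate idea end to end, here is a minimal sketch that reads rows with the standard csv module and builds one snippet per keyword. The column name subjectTopicsLocal is taken from the GREL expression in the question; the helper name subject_elements and the sample rows are made up for illustration, and a real script would open the actual CSV file instead of the in-memory sample.

```python
import csv
import io
from xml.sax.saxutils import escape


def subject_elements(cell):
    """Return one <subject> snippet per '; '-separated keyword in cell."""
    snippets = []
    for keyword in cell.split("; "):
        keyword = keyword.strip()
        if not keyword:  # skip empty cells and stray separators
            continue
        snippets.append(
            '<subject authority="local"><topic>'
            + escape(keyword)
            + '</topic></subject>'
        )
    return snippets


# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO(
    'title,subjectTopicsLocal\n'
    'Farm diary,"Farming; Cattle & sheep"\n'
    'Letter,Railways\n'
)

for row in csv.DictReader(sample):
    # One line of output per keyword, however many the row has.
    print("\n".join(subject_elements(row["subjectTopicsLocal"])))
```

Skipping empty keywords is a design choice of this sketch: it means a blank cell produces no <subject> elements at all rather than an empty <topic>.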

Second answer:

This Python expression is equivalent to your GREL expression:

['<subject authority="local"><topic>'+escape(v)+'</topic></subject>' for v in value.split('; ')]

It will create a list of XML snippets, one per subject. It assumes that 'value' holds the text of the cell and that you've created or imported an appropriate escape function, such as

from xml.sax.saxutils import escape
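
If the whole document is being built in Python rather than pasted together from strings, one alternative (not what either answer uses) is the standard library's xml.etree.ElementTree, which escapes text when it serializes. The following is only a sketch under that assumption; the function name add_subjects and the root element name record are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_subjects(parent, cell):
    """Append one <subject authority="local"><topic> per keyword in cell."""
    for keyword in cell.split("; "):
        keyword = keyword.strip()
        if keyword:  # ignore empty entries
            subject = ET.SubElement(parent, "subject", authority="local")
            topic = ET.SubElement(subject, "topic")
            topic.text = keyword  # escaped automatically on serialization


record = ET.Element("record")  # hypothetical root element name
add_subjects(record, "Farming; Cattle & sheep")
print(ET.tostring(record, encoding="unicode"))
```

With this approach there is no need to call escape yourself, since the & in the keyword is written out as &amp; by ET.tostring.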