My goal is to parse a large XML file and persist objects to DB based on the XML data, and to do it quickly. The operation needs to be transactional so I can rollback in case there is a problem parsing the XML or an object that gets created cannot be validated.
I am using the Grails Executor plugin to thread the operation. The problem is that each thread I create within the service has its own transaction and session. If I create 4 threads and 1 fails the session for the 3 that didn't fail may have already flushed, or they may flush in the future.
I was thinking if I could tell each thread to use the "current" Hibernate session that would probably fix my problem. Another thought I had was that I could prevent all sessions from flushing until it was known all completed without errors. Unfortunately I don't know how to do either of these things.
There is an additional catch too. There are many of these XML files to parse, and many that will be created in the future. Many of these XML files contain data that when parsed would create an object identical to one that was already created when a previous XML file was parsed. In such a case I need to make a reference to the existing object. I have added a transient isUnique
variable to each class to address this. Using the Grails unique constraint does not work because it does not take hasMany
relationships into account as I have outlined in my question here.
The following example is very simple compared to the real thing. The XML file's I'm parsing have deeply nested elements with many attributes.
Imagine the following domain classes:
class Foo {
String ver
Set<Bar> bars
Set<Baz> bazs
static hasMany = [bars: Bar, bazs: Baz]
boolean getIsUnique() {
Util.isUnique(this)
}
static transients = [
'isUnique'
]
static constraints = {
ver(nullable: false)
isUnique(
validator: { val, obj ->
obj.isUnique
}
)
}
}
class Bar {
String name
boolean getIsUnique() {
Util.isUnique(this)
}
static transients = [
'isUnique'
]
static constraints = {
isUnique(
validator: { val, obj ->
obj.isUnique
}
)
}
}
class Baz {
String name
boolean getIsUnique() {
Util.isUnique(this)
}
static transients = [
'isUnique'
]
static constraints = {
isUnique(
validator: { val, obj ->
obj.isUnique
}
)
}
}
And here is my Util.groovy
class located in my src/groovy
folder. This class contains the methods I use to determine if an instance of a domain class is unique and/or retrieve the already existing equal instance:
import org.hibernate.Hibernate
class Util {
/**
* Gets the first instance of the domain class of the object provided that
* is equal to the object provided.
*
* @param obj
* @return the first instance of obj's domain class that is equal to obj
*/
static def getFirstDuplicate(def obj) {
def objClass = Hibernate.getClass(obj)
objClass.getAll().find{it == obj}
}
/**
* Determines if an object is unique in its domain class
*
* @param obj
* @return true if obj is unique, otherwise false
*/
static def isUnique(def obj) {
getFirstDuplicate(obj) == null
}
/**
* Validates all of an object's constraints except those contained in the
* provided blacklist, then saves the object if it is valid.
*
* @param obj
* @return the validated object, saved if valid
*/
static def validateWithBlacklistAndSave(def obj, def blacklist = null) {
def propertiesToValidate = obj.domainClass.constraints.keySet().collectMany{!blacklist?.contains(it)? [it] : []}
if(obj.validate(propertiesToValidate)) {
obj.save(validate: false)
}
obj
}
}
And imagine XML file "A" is similar to this:
<foo ver="1.0">
<!-- Start bar section -->
<bar name="bar_1"/>
<bar name="bar_2"/>
<bar name="bar_3"/>
...
<bar name="bar_5000"/>
<!-- Start baz section -->
<baz name="baz_1"/>
<baz name="baz_2"/>
<baz name="baz_3"/>
...
<baz name="baz_100000"/>
</foo>
And imagine XML file "B" is similar to this (identical to XML file "A" except one new bar
added and one new baz
added). When XML file "B" is parsed after XML file "A" three new objects should be created 1.) A Bar
with name = bar_5001
2.) A Baz
with name = baz_100001
, 3.) A Foo
with ver = 2.0
and a list of bars
and bazs
equal to what is shown, reusing instances of Bar
and Baz
that already exist from the import of XML file A
:
<foo ver="2.0">
<!-- Start bar section -->
<bar name="bar_1"/>
<bar name="bar_2"/>
<bar name="bar_3"/>
...
<bar name="bar_5000"/>
<bar name="bar_5001"/>
<!-- Start baz section -->
<baz name="baz_1"/>
<baz name="baz_2"/>
<baz name="baz_3"/>
...
<baz name="baz_100000"/>
<baz name="baz_100001"/>
</foo>
And a service similar to this:
class BigXmlFileUploadService {
// Pass in a 20MB XML file
def upload(def xml) {
String rslt = null
def xsd = Util.getDefsXsd()
if(Util.validateXmlWithXsd(xml, xsd)) { // Validate the structure of the XML file
def fooXml = new XmlParser().parseText(xml.getText()) // Parse the XML
def bars = callAsync { // Make a thread for creating the Bar objects
def bars = []
for(barXml in fooXml.bar) { // Loop through each bar XML element inside the foo XML element
def bar = new Bar( // Create a new Bar object
name: barXml.attribute("name")
)
bar = retrieveExistingOrSave(bar) // If an instance of Bar that is equal to this one already exists then use it
bars.add(bar) // Add the new Bar object to the list of Bars
}
bars // Return the list of Bars
}
def bazs = callAsync { // Make a thread for creating the Baz objects
def bazs = []
for(bazXml in fooXml.baz) { // Loop through each baz XML element inside the foo XML element
def baz = new Baz( // Create a new Baz object
name: bazXml.attribute("name")
)
baz = retrieveExistingOrSave(baz) // If an instance of Baz that is equal to this one already exists then use it
bazs.add(baz) // Add the new Baz object to the list of Bazs
}
bazs // Return the list of Bazs
}
bars = bars.get() // Wait for thread then call Future.get() to get list of Bars
bazs = bazs.get() // Wait for thread then call Future.get() to get list of Bazs
def foo = new Foo( // Create a new Foo object with the list of Bars and Bazs
ver: fooXml.attribute("ver")
bars: bars
bazs: bazs
).save()
rslt = "Successfully uploaded ${xml.getName()}!"
} else {
rslt = "File failed XSD validation!"
}
rslt
}
private def retrieveExistingOrSave(def obj, def existingObjCache) {
def dup = Util.getFirstDuplicate(obj)
obj = dup ?: Util.validateWithBlacklistAndSave(obj, ["isUnique"])
if(obj.errors.allErrors) {
log.error "${obj} has errors ${obj.errors}"
throw new RuntimeException() // Force transaction to rollback
}
obj
}
}
So the question is how do I get everything that happens inside of my service's upload
method to act as it happened in a single session so EVERYTHING that happens can be rolled back if any one part fails?
Service can be optimized to address some of the pain points:
flush
on each creation of child object. See below for the optimization.Service class would look something like: