I am studying the Java EE Batch API (JSR-352) in order to test the feasibility of replacing our current ETL tool with our own solution built on this technology.
My goal is to build a job in which I:
- get some (dummy) data from a data source in step1,
- get some other data from another data source in step2, and
- merge them in step3.
I would like to process each item and, instead of writing it to a file, send it to the next step, and also store the information for further use. I could do that using batchlets and `jobContext.setTransientUserData()`.
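For reference, the batchlet idea above can be sketched stand-alone; here `JobContext` is replaced by a plain holder class so the snippet runs without a batch runtime, and all class, method and key names are invented for illustration only:

```java
import java.util.*;

// Plain-Java sketch of three batchlet steps sharing data through the job
// context, in the spirit of JobContext#setTransientUserData() in JSR-352.
// FakeJobContext and the step methods are illustrative stand-ins only.
public class BatchletSketch {

    // Stand-in for javax.batch.runtime.context.JobContext.
    static class FakeJobContext {
        private Object transientUserData;
        void setTransientUserData(Object data) { transientUserData = data; }
        Object getTransientUserData() { return transientUserData; }
    }

    // step1: fetch (dummy) data from the first source and stash it.
    static void step1(FakeJobContext ctx) {
        Map<String, List<String>> holder = new HashMap<>();
        holder.put("source1", List.of("a1", "a2"));
        ctx.setTransientUserData(holder);
    }

    // step2: fetch data from the second source and add it to the holder.
    @SuppressWarnings("unchecked")
    static void step2(FakeJobContext ctx) {
        Map<String, List<String>> holder =
                (Map<String, List<String>>) ctx.getTransientUserData();
        holder.put("source2", List.of("b1", "b2"));
    }

    // step3: merge the two lists collected by the earlier steps.
    @SuppressWarnings("unchecked")
    static List<String> step3(FakeJobContext ctx) {
        Map<String, List<String>> holder =
                (Map<String, List<String>>) ctx.getTransientUserData();
        List<String> merged = new ArrayList<>(holder.get("source1"));
        merged.addAll(holder.get("source2"));
        return merged;
    }

    public static void main(String[] args) {
        FakeJobContext ctx = new FakeJobContext();
        step1(ctx);
        step2(ctx);
        System.out.println(step3(ctx)); // [a1, a2, b1, b2]
    }
}
```

In a real job each step would be its own `Batchlet` class with the container-injected `JobContext`; the holder map plays the role of the transient user data shared across steps of one execution.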
I think I am not getting the concepts right: as far as I understood, JSR-352 is meant for this kind of ETL task, but it has two types of steps: chunks and batchlets. Chunks are "three-phase" steps, in which one reads, processes and writes the data. Batchlets are tasks that are not performed on each item of the data, but once per step (such as calculating totals, sending emails and so on).
My problem is that my solution is not correct if I consider the definition of batchlets.
How could one implement this kind of job using the Java EE Batch API?
I think you'd better use a chunk rather than a batchlet to implement ETL. Typical chunk processing with a data source goes like this:

- `ItemReader#open()`: open a cursor (create the `Connection`, `Statement` and `ResultSet`) and save them as instance variables of the `ItemReader`.
- `ItemReader#readItem()`: create and return an object that contains the data of one row, using the `ResultSet`.
- `ItemReader#close()`: close the JDBC resources.
- `ItemProcessor#processItem()`: do the calculation, then create and return an object which contains the result.
- `ItemWriter#writeItems()`: save the calculated data to the database: open a `Connection` and `Statement`, invoke `executeUpdate()`, and close them.

As to your situation, I think you have to choose the data source that can be considered the primary one and open a cursor for it in `ItemReader#open()`, then fetch the matching data from the other source in `ItemProcessor#processItem()` for each item.

Also, I recommend you read some useful examples of chunk processing:
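Stripped of the `javax.batch` types (which would need a batch runtime to run), the primary-cursor-plus-lookup pattern above can be sketched in plain Java as follows; every class and data name here is made up for illustration:

```java
import java.util.*;

// Sketch of the chunk pattern: a "reader" iterates the primary source,
// the "processor" enriches each item from the secondary source, and the
// "writer" persists the merged items one chunk at a time.
public class ChunkMergeSketch {

    // Stand-in for ItemReader: holds a cursor over the primary data.
    static class PrimaryReader {
        private Iterator<String> cursor;
        void open(List<String> primaryRows) {            // ItemReader#open()
            cursor = primaryRows.iterator();
        }
        String readItem() {                              // ItemReader#readItem()
            return cursor.hasNext() ? cursor.next() : null; // null ends the step
        }
        void close() { cursor = null; }                  // ItemReader#close()
    }

    // Stand-in for ItemProcessor: looks up the secondary source per item.
    static class MergeProcessor {
        private final Map<String, String> secondary;
        MergeProcessor(Map<String, String> secondary) { this.secondary = secondary; }
        String processItem(String key) {                 // ItemProcessor#processItem()
            return key + ":" + secondary.getOrDefault(key, "?");
        }
    }

    // Stand-in for ItemWriter: receives items a chunk at a time.
    static class ListWriter {
        final List<String> written = new ArrayList<>();
        void writeItems(List<String> chunk) {            // ItemWriter#writeItems()
            written.addAll(chunk);
        }
    }

    // Drives the read/process/write cycle the way the batch runtime would.
    static List<String> runStep(List<String> primary,
                                Map<String, String> secondary, int chunkSize) {
        PrimaryReader reader = new PrimaryReader();
        MergeProcessor processor = new MergeProcessor(secondary);
        ListWriter writer = new ListWriter();
        reader.open(primary);
        List<String> chunk = new ArrayList<>();
        for (String item = reader.readItem(); item != null; item = reader.readItem()) {
            chunk.add(processor.processItem(item));
            if (chunk.size() == chunkSize) {             // chunk boundary reached
                writer.writeItems(chunk);
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) writer.writeItems(chunk);  // flush the last partial chunk
        reader.close();
        return writer.written;
    }

    public static void main(String[] args) {
        List<String> merged = runStep(
                List.of("id1", "id2"),
                Map.of("id1", "alpha", "id2", "beta"),
                10);
        System.out.println(merged); // [id1:alpha, id2:beta]
    }
}
```

In a real JSR-352 job the three inner classes would implement `ItemReader`, `ItemProcessor` and `ItemWriter`, the loop and chunk boundary (plus checkpointing and transactions) would be handled by the container, and the secondary lookup inside `processItem()` would typically be a JDBC query keyed by the primary row.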
My blog entries about JBatch and chunk processing: