What's the use case?
Our use case is to read 100 million records from a Postgres materialized view and write them in 10K chunks into a Redis cache.
What we tried:
We used HibernatePagingItemReader to read the records from the Postgres materialized view synchronously; we did not configure any TaskExecutor on the batch step. The reader reads pages of size 20K and hands the records to the ItemStreamWriter in 10K chunks. We observed that exactly after the 795,000th record the paging OFFSET breaks and the reader starts re-reading records it had already read earlier in the run. So in our test run over 1 million records the batch wrote only around 810K unique records; the remaining records (roughly 190K) were never processed by the reader because of this duplicate-reading issue.
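For reference, here is a minimal sketch of the reader and step configuration we used (Spring Batch 4 builder APIs; the entity, view, and bean names below are placeholders for our actual ones):

```java
import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.SessionFactory;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemStreamWriter;
import org.springframework.batch.item.database.HibernatePagingItemReader;
import org.springframework.batch.item.database.builder.HibernatePagingItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Placeholder entity mapped onto the materialized view.
@Entity
class RecordEntity {
    @Id
    private Long id;
    private String payload;
}

@Configuration
public class BatchConfig {

    @Bean
    public HibernatePagingItemReader<RecordEntity> reader(SessionFactory sessionFactory) {
        return new HibernatePagingItemReaderBuilder<RecordEntity>()
                .name("mvHibernateReader")
                .sessionFactory(sessionFactory)
                .queryString("from RecordEntity") // HQL over the view-mapped entity
                .pageSize(20_000)                 // page size used in our runs
                .fetchSize(20_000)                // also tried 0, see the question below
                .build();
    }

    @Bean
    public Step loadStep(StepBuilderFactory steps,
                         HibernatePagingItemReader<RecordEntity> reader,
                         ItemStreamWriter<RecordEntity> redisWriter) {
        return steps.get("loadStep")
                // 10K chunks to the writer, no TaskExecutor (fully synchronous)
                .<RecordEntity, RecordEntity>chunk(10_000)
                .reader(reader)
                .writer(redisWriter)
                .build();
    }
}
```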
Workaround we applied
We switched to JdbcPagingItemReader with a PostgresPagingQueryProvider to read the records from the Postgres view. The additional overhead in this approach is that the reader requires a sortKey; we used the view's unique column for it, and the reader was then able to process all records from the Postgres view without duplicates.
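A minimal sketch of the workaround configuration (again, the view, column, and type names are placeholders; this assumes Java 16+ records):

```java
import java.util.Map;
import javax.sql.DataSource;

import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;
import org.springframework.batch.item.database.support.PostgresPagingQueryProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JdbcReaderConfig {

    // Simple row holder for the view's columns (placeholder).
    record RecordRow(long id, String payload) {}

    @Bean
    public JdbcPagingItemReader<RecordRow> jdbcReader(DataSource dataSource) {
        PostgresPagingQueryProvider queryProvider = new PostgresPagingQueryProvider();
        queryProvider.setSelectClause("select id, payload");
        queryProvider.setFromClause("from my_materialized_view");
        // The unique column doubles as the sort key the paging reader requires.
        queryProvider.setSortKeys(Map.of("id", Order.ASCENDING));

        return new JdbcPagingItemReaderBuilder<RecordRow>()
                .name("mvJdbcReader")
                .dataSource(dataSource)
                .queryProvider(queryProvider)
                .pageSize(20_000)
                .rowMapper((rs, rowNum) -> new RecordRow(rs.getLong("id"), rs.getString("payload")))
                .build();
    }
}
```

As far as we understand, after the first page the SQL query provider pages by the sort key (a WHERE clause on the last seen value) rather than by a raw OFFSET, which is presumably why this reader stays stable across the full data set.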
What we think we are losing here
With the Hibernate and JPA readers there are obviously a couple of advantages (such as reading directly into mapped entities) which we feel we are losing with the plain JDBC approach.
Question for the Spring Batch team
Is this a known bug in the Hibernate- and JPA-based paging item readers? We tried all the options we could find, such as setting the fetch size to 0 and setting it equal to the page size, but had no luck with either.
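For completeness, this is how we applied those two fetch-size variants (a hypothetical helper; the reader is the Hibernate reader from the first sketch):

```java
import org.springframework.batch.item.database.HibernatePagingItemReader;

class FetchSizeVariants {
    // The two fetch-size settings we experimented with; neither avoided the
    // duplicate reads described above.
    static void applyVariant(HibernatePagingItemReader<?> reader, boolean matchPageSize) {
        // 0 leaves the fetch size to the JDBC driver; 20_000 matches our page size.
        reader.setFetchSize(matchPageSize ? 20_000 : 0);
    }
}
```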