I'm trying to run a few JDBC extracts in parallel, but this fails on: java.lang.OutOfMemoryError: Java heap space
.
How does Data Connection memory usage work, and how do I resolve this problem?
I'm trying to run a few JDBC extracts in parallel, but this fails on: java.lang.OutOfMemoryError: Java heap space
.
How does Data Connection memory usage work, and how do I resolve this problem?
The Data Connection Agent's memory usage here actually depends mostly on the value of the fetchSize parameter. Per the Oracle JDBC driver docs,
fetchSize
:So, the agent's memory usage should roughly be:
Unfortunately, the default value of fetchSize varies vastly among different JDBC drivers. For instance, certain versions of the Hive JDBC driver have it set to 50, while other, newer versions have a default value of 1000. Oracle JDBC drivers have a default of 10. Postgres by default will try to get the entire ResultSet at once.
Hence, Data Connection allows you to configure the
fetchSize
value. This is configurable both per-source and per-extract.OOM errors aside, tuning
fetchSize
can improve performance significantly in general. There isn't a one-size-fits-all solution, though, and you'll have to experiment to figure out the best parameter value for your extracts. It usually lies somewhere in the 500–5000 range.