Downloading a Greenplum dump to a local cluster in parallel

191 Views Asked by VB_ At 26 June 2025 at 23:15

Is there a more efficient way to fetch a full dump of a Greenplum database than through multiple JDBC connections to the master node?

I need to download a full dump of a Greenplum database through JDBC. To speed the job up, I plan to use Spark parallelism, fetching data in parallel through multiple JDBC connections. As I understand it, all of those connections will go to Greenplum's single master node. I will store the data in HDFS in Parquet format.
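A minimal sketch of the plan described above, assuming the table has an integer key column with known bounds. The helper splits the key range into one WHERE clause per Spark partition; PySpark's `DataFrameReader.jdbc` accepts such a list through its `predicates` argument and opens one connection per clause. The host, database, table, column, and path names are hypothetical, and every connection still terminates at the master node.

```python
def range_predicates(column, lower, upper, num_partitions):
    """Split the half-open key range [lower, upper) into WHERE clauses.

    Each clause becomes one Spark partition, i.e. one JDBC connection.
    Rows whose key is NULL or outside the range are not selected.
    """
    if num_partitions < 1 or upper <= lower:
        raise ValueError("need num_partitions >= 1 and upper > lower")
    # Never create more partitions than there are key values.
    num_partitions = min(num_partitions, upper - lower)
    step, extra = divmod(upper - lower, num_partitions)
    predicates, start = [], lower
    for i in range(num_partitions):
        # Spread the remainder over the first `extra` partitions.
        end = start + step + (1 if i < extra else 0)
        predicates.append(f"{column} >= {start} AND {column} < {end}")
        start = end
    return predicates


def dump_table(spark, url, table, predicates, properties, out_path):
    """Read one table over parallel JDBC connections and write it as Parquet."""
    df = spark.read.jdbc(url=url, table=table,
                         predicates=predicates, properties=properties)
    df.write.parquet(out_path)


# Hypothetical usage with an existing SparkSession named `spark`:
# preds = range_predicates("id", 0, 1_000_000, 16)
# dump_table(spark, "jdbc:postgresql://gp-master:5432/mydb", "public.events",
#            preds, {"user": "gpadmin", "password": "...",
#                    "driver": "org.postgresql.Driver"},
#            "hdfs:///dumps/events")
```

Greenplum speaks the PostgreSQL wire protocol, which is why the sketch assumes the PostgreSQL JDBC driver.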


There are 2 best solutions below

Sung Yu-wei On 21 December 2016 at 15:40

For parallel export, you can try a gphdfs writable external table. Greenplum Database (GPDB) segments can read from and write to external sources in parallel.

http://gpdb.docs.pivotal.io/4340/admin_guide/load/topics/g-gphdfs.html
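As a rough illustration, the export could be driven by two SQL statements: one that creates the writable external table and one that inserts into it, so that each segment writes its own share of the rows to HDFS. The helper below only builds those statements as strings; the table names, namenode address, and path are hypothetical, and the DDL shape is an assumption to check against the linked documentation for your Greenplum version.

```python
def gphdfs_export_sql(source_table, ext_table, hdfs_location):
    """Return (DDL, INSERT) statements for a gphdfs writable external table.

    Identifiers are interpolated as-is, so pass only trusted names.
    """
    ddl = (
        f"CREATE WRITABLE EXTERNAL TABLE {ext_table} (LIKE {source_table})\n"
        f"LOCATION ('gphdfs://{hdfs_location}')\n"
        "FORMAT 'TEXT' (DELIMITER ',');"
    )
    # Running this INSERT is what triggers the parallel write by the segments.
    insert = f"INSERT INTO {ext_table} SELECT * FROM {source_table};"
    return ddl, insert


# Hypothetical names, for illustration only.
ddl, insert = gphdfs_export_sql(
    "public.events", "public.events_export", "namenode:8020/dumps/events"
)
print(ddl)
print(insert)
```

This sketch writes delimited text, so producing Parquet would be a separate step.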

Kong Yew Chan On 13 October 2017 at 20:27

You can now use the Greenplum-Spark connector to parallelize data transfer between Greenplum segments and Spark executors.

The Greenplum-Spark connector speeds up data transfer because it leverages parallel processing across Greenplum segments and Spark workers. It is faster than the JDBC connector, which transfers all data through the Greenplum master node.

Reference: http://greenplum-spark.docs.pivotal.io/100/index.html
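A hedged sketch of what loading a table through the connector might look like from PySpark. The data source class name and the option keys (`url`, `dbtable`, `user`, `password`, `partitionColumn`) are assumptions based on the 1.x connector documentation and should be verified against the reference above; the host, database, table, and column names are hypothetical.

```python
# Assumed data source name for the 1.x connector; verify against the docs.
GREENPLUM_SOURCE = "io.pivotal.greenplum.spark.GreenplumRelationProvider"


def connector_options(host, port, database, table, user, password,
                      partition_column):
    """Build the option map passed to the connector (assumed key names)."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Column the connector uses to split the table across Spark partitions.
        "partitionColumn": partition_column,
    }


def load_table(spark, options):
    """Load a Greenplum table as a DataFrame through the connector."""
    return spark.read.format(GREENPLUM_SOURCE).options(**options).load()


# Hypothetical usage with an existing SparkSession named `spark`:
# opts = connector_options("gp-master", 5432, "mydb", "events",
#                          "gpadmin", "secret", "id")
# load_table(spark, opts).write.parquet("hdfs:///dumps/events")
```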
