The Spark Thrift Server appears to load the full dataset into memory before transmitting it over JDBC. On the JDBC client I'm receiving this error:
SQL Error: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 48 tasks (XX GB) is bigger than spark.driver.maxResultSize (XX GB)
The query is a plain select * from table. Is it possible to enable something like a streaming mode for the Thrift Server? The main goal is to give Pentaho ETL access to a Hadoop cluster through Spark SQL over a JDBC connection, but if the Thrift Server has to load the full dataset into memory before transmission, this approach will not work.
Solution: set spark.sql.thriftServer.incrementalCollect=true. With this option enabled, the Thrift Server fetches query results one partition at a time instead of collecting the entire result set on the driver, so only a single partition has to fit in driver memory and the maxResultSize limit is no longer hit by the full result.
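As a minimal sketch of where to set it (the SPARK_HOME path is an assumption for a typical installation; adjust to your environment), the option can be passed on the command line when starting the Thrift Server:

    # start the Thrift Server with incremental (partition-at-a-time) result collection
    $SPARK_HOME/sbin/start-thriftserver.sh \
      --conf spark.sql.thriftServer.incrementalCollect=true

Alternatively, the same setting can go in conf/spark-defaults.conf so it applies every time the server starts:

    spark.sql.thriftServer.incrementalCollect  true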