How to speed up Spark read of Veeva CRM

40 Views Asked by DIggi At 28 July 2025 at 15:12

I am reading data from Veeva CRM using Spark in Databricks, via spark.read.format("springml...."). I am not entirely sure about this: does the read happen on a single thread, as is the case with a plain JDBC read, or does it work differently? Is there any way to speed up the read?

I tried setting numPartitions with a partition key, but I don't know whether Veeva CRM indexes any column. This didn't speed up the read.

There is 1 best solution below

Matt Andruff On 05 April 2023 at 17:33

Speeding things up always involves a tradeoff. It is likely safer to read on a single thread, so that your Veeva CRM doesn't get hammered with connections and data requests. That said, you could use the same trick that is used to speed up JDBC-style reads: divide the data you need into slices, spread those slices across partitions, and make manual JDBC calls from inside the function you pass to mapPartitions to pull each slice. (You can't use the Spark context inside mapPartitions, so the calls have to go through a plain client.)

Be careful which partitioning strategy you choose, as you could effectively DDoS your Veeva CRM. Experiment with this, but err on the side of caution if it's an operational system.
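To make the mapPartitions idea concrete, here is a minimal PySpark-style sketch, not a tested Veeva integration. It assumes the data can be sliced on some integer key (for example a day number derived from a modified-date field), which may not hold for your objects. The names split_range, make_partition_reader, parallel_read and fetch_rows are all hypothetical: fetch_rows(start, end) stands in for whatever client call you write yourself, and it must open its own connection on the executor.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end)


def split_range(lower: int, upper: int, num_slices: int) -> List[Range]:
    """Split [lower, upper) into at most num_slices contiguous ranges."""
    if num_slices < 1:
        raise ValueError("num_slices must be >= 1")
    total = upper - lower
    if total <= 0:
        return []
    num_slices = min(num_slices, total)
    step, extra = divmod(total, num_slices)
    ranges = []
    start = lower
    for i in range(num_slices):
        # The first `extra` slices take one additional key each.
        end = start + step + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def make_partition_reader(fetch_rows: Callable[[int, int], Iterable[tuple]]):
    """Build a function that can be passed to RDD.mapPartitions.

    fetch_rows is a placeholder for your own client call; it runs on the
    executor and must not touch the Spark context.
    """
    def read_partition(ranges: Iterator[Range]) -> Iterator[tuple]:
        for start, end in ranges:
            yield from fetch_rows(start, end)
    return read_partition


def parallel_read(spark, lower, upper, num_slices, fetch_rows):
    """Issue one fetch per slice from the executors.

    num_slices caps how many requests can hit the source system at once,
    so keep it small for an operational system.
    """
    ranges = split_range(lower, upper, num_slices)
    # Ask for one Spark partition per range.
    rdd = spark.sparkContext.parallelize(ranges, max(1, len(ranges)))
    return rdd.mapPartitions(make_partition_reader(fetch_rows))
```

On Databricks you would call something like parallel_read(spark, 0, 1000, 4, my_fetch), where my_fetch is your own function, and convert the resulting RDD to a DataFrame once you know the row schema. Starting with a small num_slices (2 to 4) and increasing it only while the source system stays healthy is in line with the caution above.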
