Ibis vs. Spark for big data processing against an analytics data warehouse with a DataFrame API?

Asked by Vito on 28 March 2024 at 21:34

Imagine the following scenario:

- I have very large datasets hosted in an analytics data warehouse.
  - The warehouse is very efficient at handling large analytic workloads and can scale arbitrarily.
- I need to process the data in a CPU-intensive way that requires loading much of it into memory at once.
- I would like to use a DataFrame API (pandas-like or Spark-like).

What should I consider when choosing between Ibis and Spark for such a task?

It seems the core difference is that with Ibis the computation runs inside the data warehouse, whereas with Spark it runs on an external cluster.

Spark seems to be the more popular choice. However, Ibis sounds like it would be cheaper and more convenient: I can use compute I am already paying for (the data warehouse itself) and avoid having to manage a Spark cluster. If that is true, I don't see why Ibis isn't a more popular choice than Spark.
