Spark on Rapids single node

Asked by etiel on 10 April 2022 at 17:23

I'm trying to run TPC-DS with RAPIDS on a single node on EMR using this guide: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-rapids.html but I'm getting results that are worse than on CPU. That makes me think that either I'm not doing it right or RAPIDS doesn't work well on a single node.
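For reference, a RAPIDS-enabled run is driven mostly by a handful of Spark settings. The sketch below only assembles them as `spark-submit --conf` arguments. The keys (`spark.plugins`, `spark.rapids.sql.enabled`, `spark.rapids.sql.explain`, `spark.rapids.sql.concurrentGpuTasks`, and Spark's `spark.executor.resource.gpu.amount` / `spark.task.resource.gpu.amount`) are real settings, but the values and the helper names `rapids_conf` and `to_submit_args` are hypothetical, assuming one executor with 16 cores sharing one GPU. The EMR guide also configures YARN GPU scheduling and a GPU discovery script, which this sketch leaves out.

```python
# Hypothetical helper: collect RAPIDS Accelerator settings for a single node
# with one GPU. Values are assumptions, not tuned recommendations.
def rapids_conf(executor_cores: int = 16, gpus_per_executor: int = 1,
                concurrent_gpu_tasks: int = 2) -> dict:
    return {
        # Load the RAPIDS SQL plugin and enable GPU SQL execution.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # Log only the parts of each query that could NOT be placed on the GPU.
        "spark.rapids.sql.explain": "NOT_ON_GPU",
        # How many tasks may use the GPU at the same time.
        "spark.rapids.sql.concurrentGpuTasks": str(concurrent_gpu_tasks),
        "spark.executor.cores": str(executor_cores),
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # Each task requests a fraction of a GPU so all cores can share it.
        "spark.task.resource.gpu.amount": str(gpus_per_executor / executor_cores),
    }


def to_submit_args(conf: dict) -> list:
    """Turn a settings dict into ['--conf', 'key=value', ...] arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(rapids_conf())))
```

With `spark.rapids.sql.explain=NOT_ON_GPU`, the driver log states which operators stayed on the CPU and why, which is usually the first thing to check when a GPU run is slower than the CPU baseline.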

I also tried to measure it on Databricks using this guide: https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-databricks.html and there the job gets stuck without executing the query.

Is it possible that RAPIDS performs poorly on a single node? If so, what cluster size is recommended?

Note: The CPU instance type is “r5d.xlarge” (16 vCPU, 128 GB memory, 10 Gbps network). The GPU instance type is “g3.4xlarge” (16 vCPU, 122 GB memory, 10 Gbps network). The run took 670 sec with RAPIDS vs. 60 sec on x86. I used Spark version 3.1.0 (EMR 6.4.0).
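A quick sanity check for a GPU instance choice is to compare its GPU's CUDA compute capability with the RAPIDS minimum. The sketch below assumes a minimum of compute capability 6.0 (Pascal), the floor RAPIDS documented around 2022; current releases may require a newer architecture, so check the official requirements. The GPU-per-family table covers only a few EC2 families, and the names `GPU_BY_FAMILY` and `check_instance` are hypothetical.

```python
# Assumed minimum: compute capability 6.0 (Pascal). Verify against the
# current RAPIDS system requirements before relying on it.
MIN_COMPUTE_CAPABILITY = (6, 0)

# GPU model and CUDA compute capability for a few EC2 instance families.
# Families not listed (e.g. CPU-only r5d) are reported as having no GPU here.
GPU_BY_FAMILY = {
    "g3": ("Tesla M60", (5, 2)),   # Maxwell
    "g4dn": ("T4", (7, 5)),        # Turing
    "p3": ("V100", (7, 0)),        # Volta
    "p4d": ("A100", (8, 0)),       # Ampere
    "g5": ("A10G", (8, 6)),        # Ampere
}


def check_instance(instance_type: str) -> str:
    """Report whether an instance type's GPU meets the assumed minimum."""
    family = instance_type.split(".")[0]
    if family not in GPU_BY_FAMILY:
        return f"{instance_type}: no GPU in this table"
    gpu, cc = GPU_BY_FAMILY[family]
    # Tuples compare element-wise, so (5, 2) < (6, 0) < (7, 0).
    verdict = "supported" if cc >= MIN_COMPUTE_CAPABILITY else "below minimum"
    return f"{instance_type}: {gpu} (compute capability {cc[0]}.{cc[1]}) -> {verdict}"


if __name__ == "__main__":
    for name in ("r5d.xlarge", "g3.4xlarge", "g4dn.4xlarge"):
        print(check_instance(name))
```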


Answered by TaureanDyerNV on 02 May 2022 at 22:48

On AWS, you can use g4dn, p3, p4, and g5 instances (see https://rapids.ai/cloud#aws). The GPU in your selected instance type (the g3 family uses NVIDIA Tesla M60 GPUs) is incompatible with RAPIDS, so the work may be falling back to the CPU.
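One way to see whether a query actually ran on the GPU is to look at its physical plan (for example the output of `df.explain()`): operators placed on the GPU by the RAPIDS Accelerator are named with a `Gpu` prefix (such as `GpuHashAggregate`), while CPU operators keep their usual Spark names. The sketch below is a rough heuristic that splits plan lines into the two groups; the helper `split_plan` and the sample plan are made up for illustration and are not real output from the poster's job.

```python
import re

# Operator name that follows the tree-drawing prefix ("+- ", ":- ", "*(1) ", indentation).
_OP = re.compile(r"^[\s:+\-*()\d]*([A-Z][A-Za-z]*)")
# Wrapper nodes that say nothing about CPU vs. GPU placement.
_IGNORED = {"AdaptiveSparkPlan"}


def split_plan(plan: str):
    """Return (gpu_ops, cpu_ops) found in a physical plan string (heuristic)."""
    gpu, cpu = [], []
    for line in plan.splitlines():
        m = _OP.match(line)
        if not m or m.group(1) in _IGNORED:
            continue  # header lines like "== Physical Plan ==" or wrappers
        name = m.group(1)
        (gpu if name.startswith("Gpu") else cpu).append(name)
    return gpu, cpu


# Made-up plan fragment: the join and scans stayed on the CPU.
SAMPLE_PLAN = """== Physical Plan ==
GpuColumnarToRow false
+- GpuHashAggregate(keys=[k#0], functions=[sum(v#1)])
   +- GpuShuffleCoalesce
      +- GpuColumnarExchange gpuhashpartitioning(k#0, 200)
         +- GpuRowToColumnar
            +- SortMergeJoin [k#0], [k#2], Inner
               :- FileScan parquet [k#0,v#1]
               +- FileScan parquet [k#2]
"""

if __name__ == "__main__":
    gpu_ops, cpu_ops = split_plan(SAMPLE_PLAN)
    print(f"{len(gpu_ops)} GPU operators, {len(cpu_ops)} CPU operators: {cpu_ops}")
```

If the plugin could not use the GPU at all, a plan would contain no `Gpu` operators, and every operator would end up in the CPU list.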