Is there a way to monitor the CPU utilization of Apache Spark with pure Spark?
It seems that Ganglia can do that externally.
I was wondering if anything inside Spark (e.g., the information that Spark reports to the UI, or the metrics info) can give you the core utilization, like what Linux top does. Not how many cores each executor is using at a given time (coresUsed), but how fully utilized those cores are.
You are on the right track in considering Ganglia or other external monitoring tools/frameworks.
The Spark scheduler keeps track of task/job progress, but not resource utilization. The Spark executors run the tasks and report success/failure, but they do not self-monitor resource utilization either.
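If you want top-style per-core utilization without an external agent, one workaround is to sample it yourself from inside the job, e.g. in a `mapPartitions` task running on each executor. Below is a minimal sketch (not part of any Spark API) that reads per-core utilization from `/proc/stat` the way `top` does; it assumes the executors run on Linux:

```python
import time

def cpu_times():
    """Parse /proc/stat, returning {core_name: (busy, total)} in jiffies."""
    usage = {}
    with open("/proc/stat") as f:
        for line in f:
            # Per-core lines look like "cpu0 ...", "cpu1 ..."; skip the
            # aggregate "cpu " line and non-cpu counters.
            if line.startswith("cpu") and line[3].isdigit():
                fields = line.split()
                name = fields[0]
                vals = [int(v) for v in fields[1:]]
                idle = vals[3] + vals[4]  # idle + iowait
                total = sum(vals)
                usage[name] = (total - idle, total)
    return usage

def core_utilization(interval=0.5):
    """Sample twice over `interval` seconds and return each core's
    utilization as a fraction between 0.0 and 1.0."""
    before = cpu_times()
    time.sleep(interval)
    after = cpu_times()
    util = {}
    for name in after:
        busy = after[name][0] - before[name][0]
        total = after[name][1] - before[name][1]
        util[name] = busy / total if total else 0.0
    return util
```

You could call `core_utilization()` at the start of each partition's processing and ship the result back with the task output, but note this measures the whole host's cores, not just what Spark itself consumed, which is exactly why a dedicated monitor like Ganglia is the cleaner answer.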