I am running an application on PySpark. Below is a snapshot of the distribution of executors for this application. It looks non-uniformly distributed. Can someone have a look and tell me where the problem is?
Description and my problem:
I am running my application on a huge amount of data, in which I am filtering and joining 3 datasets. After that, I am caching the joined dataset to generate and aggregate features for different time periods (meaning the cached dataset is used to generate features in a loop). After this, I am trying to store these features in a Parquet file, and writing this Parquet file is taking too much time.
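Roughly, my pipeline looks like the sketch below (dataset paths, column names, and time periods are simplified placeholders, not my real code):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-generation").getOrCreate()

# Load and filter the 3 datasets (paths and filters are placeholders)
df1 = spark.read.parquet("/data/dataset1").filter(F.col("active") == 1)
df2 = spark.read.parquet("/data/dataset2")
df3 = spark.read.parquet("/data/dataset3")

# Join and cache, since the joined dataset is reused in the loop below
joined = df1.join(df2, "id").join(df3, "id").cache()

# Generate and aggregate features for each time period
for name, days in {"7d": 7, "30d": 30, "90d": 90}.items():
    features = (joined
                .filter(F.col("event_date") >= F.date_sub(F.current_date(), days))
                .groupBy("id")
                .agg(F.count("*").alias(f"cnt_{name}")))
    # This Parquet write is the step that takes too long
    features.write.mode("overwrite").parquet(f"/output/features_{name}")
```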
Can anyone help me solve this? Let me know if you need further information.

As you stated (emphasis mine):

"I am filtering and *joining* 3 datasets. After that, I am caching the joined dataset to generate and *aggregate* features for different time periods."
Both joins and, to a lesser extent, aggregations can result in a skewed distribution of data if the join key or grouping columns are not uniformly distributed; this is a natural consequence of the required shuffles.
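If you want to confirm that skew is the cause, inspecting the key distribution before the join is a cheap first check. Something along these lines (the path and the `id` column are placeholders for one side of your join):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df1 = spark.read.parquet("/data/dataset1")  # placeholder for one input of the join

# Count rows per join key; a few keys with counts orders of magnitude
# above the rest means the shuffle for the join will be skewed.
(df1
 .groupBy("id")              # the join/grouping column
 .count()
 .orderBy(F.desc("count"))
 .show(20))
```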
In the general case there is very little you can do about it. In specific cases you can gain a little with broadcasting or salting, but it doesn't look like the problem is particularly severe in your case.
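For completeness, here is a minimal sketch of both techniques; the paths, the `id` column, and the salt factor `N` are assumptions for illustration:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
large = spark.read.parquet("/data/large")  # placeholder paths
small = spark.read.parquet("/data/small")

# 1. Broadcasting: only viable when one side is small enough to fit in
#    memory on every executor; it avoids shuffling the large side entirely.
broadcast_join = large.join(F.broadcast(small), "id")

# 2. Salting: spread a hot key over N artificial sub-keys so that no
#    single task receives all of its rows.
N = 10  # assumed salt factor; tune to your skew
salted_large = large.withColumn("salt", (F.rand() * N).cast("int"))
salted_small = small.withColumn(
    "salt", F.explode(F.array([F.lit(i) for i in range(N)])))

salted_join = (salted_large
               .join(salted_small, ["id", "salt"])
               .drop("salt"))
```

Note that salting replicates the smaller side N times, so it trades extra data volume for a more even distribution of work across tasks.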