How to find memory usage for a koalas dataframe


I am trying to do some memory profiling on an Azure Databricks job. The job runs a Python script that relies heavily on koalas dataframes for analysis. I want to find out which dataframes or objects are taking up the most memory, but koalas and Databricks make this very difficult to do at the code level.

I have tried checking the Spark UI for my job, but it does not display memory information at the object level. I also tried memory_usage(), as in the following example; it works in pandas, but in koalas it fails. I tried the koalas .info() method as well, but it does not provide the information I am looking for.

import databricks.koalas as ks

d = {'col1': [1, 2], 'col2': [3, 4]}
df = ks.DataFrame(data=d, columns=['col1', 'col2'])

print(df.memory_usage())  # fails: memory_usage() is not implemented in koalas
df.info()  # docs say this includes memory information, but no size is shown
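For context, here is a rough sketch of the two workarounds I have been considering (not yet validated on the real job). to_pandas() and to_spark() are part of the koalas API; the second approach reads the cached size from the Spark UI rather than from code:

import databricks.koalas as ks

kdf = ks.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Workaround 1: collect to the driver and use pandas' memory_usage().
# Only feasible when the dataframe fits in driver memory.
pdf = kdf.to_pandas()
print(pdf.memory_usage(deep=True))

# Workaround 2: cache the underlying Spark dataframe, trigger an action,
# then read "Size in Memory" from the Storage tab of the Spark UI.
sdf = kdf.to_spark()
sdf.persist()
sdf.count()  # forces materialization so the cached size shows up in the UI

Neither of these gives me per-object memory from code for large dataframes, which is what I am really after.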

Is there any way to see how much memory each koalas dataframe is taking up, either through a memory function or a profiling tool? I would also settle for a Databricks job profiling tool, as long as it can indicate where in my code most of the memory is being used.
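One further avenue I found but have not verified: Spark's Catalyst optimizer attaches an estimated size to every query plan, reachable through the internal _jdf handle of the underlying Spark dataframe. This is an internal API, and the figure is a plan-statistics estimate rather than actual memory used, so it may be rough:

sdf = df.to_spark()
# Internal API: Catalyst's estimated plan size in bytes (an estimate, not
# the actual cached footprint); may change between Spark versions.
stats = sdf._jdf.queryExecution().optimizedPlan().stats()
print(int(stats.sizeInBytes().toString()))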
