TensorFlow Data Validation crashes on a 4-core machine when processing a CSV larger than 1.5 GB


I am trying to run TensorFlow Data Validation on CSV data sets larger than 2 GB. It crashes after some time, but it runs fine when the data set is around 1 GB. How can I handle large data sets without using the Cloud Dataflow service?


There is 1 best solution below


How much RAM does your PC have, and which TensorFlow Data Validation function are you using?

To use all the cores of your PC for processing, you can try the function below:

tfdv.generate_statistics_from_dataframe(dataframe, stats_options=tfdv.StatsOptions(), n_jobs=-1)

If you set the parameter n_jobs=-1, it uses all the available CPU cores, which is 4 on your PC.
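A minimal sketch of how that call might be wired up end to end, assuming the CSV is first loaded with pandas. The helper `effective_n_jobs`, the wrapper `generate_stats`, and the path `data.csv` are hypothetical names used only for illustration and are not part of TFDV. Note that `pd.read_csv` reads the whole file into RAM, so `n_jobs` spreads the work across cores but does not by itself reduce memory use.

```python
import os


def effective_n_jobs(n_jobs: int) -> int:
    """Hypothetical helper: turn n_jobs=-1 into a concrete worker count.

    Used here only to report how many processes to expect; it is not
    part of TFDV.
    """
    if n_jobs == -1:
        return os.cpu_count() or 1
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs


def generate_stats(csv_path: str, n_jobs: int = -1):
    """Hypothetical wrapper: load a CSV and compute TFDV statistics."""
    # Third-party imports are kept local so the helper above can be used
    # on a machine without pandas or TFDV installed.
    import pandas as pd
    import tensorflow_data_validation as tfdv

    # Loads the entire file into memory; this is where RAM matters.
    dataframe = pd.read_csv(csv_path)

    print(f"Requesting {effective_n_jobs(n_jobs)} worker process(es)")
    return tfdv.generate_statistics_from_dataframe(
        dataframe,
        stats_options=tfdv.StatsOptions(),
        n_jobs=n_jobs,
    )
```

It could then be called as `stats = generate_stats("data.csv")`, where `data.csv` stands in for your own file.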