Preprocessing large data in Databricks Community Edition

Asked by Shihab Masri on 19 April 2022 at 18:22 (446 views)

I have a 16 GB dataset that I want to use in Databricks. However, in Community Edition the DBFS limit is 10 GB. Could you please help me preprocess the data so that I can move it from the driver to DBFS?

There is 1 best solution below

Alex Ott on 20 April 2022 at 11:56

The simplest approach is not to use DBFS at all (it is designed only for temporary data), but to host the data and results in your own environment, such as an AWS S3 bucket or ADLS (although this could incur higher transfer costs).

If you can't do that, then the solution depends on other factors, such as the input file format and whether the data is compressed or uncompressed.
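
As one illustration of the uncompressed case, here is a minimal sketch that assumes the input is a plain-text CSV with a single header row and no line breaks inside quoted fields. It streams the file from the driver's local disk and rewrites it as several gzip-compressed parts, each repeating the header. The function name `gzip_in_parts` and the `lines_per_part` parameter are hypothetical, and whether the compressed output fits under the 10 GB limit depends entirely on how well the data compresses.

```python
import gzip
from pathlib import Path


def gzip_in_parts(src, out_dir, lines_per_part=1_000_000):
    """Split a header-first text file into gzip-compressed parts.

    Hypothetical helper: assumes one record per line and a single header row.
    Returns the list of part files written, in order.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    parts = []
    out = None
    count = 0
    with src.open("rb") as fin:
        header = fin.readline()
        # Stream line by line so the whole file never has to fit in memory.
        for line in fin:
            if out is None:
                part = out_dir / f"part-{len(parts):05d}.csv.gz"
                out = gzip.open(part, "wb")
                out.write(header)  # repeat the header in every part
                parts.append(part)
            out.write(line)
            count += 1
            if count >= lines_per_part:
                out.close()
                out = None
                count = 0
    if out is not None:
        out.close()
    return parts
```

Writing several parts rather than one large archive is a deliberate choice: a gzip file cannot be split across Spark tasks, so many moderately sized parts keep the later read parallel. On Databricks the parts could then be copied from a `file:/` path on the driver to a `dbfs:/` path with `dbutils.fs.cp`, and Spark's CSV reader decompresses `.csv.gz` files transparently.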
