I'm not sure about the concept of memory footprint. When loading a Parquet file of, e.g., 1 GB and creating RDDs out of it in Spark, what would be the memory footprint of each RDD?
RDD memory footprint in Spark
1.5k Views Asked by spark_dream At
When you create an RDD out of a parquet file, nothing will be loaded/executed until you run an action (e.g., first, collect) on the RDD.
Your memory footprint will most likely vary over time. Say you have 100 equally sized partitions of 10 MB each, and you are running on a cluster with 20 cores. With one task per core, at most 20 partitions are being processed at any point in time, so you only need to have
10 MB x 20 = 200 MB
of data in memory. On top of this, Java objects tend to take more space than the same data on disk, so it's not easy to say exactly how much space your 1 GB file will take in the JVM heap (assuming you load the entire file). It could be 2x, or it could be more.
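The back-of-the-envelope estimate above can be written as a small helper. This is only a sketch: the function name and the `jvm_overhead` factor are hypothetical, and the 2x value is just the rough guess mentioned above, not a measured number.

```python
def estimate_working_set_mb(partition_mb, concurrent_tasks, jvm_overhead=1.0):
    """Rough amount of data (in MB) being processed at the same time.

    partition_mb     -- size of one partition, assuming equal-sized partitions
    concurrent_tasks -- tasks running in parallel (assumed: one per core)
    jvm_overhead     -- assumed inflation factor for objects on the JVM heap
    """
    return partition_mb * concurrent_tasks * jvm_overhead


# 100 partitions of 10 MB each, 20 cores: only 20 partitions are in flight.
raw_mb = estimate_working_set_mb(10, 20)            # 200 MB
inflated_mb = estimate_working_set_mb(10, 20, 2.0)  # 400 MB if objects take 2x
print(raw_mb, inflated_mb)
```

The real figure depends on the data types and on how the data is deserialized, so treat the result as an order-of-magnitude guide only.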
One trick you can use to test this is to force your RDD to be cached: mark it as cached, then run an action (e.g., count) so that it is actually materialized. You can then check in the Spark UI under Storage and see how much space that RDD took to cache.
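A minimal PySpark sketch of that caching trick follows. The helper names, the application name, and the `parquet_path` argument are all hypothetical, and `main` is defined but not called here; it needs a working PySpark installation and a real Parquet path.

```python
def cache_and_count(rdd, name="parquet_rdd"):
    """Mark the RDD as cached, then run an action so it is materialized."""
    rdd.setName(name)   # label that should appear in the Storage tab
    rdd.cache()         # lazy: only marks the RDD for caching
    return rdd.count()  # action: triggers the load and fills the cache


def main(parquet_path):
    # Imported lazily so cache_and_count can be used without PySpark present.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-footprint-check").getOrCreate()
    rdd = spark.read.parquet(parquet_path).rdd
    rows = cache_and_count(rdd)
    print(f"Cached {rows} rows; check the Storage tab of the Spark UI.")
    # Keep the application (and its UI) alive until you have looked.
    input("Press Enter to stop the application...")
    spark.stop()
```

Call `main` with the path to your own file. Note that an RDD created from Python is stored in pickled form, so the size reported under Storage may differ from what the same data would occupy as a Scala or Java RDD.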