How to handle CSV files in the Bronze layer without the extra layer

If my raw data is in CSV format and I would like to store it in the Bronze layer as Delta tables, I would end up with four layers: Raw + Bronze + Silver + Gold. Which approach should I consider?
317 Views · Asked by Su1tan
A bit of an open question. With respect to retaining the raw data in CSV, I would normally recommend keeping it: storage of these data is usually cheap relative to the utility of being able to re-process them if there are problems, or for data audit/traceability purposes.

I would normally compress the raw files after processing, perhaps tar-balling them, and move them to colder/cheaper storage.
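The compress-and-archive step above can be sketched with plain Python. This is a minimal local illustration, not a Databricks-specific recipe: the directory names and the `batch_name` parameter are hypothetical, and in practice the archive destination would be a cold/archive storage tier (e.g. driven by a lifecycle policy on the storage account) rather than a local folder.

```python
import tarfile
from pathlib import Path

def archive_raw_csvs(raw_dir: Path, archive_dir: Path, batch_name: str) -> Path:
    """Bundle all CSVs in raw_dir into one gzipped tarball in archive_dir,
    then delete the originals. Returns the path to the archive."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive_path = archive_dir / f"{batch_name}.tar.gz"
    csv_files = sorted(raw_dir.glob("*.csv"))
    with tarfile.open(archive_path, "w:gz") as tar:
        for csv_file in csv_files:
            # arcname keeps only the file name, not the full local path
            tar.add(csv_file, arcname=csv_file.name)
    for csv_file in csv_files:
        # originals are now safely inside the tarball
        csv_file.unlink()
    return archive_path
```

Run after the Bronze load succeeds, so a failed load still leaves the uncompressed CSVs available for retry.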