Does it make sense to use an IDENTITY column in a raw layer?

23 Views Asked by SUMguy At 27 February 2024 at 15:52

Generally speaking, when I am building some ETL process, I import data into the database into what I call the "raw" layer first - meaning the data that is imported is not transformed in any way. It is simply whatever data the DB receives from the source. I do transformations using stored procedures and usually will have a stage layer that I merge the transformed data into. The transformations all happen between the raw layer and the stage layer.

When I create the raw layer, I don't usually add an IDENTITY column to the tables because the tables will be truncated anyway. I only retain data in the final, "prod" layer.

Does it ever make sense to use an IDENTITY column in this initial raw layer? I am not merging any data into the raw tables, so I don't need the IDENTITY column as a primary key...but I am wondering if there are other benefits I'm unaware of.

Original Q&A

There are 1 best solutions below

Julio Di Egidio -- inactive On 28 February 2024 at 12:17

Does it ever make sense to use an IDENTITY column in this initial raw layer?

Usually the opposite makes sense: do not generate that field (any field) automatically, since the generated value in general would not match the original value and you'd also have to re-adjust all the FKs: and there is just no point there, plus you'd still need to keep a copy of the original IDs for an easy way to match back the raw data to the original.

Incidentally, I would rather put as many constraints as possible on this raw layer (including PKs and FKs), i.e. as many as there are assumptions about the original data: which would usually drive few rounds of preliminary clean-up of the original; plus, in particular, putting FKs in place requires importing the data in the right order. But, IMO, that is worth the price as to ensuring data consistency and making the subsequent transformation steps simpler and more to the point.

Does it make sense to use an IDENTITY column in a raw layer?

There are 1 best solutions below

Related Questions in DATABASE-DESIGN

Related Questions in DATA-WAREHOUSE

Trending Questions

Popular # Hahtags

Popular Questions