Does it make sense to use an IDENTITY column in a raw layer?

23 Views Asked by At

Generally speaking, when I am building some ETL process, I import data into the database into what I call the "raw" layer first - meaning the data that is imported is not transformed in any way. It is simply whatever data the DB receives from the source. I do transformations using stored procedures and usually will have a stage layer that I merge the transformed data into. The transformations all happen between the raw layer and the stage layer.

When I create the raw layer, I don't usually add an IDENTITY column to the tables because the tables will be truncated anyway. I only retain data in the final, "prod" layer.

Does it ever make sense to use an IDENTITY column in this initial raw layer? I am not merging any data into the raw tables, so I don't need the IDENTITY column as a primary key...but I am wondering if there are other benefits I'm unaware of.

1

There are 1 best solutions below

0
Julio Di Egidio -- inactive On

Does it ever make sense to use an IDENTITY column in this initial raw layer?

Usually the opposite makes sense: do not generate that field (any field) automatically, since the generated value in general would not match the original value and you'd also have to re-adjust all the FKs: and there is just no point there, plus you'd still need to keep a copy of the original IDs for an easy way to match back the raw data to the original.

Incidentally, I would rather put as many constraints as possible on this raw layer (including PKs and FKs), i.e. as many as there are assumptions about the original data: which would usually drive few rounds of preliminary clean-up of the original; plus, in particular, putting FKs in place requires importing the data in the right order. But, IMO, that is worth the price as to ensuring data consistency and making the subsequent transformation steps simpler and more to the point.