How surrogate keys are handles in hive

1.9k Views Asked by smellerbee At 16 August 2025 at 22:10

I know that hive cannot create surrogate keys or is rather difficult. I want to understand how companies have implemented dimensional modeling in their warehouse.

One way I can think of is leaving the dimension details as is in fact. Then move the distinct of dimension to a different table. But then how are scd1 and scd2 handled. I have checked talks by Kimball on cloudera and I still don't understand how this works.

Original Q&A

There are 2 best solutions below

Ron Dunn On 12 August 2018 at 04:40

There are two ways of handling this problem in Hive.

The first does not directly answer your question, and that is to use natural keys instead of surrogates. While surrogates are more convenient and performant, since you're using Hive I'm guessing that performance isn't one of your major criteria, so the cost of using natural keys will just be in the extra lines of code you have to write to cater for compound keys.

The second way is to use Hive's windowing functions to calculate the surrogate. I don't have a Hive environment handy to test this query, but the surrogate would look something like:

(select max(surrogate_key_column) from dimension_table)
+ row_number() over (order by 1)

QuangPH On 06 July 2021 at 07:55

As far as I know, In version 3.0, Hive supports the surrogate keys on ACID tables

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/using-hiveql/content/hive_surrogate_keys.html

Summarised from the link:

The SURROGATE_KEY UDF generates a unique Id for every row that you insert into a table.

Example usage:

-Create a table

CREATE TABLE students_v2 
(`ID` BIGINT DEFAULT SURROGATE_KEY(),
 row_id INT,
 name VARCHAR(64), 
 dorm INT, 
 PRIMARY KEY (ID) DISABLE NOVALIDATE);

-Insert data, which automatically generates surrogate keys for the primary keys.

INSERT INTO students_v2 (row_id, name, dorm) SELECT * FROM students;

-Take a look at the surrogate keys.

SELECT * FROM students_v2;

How surrogate keys are handles in hive

There are 2 best solutions below

Related Questions in HIVE

Related Questions in DATA-WAREHOUSE

Related Questions in DIMENSIONAL-MODELING

Related Questions in SURROGATE-KEY

Trending Questions

Popular # Hahtags

Popular Questions