I have around 34 million rows, each with 23 columns in the store_sales
table in the tpc-ds dataset .
I have a composite primary key
of columns ss_item_sk
and ss_ticket_number
.
Upon running the query SELECT count(DISTINCT <primary key>) ..
I'm able to see that it outputs the total number of rows present in the table.
Now I add another column along with the primary key
, namely ss_sold_date_sk
.
After this, if I run the count
query, I get fewer number of rows being printed than before. Can someone explain to me, via example, why this could happen?
TL;DR
When does adding a column to a composite primary key ever stop making it unique?
Demo
As Expected - 3 distinct values - 'A','B' and 'C'
As expected. Why? - see next query
Concatenation with NULL yields NULL
BUG!!
Here is a work-around the bug: