How can I predict the cardinality increase of an aggregation table when grouping by an additional column?


Let's say I have visitor logs for my website. They contain many fields, such as country, state, city, device, time, referral source, etc.

I have a stats table that groups by some of these dimensions, and sums the impressions, by hour. My site is extremely popular, so this table is large (billions of rows / day).

I have a new request: a data scientist now wants me to group by ANOTHER dimension column here. Let's say we only grouped at the granularity of country before, and now she requests that we add a column to group by, which is STATE.

How can I predict the cardinality increase of my stats table?

It's tempting to say it would be (cardinality of the new dimension column we're grouping by) * (number of existing rows). But this might falter because the values aren't evenly spread out: over a tenth of the US population lives in California, for example.
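One way to check the estimate empirically, rather than from a formula, is to count distinct group keys with and without the new column over a slice of the raw logs. The sketch below is only an illustration: `growth_factor`, the field names, and the tiny `sample` list are made up for this example, and it assumes the slice fits in memory. Note that distinct counts taken from a small random sample tend to understate the real growth, so a full representative hour or day is a safer input.

```python
def growth_factor(rows, existing_dims, new_dim):
    """Compare the number of distinct group keys before and after adding new_dim.

    rows: iterable of dicts (one per raw log record); hypothetical input format.
    Returns (groups_before, groups_after, ratio).
    """
    before = set()
    after = set()
    for row in rows:
        key = tuple(row[d] for d in existing_dims)
        before.add(key)                    # current stats-table row key
        after.add(key + (row[new_dim],))   # row key once new_dim is added
    return len(before), len(after), len(after) / len(before)


# Made-up records standing in for one slice of the visitor logs.
sample = [
    {"hour": 0, "country": "US", "state": "CA"},
    {"hour": 0, "country": "US", "state": "CA"},
    {"hour": 0, "country": "US", "state": "NY"},
    {"hour": 0, "country": "FR", "state": "IDF"},
    {"hour": 1, "country": "US", "state": "CA"},
]

before, after, ratio = growth_factor(sample, ["hour", "country"], "state")

# Naive upper bound: distinct values of the new column times existing groups.
naive = len({r["state"] for r in sample}) * before

print(before, after, round(ratio, 2), naive)
```

On this toy data the table grows from 3 groups to 4, while the naive product predicts 9, because most (country, state) combinations never occur together and some groups contain only one state in a given hour.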
