Let's say I have visitor logs for my website. They contain many fields, such as country, state, city, device, time, referral source, etc.
I have a stats table that groups by some of these dimensions, and sums the impressions, by hour. My site is extremely popular, so this table is large (billions of rows / day).
I have a new request: a data scientist now wants me to group by ANOTHER dimension column. Let's say we only had country-level granularity before, and now she asks to add STATE as a group-by column.
How can I predict the cardinality increase of my stats table?
It's tempting to say it would be (cardinality of the new dimension column we're grouping by) * (# of existing rows). But this might falter because visitors aren't spread evenly across the new dimension's values: over a tenth of the US population lives in California, for example.
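
To make the gap concrete, here is a minimal sketch on a made-up sample. The field names, the `row_counts` helper, and the data are all hypothetical, not my real schema. It compares the naive product against the number of distinct dimension combinations that actually occur, which is what the stats table would really store.

```python
def row_counts(logs, existing_dims, new_dim):
    """Return (rows_before, rows_after, naive_estimate) for a GROUP BY.

    rows_before: distinct combinations of the existing dimensions
    rows_after:  distinct combinations once new_dim is added
    naive:       rows_before * cardinality of new_dim
    """
    dims_after = list(existing_dims) + [new_dim]
    before = {tuple(r[d] for d in existing_dims) for r in logs}
    after = {tuple(r[d] for d in dims_after) for r in logs}
    new_cardinality = len({r[new_dim] for r in logs})
    return len(before), len(after), len(before) * new_cardinality


# Hypothetical sample: most traffic comes from one state.
logs = [
    {"hour": 0, "country": "US", "device": "mobile", "state": "CA"},
    {"hour": 0, "country": "US", "device": "mobile", "state": "CA"},
    {"hour": 0, "country": "US", "device": "mobile", "state": "NY"},
    {"hour": 0, "country": "US", "device": "desktop", "state": "CA"},
    {"hour": 0, "country": "CA", "device": "mobile", "state": "ON"},
    {"hour": 1, "country": "US", "device": "mobile", "state": "CA"},
]

before, after, naive = row_counts(logs, ["hour", "country", "device"], "state")
print(before, after, naive)  # 4 5 12
```

On this sample the table grows from 4 rows to 5, not to the naive 12, because a state only ever appears under one country and many (hour, device, state) combinations never occur. Running the same distinct-count comparison on a sample of the real logs would presumably give a growth ratio to extrapolate from, though a small sample can miss rare combinations.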