I am trying to flag new customers and attritted customers in my database. The objective is to have a pretty simple flat table where I can just pull "new customers" and "lost customers" for a given business and a given year.
I have a table that looks like this:
BUSINESS, CUSTOMER, YEAR
Business X, Customer A, 2001
Business X, Customer A, 2002
Business X, Customer A, 2003
Business X, Customer B, 2004
Business X, Customer B, 2005
Business Y, Customer A, 2004
And I would like to put two new columns in my table such that I flag if a customer is "NEW" in that year, or is "GONE" the next year for that business line. So the end result should look like this:
BUSINESS, CUSTOMER, YEAR, NEW, GONE
Business X, Customer A, 2001, NEW, NULL
Business X, Customer A, 2002, NULL, NULL
Business X, Customer A, 2003, NULL, GONE
Business X, Customer B, 2004, NEW, NULL
Business X, Customer B, 2005, NULL, GONE
Business Y, Customer A, 2004, NEW, NULL
Thanks so much in advance for your help. I am working on this in SQL but also in Google Cloud Dataprep, and am a terrible coder and very open to brute force techniques!!
One solution is possible using Correlated Subqueries, with
Exists()
condition.In the first subquery, we determine if any previous
YEAR
exists for a particularBUSINESS
,CUSTOMER
andYEAR
combination. If itexists()
, we set theNEW
toNULL
(since already another row exists before this year).In the second subquery, we determine if any later
YEAR
exists for a particularBUSINESS
,CUSTOMER
andYEAR
combination. If itexists()
, we set theGONE
toNULL
(since already another row exists after this year).