Flag gaps and new entries in time series database - customer attrition & new customers

77 Views Asked by At

I am trying to flag new customers and attritted customers in my database. The objective is to have a pretty simple flat table where I can just pull "new customers" and "lost customers" for a given business and a given year.

I have a table that looks like this:

BUSINESS, CUSTOMER, YEAR
Business X, Customer A, 2001
Business X, Customer A, 2002
Business X, Customer A, 2003
Business X, Customer B, 2004
Business X, Customer B, 2005
Business Y, Customer A, 2004

And I would like to put two new columns in my table such that I flag if a customer is "NEW" in that year, or is "GONE" the next year for that business line. So the end result should look like this:

BUSINESS, CUSTOMER, YEAR, NEW, GONE
Business X, Customer A, 2001, NEW, NULL
Business X, Customer A, 2002, NULL, NULL
Business X, Customer A, 2003, NULL, GONE
Business X, Customer B, 2004, NEW, NULL
Business X, Customer B, 2005, NULL, GONE
Business Y, Customer A, 2004, NEW, NULL

Thanks so much in advance for your help. I am working on this in SQL but also in Google Cloud Dataprep, and am a terrible coder and very open to brute force techniques!!

1

There are 1 best solutions below

2
On

One solution is possible using Correlated Subqueries, with Exists() condition.

In the first subquery, we determine if any previous YEAR exists for a particular BUSINESS, CUSTOMER and YEAR combination. If it exists(), we set the NEW to NULL (since already another row exists before this year).

In the second subquery, we determine if any later YEAR exists for a particular BUSINESS, CUSTOMER and YEAR combination. If it exists(), we set the GONE to NULL (since already another row exists after this year).

SELECT 
 t1.BUSINESS, 
 t1.CUSTOMER, 
 t1.YEAR, 
 IF ( EXISTS(SELECT 1 
             FROM your_table AS t2 
             WHERE t2.BUSINESS = t1.BUSINESS AND 
                   t2.CUSTOMER = t1.CUSTOMER AND 
                   t2.YEAR < t1.YEAR 
             LIMIT 1), NULL, 'NEW' ) AS NEW, 
 IF ( EXISTS(SELECT 1 
             FROM your_table AS t3 
             WHERE t3.BUSINESS = t1.BUSINESS AND 
                   t3.CUSTOMER = t1.CUSTOMER AND 
                   t3.YEAR > t1.YEAR 
             LIMIT 1), NULL, 'GONE' ) AS GONE
FROM your_table AS t1