How to count a feature x if y condition is true (bigquery)

17.8k Views Asked by At

I am new in machine learning and I am running one of my first machine learning problems. I do a lot of research by myself but could not find the solution for this question.

I am creating new variables from my dataset in Bigquery. One of the variables consists in count a variable (x) if a condition in variable (y) is met. I cannot use WHERE or HAVING since the condition is only related to the calculation of this specific variable.

The code is more or less like this:

COUNT(DISTINCT sessionid IF(date > “2018-06-01” && date < “2018-06-30”)) 
AS sessions_lastmonth

I know that this code is not correct and could not be but is just a way to express more or less what I need. The goal is to know the number of sessions in a certain period. I cannot use the count for date because in one day you can have more than one session and not every row is a different session (a session can have several lines because the user can go through many pages in the same session).

Thank you!

2

There are 2 best solutions below

3
Mikhail Berlyant On BEST ANSWER

Below for BigQuery Standard SQL

COUNT(DISTINCT IF(date >= '2018-06-01' AND date <= '2018-06-30', sessionid, NULL)) AS sessions_lastmonth
1
Richard Hansell On

You said you couldn't use WHERE so I am going to assume that the answer that proposes you use WHERE won't fly?

I think you might need to use CASE instead, so your query might look something like:

COUNT(DISTINCT CASE WHEN date > “2018-06-01” AND date < “2018-06-30” THEN sessionid ELSE NULL END) AS sessions_lastmonth

That syntax might not be right, but it might help you onto the right track?