Creating a new column/field OVER id in denodo

638 Views Asked by At

I was wondering if I could get some help in terms of calculating a difference between the date of various statuses. I have a view with column named “id”, “create_dt” and “status”. I would have various statuses like submit, approve, deliver etc. Now, I want to find the time it took for a specific id between Approve and Submit status. What I am thinking currently is, creating few additional fields based on the status (I can use case statement for that) and finding the difference in time between the newly created date columns for various statuses. The problem here is, I am not sure how to drive the calculation based on the specific id. I can’t do lag or lead because some “id” might go through different statuses and some might not (it’s not consistent). I can’t create a the new date column based on id (something like partition by) because I am using case statement. Could someone point me to the right direction?

Below is the screenshot of how my data currently looks like (using the case statement) and what my desired output is. Current Result

Expected Result

From the expected result, I could easily find the difference between submitted and approved status for any ID using case statement whereas for the current result, I am not able to.

Thank you,

1

There are 1 best solutions below

3
On BEST ANSWER

I would try pivoting the data. Here is a link to a Denodo community site that shows how to do this:

https://community.denodo.com/kb/view/document/How%20to%20Pivot%20and%20Unpivot%20views?category=Combining+Data

For your specific case, I created a small excel data source to simulate your issue in a view I named "p_sample" (using simplified dates and status names):

id | status | create_dt
1 | submit   | 1/1/2017
1 | approve | 2/1/2017
1 | deliver   | 2/2/2017
2 | submit   | 1/1/2017
2 | approve | 1/10/2017
2 | deliver   | 2/1/2017
3 | submit   | 1/1/2017
....

Since Denodo doesn't appear to support the PIVOT operator, instead we can use the following VQL to pivot your status dates so they are all on the same row:

select id
  , max(case when status = 'submit' then create_dt end) as submit_dt
  , max(case when status = 'approve' then create_dt end) as approve_dt
  , max(case when status = 'deliver' then create_dt end) as deliver_dt
  , max(case when status = 'reject' then create_dt end) as reject_dt
  , max(case when status = 'other' then create_dt end) as other_dt
from p_sample
group by id

Then we can use that query as an inline view to perform the date math (or in Denodo you could make this 2 views - one with the above VQL and then a selection view on top of that which applies the date math):

select *, approve_dt - submit_dt as time_to_aprove
from (
select id
  , max(case when status = 'submit' then create_dt end) as submit_dt
  , max(case when status = 'approve' then create_dt end) as approve_dt
  , max(case when status = 'deliver' then create_dt end) as deliver_dt
  , max(case when status = 'reject' then create_dt end) as reject_dt
  , max(case when status = 'other' then create_dt end) as other_dt
from p_sample
group by id
) AS Pivot

When you run this, you get each status date for the ID, as well as the time between submission and approval.

Query Results

The only drawback is if the status code list is very large or not well controlled, then this solution will not be flexible enough, but your example seems to indicate this won't be an issue.