Complex SQL writing

389 Views Asked by At

I have this table:

table session(
ID number,
SessionID VarChar,
Date,
Filter
)

This table contains searches info, like this:

ID  SessionID                   Date                filter
4   peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    meagPixel=5
6   peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    brand=Canon
7   peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    brand=Canon&meagPixel=12.1
8   peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    brand=Canon
10  peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    brand=Nikon
12  peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    meagPixel=12.1
13  peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    meagPixel=12.1&opticalZoom=True
14  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    meagPixel=12.1&opticalZoom=True&brand=Panasonic
16  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    price=500.00
18  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    price=499.00
19  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    price=499.00&brand=Olympus
21  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    zoomRange=2000
22  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    zoomRange=2000&brand=Leica
23  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    zoomRange=2000&brand=Leica&price=1995.00
24  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True
25  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2
26  peqq421gaspts3nuulq5mwcq    24/05/2012 13:50    zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345
27  peqq421gaspts3nuulq5mwcq    24/05/2012 13:58    zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2
41  poiq41111spts00000q5aaaa    27/05/2012 13:48    meagPixel=5

I want to get unique searches. Unique searches are:

  • the longest search(filter) of the user(session)
  • if the first filter has changes - it needed to be treated as new search(filter)

Since ASP.NET doesnt guarantee the SessionID is unique (SessionID,Date) is unique.

I didnt get far after:

SELECT        MAX(Filter)
FROM            Session
GROUP BY SessionID

BTW the result for the example table data I gave should return this:

ID  SessionID                   Date                filter              
4   peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    meagPixel=5     
7   peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    brand=Canon&meagPixel=12.1      
10  peqq421gaspts3nuulq5mwcq    24/05/2012 13:48    brand=Nikon     
14  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    meagPixel=12.1&opticalZoom=True&brand=Panasonic     
16  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    price=500.00        
19  peqq421gaspts3nuulq5mwcq    24/05/2012 13:49    price=499.00&brand=Olympus      
26  peqq421gaspts3nuulq5mwcq    24/05/2012 13:50    zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345     
41  poiq41111spts00000q5aaaa    27/05/2012 13:48    meagPixel=5     

Thanks for any help and guiding.

3

There are 3 best solutions below

4
On BEST ANSWER

@GarethD - Tx for Schema and insert query. I have tried slight different approach. I am not sure if this will work in all scenario. It worked in mysql and mssql.

          select * 
          from tsession t1 
          where  not exists (
                             select * 
                             from tsession t2 
                             where t2.filter  like concat(t1.filter,'%') 
                             and t1.filter<>t2.filter 
                             and t1.sessionid=t2.sessionid) 
          order by id;

This gives exact result as required in Question.

0
On

To get the longest search filter, you need to do something like:

select s.*
from (select s.*,
             row_number() over (partition by sessionid order by len desc) as rownum
      from (select s.*, len(filter) as len
            from session s
           ) s
     ) s
where rownum = 1

I'm doing this with a windows function. You can do the same thing by using an aggregation and a join.

However, you are saying that session is not the real identifier. Session/filter is. The following query pretty much gets what you want:

select s.*
from (select s.*,
             row_number() overo over (partition by sessionid, filter 
                                      order by len desc) as rownum
      from (select s.*, len(filter) as len
            from session s
           ) s
     ) s
where rownum = 1

(The only change is to the partitioning clause to include filter.)

You may have duplicates. If you want all the duplicates, a slightly different query would work.

7
On

Firstly, there looks to be an error in your sample data I think rows 25, 26 and 27 should all appear in your final data. 27 certainly should since it is the only entry for that combination of session ID and Date.

Assuming the above is correct then I think I have established your logic correctly.

Step 1 is to define the first search term for each filter, and the order in which it occurred within the session:

;WITH CTE AS
(   SELECT  *, 
            SUBSTRING(Filter, 1, CASE WHEN CHARINDEX('&', Filter) = 0 THEN LEN(Filter) ELSE CHARINDEX('&', Filter) - 1 END) [FirstTerm],
    FROM    Session
)

The next step is to work out if each search is a new search, or a continuation of the previous search. This is done by getting the Previous search term in the session (Why SessionOrder was defined in the last CTE) and establishing if the first search term is the same.

, CTE2 AS
(   SELECT  T1.*, 
            CASE WHEN T1.SessionOrder = 1 OR T2.SessionOrder IS NOT NULL THEN 1 ELSE 0 END [NewSearch]
    FROM    CTE T1
            LEFT JOIN CTE T2
                ON  T1.SessionID = T2.SessionID
                AND T1.Date = T2.Date
                AND T1.FirstTerm != T2.FirstTerm
                AND T1.SessionOrder = T2.SessionOrder + 1
)

Next, each new search needs it's own rank within the session, for grouping purpuses. Then You have your rules defined (a unique combination of SessionID, Date, and First Search term), you can then order each item within the unique combination depending on the length of the filter:

, CTE3 AS
(   SELECT  *,
            ROW_NUMBER() OVER(PARTITION BY SessionID, Date, ISNULL(SearchNumber, 0) ORDER BY LEN(Filter) DESC) [SearchOrder]
    FROM    CTE2 T1
            OUTER APPLY
            (   SELECT  SUM(NewSearch) [SearchNumber]
                FROM    CTE2 T2
                WHERE   T1.SessionOrder >= T2.SessionOrder
                AND     T1.SessionID = T2.SessionID
                AND     T1.Date = T2.Date
            ) c
)

Finally, all you need to do is limit the results to the longest search term for each combination of SessionID, Date and first filter term:

SELECT  ID, SessionID, Date, Filter
FROM    CTE3
WHERE   SearchOrder = 1
ORDER BY ID

Normally I would put this all together on SQLFiddle rather than post a full working example here but it doesn't seem to be working today. So here is my Full SQL I used to test your data:

CREATE TABLE #Session (ID INT, SessionID VARCHAR(50), Date DATETIME, Filter VARCHAR(200))
INSERT INTO #Session VALUES
    (2, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon'),
    (4, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'meagPixel=5'),
    (6, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon'),
    (7, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon&meagPixel=12.1'),
    (8, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Canon'),
    (10, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'brand=Nikon'),
    (12, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'meagPixel=12.1'),
    (13, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:48', 'meagPixel=12.1&opticalZoom=True'),
    (14, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'meagPixel=12.1&opticalZoom=True&brand=Panasonic'),
    (16, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'price=500.00'),
    (18, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'price=499.00'),
    (19, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'price=499.00&brand=Olympus'),
    (21, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000'),
    (22, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica'),
    (23, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica&price=1995.00'),
    (24, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True'),
    (25, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:49', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2'),
    (26, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:50', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2&weight=345'),
    (27, 'peqq421gaspts3nuulq5mwcq', '24/05/2012 13:58', 'zoomRange=2000&brand=Leica&price=1995.00&opticalZoom=True&meagPixel=16.2'),
    (41, 'poiq41111spts00000q5aaaa', '27/05/2012 13:48', 'meagPixel=5')

;WITH CTE AS
(   SELECT  *, 
            SUBSTRING(Filter, 1, CASE WHEN CHARINDEX('&', Filter) = 0 THEN LEN(Filter) ELSE CHARINDEX('&', Filter) - 1 END) [FirstTerm],
    FROM    #Session
), CTE2 AS
(   SELECT  T1.*, 
            CASE WHEN T1.SessionOrder = 1 OR T2.SessionOrder IS NOT NULL THEN 1 ELSE 0 END [NewSearch]
    FROM    CTE T1
            LEFT JOIN CTE T2
                ON  T1.SessionID = T2.SessionID
                AND T1.Date = T2.Date
                AND T1.FirstTerm != T2.FirstTerm
                AND T1.SessionOrder = T2.SessionOrder + 1
), CTE3 AS
(   SELECT  *,
            ROW_NUMBER() OVER(PARTITION BY SessionID, Date, ISNULL(SearchNumber, 0) ORDER BY LEN(Filter) DESC) [SearchOrder]
    FROM    CTE2 T1
            OUTER APPLY
            (   SELECT  SUM(NewSearch) [SearchNumber]
                FROM    CTE2 T2
                WHERE   T1.SessionOrder >= T2.SessionOrder
                AND     T1.SessionID = T2.SessionID
                AND     T1.Date = T2.Date
            ) c
)
SELECT  ID, SessionID, Date, Filter
FROM    CTE3
WHERE   SearchOrder = 1
ORDER BY ID

DROP TABLE #Session

ADDENDUM

OK, Based on your results set you don't actually want to group by the date column, all you need is to put the rows in order of length grouped by the first search term and sessionID.

This query produces the same results as your sample data. I've tested this in 2008 R1, but see no reason it would not work in SQL-Server CE.

;WITH CTE AS
(   SELECT  *,
            ROW_NUMBER() OVER(PARTITION BY SessionID, SUBSTRING(Filter, 1, CASE WHEN CHARINDEX('&', Filter) = 0 THEN LEN(Filter) ELSE CHARINDEX('&', Filter) - 1 END) ORDER BY LEN(Filter) DESC) [RowNumber]
    FROM    Session
)
SELECT  *
FROM    CTE
WHERE   RowNumber = 1
ORDER BY ID

SQL Fiddle of final solution