Parse error while using ROW_NUMBER OVER PARTITION BY

2k Views Asked by At

I have used the below query on Hive and got the following parse error though the query does not seem to have any issues.

SELECT TO_DATE(o.order_date), profit, 
       ROW_NUMBER() OVER (PARTITION BY YEAR(o.order_date) ORDER BY profit desc) AS n 
FROM (
    SELECT TO_DATE(o.order_date), SUM(price) AS revenue, 
           SUM(price-cost) as profit 
    FROM products p, order_details d, orders o 
    WHERE (d.prod_id=p.prod_id) AND (d.order_id=o.order_id) 
    GROUP BY o.order_date
)

Error is:

error while compiling statement: failed: parseexception line 6:22 cannot recognize input near '' '' '' in subquery source

2

There are 2 best solutions below

0
On

Well just to mention. You use in your SELECT some aliases like o.order_date) in your TO_DATE and insider your OVER-clause. But your FROM part is just a query in braces without an given alias after the closing brace ).

I would expect ) as o instead.

0
On

I see 3 issues with the HQL:

  1. The group by in the sub query needs to match all non-aggregate columns in the select statement

  2. You need to alias the TO_DATE(o.order_date) bit in the sub query in order to reference it in the top level query (or refer to its system generated alias). It will choke if you attempt to refer to o.order_date in the super query if you don't pass o.order_date up from the sub query (and alias the sub query as o). This also applies to the o.order_date call in the over statement

  3. This might not be causing issues but most of the versions of HIVE that I have used choke if you don't alias sub-queries.

The below code should fix your issues (assuming there are no data issues that that we can't see from looking at the HQL):

SELECT 
    a.dt, 
    a.profit, 
    ROW_NUMBER() OVER (PARTITION BY YEAR(a.dt) ORDER BY profit desc) AS n 
FROM
    (SELECT 
         TO_DATE(o.order_date) dt, 
         SUM(price) AS revenue, 
         SUM(price-cost) as profit 
     FROM 
         products p, order_details d, orders o 
     WHERE 
         (d.prod_id = p.prod_id) 
         AND (d.order_id = o.order_id) 
     GROUP BY 
         TO_DATE(o.order_date) ) a