ROW_NUMBER() OVER in Impala
I have a use case where I need ROW_NUMBER() OVER with a PARTITION BY clause, something like:
SELECT
Column1, Column2,
ROW_NUMBER() OVER (
PARTITION BY ACCOUNT_NUM
ORDER BY FREQ, MAN, MODEL) as LEVEL
FROM
TEST_TABLE
I need a workaround for this in Impala. Unfortunately Impala does not support subqueries, nor does it support ROW_NUMBER () OVER functionality. Thank you for your help.
4 answers
Impala is pretty limited for this type of query. With some assumptions, this query is possible:
- The four columns in the partition and ordering clauses (account_num, freq, man, model) are never NULL
- Those four columns uniquely identify a row
The query is pretty ugly and expensive:
select tt.column1, tt.column2, count(*) as level
from test_table tt join
test_table tt2
on tt.account_num = tt2.account_num and
(tt2.freq < tt.freq or
tt2.freq = tt.freq and tt2.man < tt.man or
tt2.freq = tt.freq and tt2.man = tt.man and tt2.model <= tt.model
)
group by tt.column1, tt.column2, tt.account_num, tt.freq, tt.man, tt.model;
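To see why counting the rows that sort at or before each row reproduces a row number, here is a minimal sketch that runs the self-join against a small in-memory table. It uses Python's sqlite3 module purely as a stand-in engine (not Impala), and the sample rows are hypothetical; both assumptions from the list above hold for them (no NULLs, the four columns are unique).

```python
import sqlite3

# Hypothetical sample rows: (column1, column2, account_num, freq, man, model)
rows = [
    ("a", "x", 1, 10, "ford", "focus"),
    ("b", "y", 1, 10, "ford", "ka"),
    ("c", "z", 1, 5, "audi", "a4"),
    ("d", "w", 2, 7, "bmw", "i3"),
    ("e", "v", 2, 7, "audi", "a3"),
]

# SQLite is used here only to execute the SQL; it is not Impala.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table "
    "(column1 TEXT, column2 TEXT, account_num INT, freq INT, man TEXT, model TEXT)"
)
conn.executemany("INSERT INTO test_table VALUES (?, ?, ?, ?, ?, ?)", rows)

# Each row is joined to every row of the same account that sorts at or
# before it on (freq, man, model); the size of that set is its row number.
SELF_JOIN = """
SELECT tt.column1, tt.column2, COUNT(*) AS level
FROM test_table tt
JOIN test_table tt2
  ON tt.account_num = tt2.account_num
 AND (tt2.freq < tt.freq
      OR (tt2.freq = tt.freq AND tt2.man < tt.man)
      OR (tt2.freq = tt.freq AND tt2.man = tt.man AND tt2.model <= tt.model))
GROUP BY tt.column1, tt.column2, tt.account_num, tt.freq, tt.man, tt.model
"""


def self_join_levels(conn):
    """Run the self-join and return {(column1, column2): level}."""
    return {(c1, c2): level for c1, c2, level in conn.execute(SELF_JOIN)}


def reference_levels(rows):
    """Row numbers computed directly: sort, then count within each account."""
    levels, counters = {}, {}
    for c1, c2, acct, freq, man, model in sorted(rows, key=lambda r: r[2:]):
        counters[acct] = counters.get(acct, 0) + 1
        levels[(c1, c2)] = counters[acct]
    return levels


if __name__ == "__main__":
    for key, level in sorted(self_join_levels(conn).items()):
        print(key, level)
```

If two rows shared the same account_num, freq, man and model, each would count the other, so both would get the same level and the numbering would have a gap. That is why the uniqueness assumption matters.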
Impala now supports the OVER clause. The syntax is the same as in the question.
SELECT
Column1, Column2,
ROW_NUMBER() OVER (
PARTITION BY ACCOUNT_NUM
ORDER BY FREQ, MAN, MODEL) as LEVEL
FROM
TEST_TABLE
Impala documentation: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_analytic_functions.html#over