Hadoop Hive - How to query a subset of rows
If I have a table,
table name : mytable
columns : id, name, sex, age, score
row1 : 1,Albert,M,30,70
row2 : 2,Scott,M,34,60
row3 : 3,Amilie,F,29,75
...
row100 : 100,Jim,M,35,80
I want to select them in five batches.
1st iteration : row1 ~ row20
2nd iteration : row21 ~ row40
...
5th iteration : row81 ~ row100
How can I write this query in Hive? Is there a known way to do it? The query below returns all 100 rows.
SELECT * FROM mytable;
But I really only want to see 20 rows each time.
This is easily done in MySQL with LIMIT and OFFSET. Hive supports LIMIT but, as far as I know, not OFFSET (I'm not 100% sure). You can still limit your output with
SELECT * FROM mytable
LIMIT 20;
That returns only the first 20 rows, though, not rows 21 to 40.
You can use ROW_NUMBER in Hive:
SELECT *,ROW_NUMBER over (Order by id) as rowid FROM mytable
where rowid > 0 and rowid <=20;
For the next batch, change the condition in the WHERE clause:
SELECT *,ROW_NUMBER over (Order by id) as rowid FROM mytable
where rowid > 20 and rowid <=40;
You can also pass the rowid bounds in as variables, either by reading them from a text file or by running an OS command and assigning its output to a Hive variable.
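As a minimal sketch of the variable-passing idea, the Python snippet below builds a hive command line that supplies the bounds through --hivevar and references them in the query as ${hivevar:lo} and ${hivevar:hi}. The names lo, hi, QUERY and build_hive_command are made up for illustration, and the query uses the subquery form of the ROW_NUMBER filter. The snippet only builds the argument list; it does not run Hive.

```python
# Query template. The ${hivevar:...} placeholders are substituted by Hive,
# not by Python, so this is a plain string rather than an f-string.
QUERY = (
    "SELECT * FROM ("
    " SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rowid FROM mytable"
    ") t1 "
    "WHERE rowid > ${hivevar:lo} AND rowid <= ${hivevar:hi}"
)


def build_hive_command(lo, hi):
    """Return the argv list for one batch; lo is exclusive, hi is inclusive."""
    return [
        "hive",
        "--hivevar", f"lo={int(lo)}",
        "--hivevar", f"hi={int(hi)}",
        "-e", QUERY,
    ]


if __name__ == "__main__":
    # Show the command for rows 21 to 40 instead of executing it.
    print(build_hive_command(20, 40))
```

Passing the list to something like subprocess.run avoids the shell, so the ${...} placeholders reach Hive unchanged.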
Updating this, just in case someone else is trying this solution.
For me, it only worked with parentheses after ROW_NUMBER and with a new outer SELECT around the query to carry the WHERE clause, since the "rowid" alias is not available to the WHERE clause of the same SELECT that defines it. I wasted some time figuring that out.
SELECT * FROM (
SELECT *, ROW_NUMBER() OVER(Order by id) as rowid FROM mytable
) t1
WHERE rowid > 0 and rowid <= 20;
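To cover all five iterations from the question, the bounds can be generated in a loop. The Python sketch below is only an illustration: batch_bounds and batch_query are hypothetical helper names, and the table size (100) and batch size (20) are taken from the example in the question. It prints one query per batch in the same shape as the query above.

```python
def batch_bounds(total_rows, batch_size):
    """Yield (lo, hi) pairs; each batch covers lo < rowid <= hi."""
    for lo in range(0, total_rows, batch_size):
        # Clamp the upper bound so a short final batch stays in range.
        yield lo, min(lo + batch_size, total_rows)


def batch_query(lo, hi, table="mytable", order_col="id"):
    """Build the HiveQL text for one batch of rows."""
    return (
        "SELECT * FROM (\n"
        f"  SELECT *, ROW_NUMBER() OVER (ORDER BY {order_col}) AS rowid FROM {table}\n"
        ") t1\n"
        f"WHERE rowid > {lo} AND rowid <= {hi};"
    )


if __name__ == "__main__":
    # 100 rows in batches of 20 gives the five iterations from the question.
    for lo, hi in batch_bounds(100, 20):
        print(batch_query(lo, hi))
        print()
```

Each batch still numbers the whole table before filtering, so this is convenient rather than cheap on large tables.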