Why does this boolean selection ignore NULLs?

I have a table my_table that contains a boolean column my_value. I get a somewhat unexpected result when I query the table in Shark:

shark> SELECT my_value, COUNT(*) FROM my_table GROUP BY my_value;
OK
true    182285
false   81968
NULL    7594
Time taken: 14.028 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value=true;
OK
182285
Time taken: 13.787 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value IS NULL;
OK
7594
Time taken: 13.387 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value=true or my_value IS NULL;
OK
182285
Time taken: 13.406 seconds

      

I expect the last query to return 189879 (i.e. 182285 + 7594). Why is it wrong?
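As a sanity check on that expectation, here is a minimal sketch of the same three queries against a different engine. It assumes SQLite (through Python's sqlite3 module) as a stand-in and uses a made-up six-row table, so it only illustrates standard three-valued logic and says nothing about what Shark does internally. The count helper is hypothetical, not part of any Shark or Hive API.

```python
import sqlite3

# Hypothetical miniature of my_table: 3 true, 2 false, 1 NULL.
# SQLite stores booleans as integers, so true is 1 and false is 0.
rows = [(1,), (1,), (1,), (0,), (0,), (None,)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_value BOOLEAN)")
conn.executemany("INSERT INTO my_table VALUES (?)", rows)


def count(where):
    # Illustrative helper: count the rows that satisfy a WHERE clause.
    query = f"SELECT COUNT(*) FROM my_table WHERE {where}"
    return conn.execute(query).fetchone()[0]


n_true = count("my_value = 1")
n_null = count("my_value IS NULL")
# For a NULL row, "my_value = 1" evaluates to NULL, and NULL OR TRUE is TRUE,
# so the row is kept by the combined predicate.
n_either = count("my_value = 1 OR my_value IS NULL")

print(n_true, n_null, n_either)
```

Under standard SQL semantics the two predicates are mutually exclusive, so the combined count should equal the sum of the individual counts, which is what I expected from Shark as well.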

For the curious reader, this seems to give the correct result:

shark> SELECT COUNT(*) FROM my_table WHERE isnull(my_value) or my_value=true;
OK
189879

      

Also, this is not an operator precedence issue:

shark> SELECT COUNT(*) FROM my_table WHERE (my_value=true) or (my_value IS NULL);
OK
182285

      

Update: It looks like IS NULL in the WHERE clause is not doing what I expect:

shark> SELECT my_value IS NULL FROM my_table WHERE my_value IS NULL LIMIT 10;
14/11/26 11:34:52 WARN parse.ASTRewriteUtil: Query contains a LIMIT. Skipping applicable COUNT DISTINCT rewrites.A LIMIT shouldn't be paired with an aggregation that only returns one line ...
OK
false
false
false
false
false
false
false
false
false
false

      

This makes it even more surprising that SELECT COUNT(*) FROM my_table WHERE my_value IS NULL; returned the correct result.

hive apache-spark

