Why does this boolean selection ignore NULLs?
I have a table my_table that contains a boolean column my_value. I get a somewhat unexpected result when I query the table in Shark:
shark> SELECT my_value, COUNT(*) FROM my_table GROUP BY my_value;
OK
true 182285
false 81968
NULL 7594
Time taken: 14.028 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value=true;
OK
182285
Time taken: 13.787 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value IS NULL;
OK
7594
Time taken: 13.387 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value=true or my_value IS NULL;
OK
182285
Time taken: 13.406 seconds
I expect the last query to return 189879 (i.e. 182285 + 7594). Why is it wrong?
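For comparison, here is a minimal sketch of the behaviour I expect, run against SQLite through Python's sqlite3 module rather than Shark. The tiny in-memory table and the helper count_rows are made up for illustration, and SQLite stores booleans as 0/1, so this only shows what standard SQL three-valued logic should give, not what Shark does internally.

```python
import sqlite3

def count_rows(conn, where_clause):
    """Return COUNT(*) from my_table for the given WHERE clause."""
    query = "SELECT COUNT(*) FROM my_table WHERE " + where_clause
    return conn.execute(query).fetchone()[0]

# Small in-memory stand-in for my_table: 3 true, 2 false, 1 NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_value BOOLEAN)")
conn.executemany(
    "INSERT INTO my_table (my_value) VALUES (?)",
    [(1,), (1,), (1,), (0,), (0,), (None,)],
)

n_true = count_rows(conn, "my_value = 1")
n_null = count_rows(conn, "my_value IS NULL")
# For a NULL row, (NULL = 1) is NULL, and NULL OR TRUE evaluates to TRUE,
# so the row is kept by the combined predicate.
n_either = count_rows(conn, "my_value = 1 OR my_value IS NULL")

print(n_true, n_null, n_either)  # prints: 3 1 4
```

Here the combined count equals the sum of the two separate counts, which is the result I expected from Shark as well.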
For the curious reader, this seems to give the correct result:
shark> SELECT COUNT(*) FROM my_table WHERE isnull(my_value) or my_value=true;
OK
189879
Also, this is not an operator precedence issue:
shark> SELECT COUNT(*) FROM my_table WHERE (my_value=true) or (my_value IS NULL);
OK
182285
Update: It looks like the IS NULL expression in the WHERE clause is not doing what I expect:
shark> SELECT my_value IS NULL FROM my_table WHERE my_value IS NULL LIMIT 10;
14/11/26 11:34:52 WARN parse.ASTRewriteUtil: Query contains a LIMIT. Skipping applicable COUNT DISTINCT rewrites.A LIMIT shouldn't be paired with an aggregation that only returns one line ...
OK
false
false
false
false
false
false
false
false
false
false
It is even more surprising given that SELECT COUNT(*) FROM my_table WHERE my_value IS NULL; returned the correct result.