Why does this boolean condition ignore NULLs?

I have a table my_table that contains a boolean column my_value. I get a somewhat unexpected result when I query the table in Shark:

shark> SELECT my_value, COUNT(*) FROM my_table GROUP BY my_value;
OK
true    182285
false   81968
NULL    7594
Time taken: 14.028 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value=true;
OK
182285
Time taken: 13.787 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value IS NULL;
OK
7594
Time taken: 13.387 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value=true or my_value IS NULL;
OK
182285
Time taken: 13.406 seconds

      

I expect the last query to return 189879 (i.e. 182285 + 7594). Why is it wrong?
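As a sanity check on that expectation, here is a minimal sketch that rebuilds the same row counts in an in-memory SQLite database and runs the same predicate. SQLite stands in for Shark purely as an assumption, to show what standard SQL three-valued logic should give; it says nothing about how Shark evaluates the query. The helper names build_table and count_where are made up for this illustration.

```python
import sqlite3

# Row counts taken from the GROUP BY output above.
COUNTS = {True: 182285, False: 81968, None: 7594}


def build_table(conn):
    """Create my_table and fill it with the true/false/NULL counts."""
    conn.execute("CREATE TABLE my_table (my_value BOOLEAN)")
    for value, n in COUNTS.items():
        # Python True/False are stored as 1/0, None is stored as NULL.
        conn.executemany(
            "INSERT INTO my_table VALUES (?)",
            ((value,) for _ in range(n)),
        )


def count_where(conn, predicate):
    """Return COUNT(*) for rows of my_table matching the predicate."""
    query = f"SELECT COUNT(*) FROM my_table WHERE {predicate}"
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
build_table(conn)

# SQLite stores booleans as integers, so "= 1" plays the role of "= true".
true_or_null = count_where(conn, "my_value = 1 OR my_value IS NULL")
print(true_or_null)  # 189879
```

Under standard semantics, NULL = true evaluates to unknown, but unknown OR true is true, so the NULL rows should still be counted.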

For the curious reader, this seems to give the correct result:

shark> SELECT COUNT(*) FROM my_table WHERE isnull(my_value) or my_value=true;
OK
189879

      

Also, this is not an operator precedence issue:

shark> SELECT COUNT(*) FROM my_table WHERE (my_value=true) or (my_value IS NULL);
OK
182285

      

Update: It looks like the IS NULL test in the WHERE clause is not doing what I expect:

shark> SELECT my_value IS NULL FROM my_table WHERE my_value IS NULL LIMIT 10;
14/11/26 11:34:52 WARN parse.ASTRewriteUtil: Query contains a LIMIT. Skipping applicable COUNT DISTINCT rewrites.A LIMIT shouldn't be paired with an aggregation that only returns one line ...
OK
false
false
false
false
false
false
false
false
false
false

      

This is even more surprising given that SELECT COUNT(*) FROM my_table WHERE my_value IS NULL; returned the correct result.
