Why isn't this query using the index?

Question

I ran into strange behavior from the Postgres optimizer on the following query:

select count(product0_.id) as col_0_0_ from Product product0_ 
 where product0_.active=true 
 and (product0_.aggregatorId is null 
 or product0_.aggregatorId in ($1 , $2 , $3))

Product has about 54 columns, active is a boolean with a btree index, and aggregatorId is "varchar(15)" and also has a btree index.

In the query above, the index on aggregatorId is not used:

Aggregate  (cost=169995.75..169995.76 rows=1 width=32) (actual time=3904.726..3904.727 rows=1 loops=1)
  ->  Seq Scan on product product0_  (cost=0.00..165510.39 rows=1794146 width=32) (actual time=0.055..2407.195 rows=1851827 loops=1)
        Filter: (active AND ((aggregatorid IS NULL) OR ((aggregatorid)::text = ANY ('{5109037,5001015,70601}'::text[]))))
        Rows Removed by Filter: 542146
Total runtime: 3904.925 ms

But if we reduce the query by leaving out the null check for that column, the index is used:

Aggregate  (cost=17600.93..17600.94 rows=1 width=32) (actual time=614.933..614.935 rows=1 loops=1)
  ->  Index Scan using idx_prod_aggr on product product0_  (cost=0.43..17487.56 rows=45347 width=32) (actual time=19.284..594.509 rows=12099 loops=1)
      Index Cond: ((aggregatorid)::text = ANY ('{5109037,5001015,70601}'::text[]))
      Filter: active
      Rows Removed by Filter: 49130
Total runtime: 150.255 ms

As far as I know, a btree index can handle null checks, so I don't understand why the index is not used for the full query. The product table contains about 2.3 million rows, so the sequential scan is slow.

EDIT: The index is very standard:

CREATE INDEX idx_prod_aggr
  ON product
  USING btree
  (aggregatorid COLLATE pg_catalog."default");

sql postgresql

Uwe Allner, June 11 '15 at 9:12

2 answers

Your problem looked interesting, so I reproduced your scenario - postgres 9.1, table with 1M rows, one boolean column, one varchar column, indexed, half of the table has NULL names.
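A setup like the one described could be recreated roughly as follows. This is only a sketch: the table name A and the columns active and name come from the query below, the index name from the plan output, and everything else (the id column, the data distribution, the ANALYZE step) is an assumption, so the resulting plan may differ.

```sql
-- Hypothetical reproduction; column types and data distribution are guesses.
CREATE TABLE a (
    id     serial PRIMARY KEY,
    active boolean NOT NULL,
    name   varchar(15)
);

-- 1M rows, every second row gets a NULL name.
INSERT INTO a (active, name)
SELECT (i % 3 <> 0),
       CASE WHEN i % 2 = 0 THEN NULL ELSE i::text END
  FROM generate_series(1, 1000000) AS s(i);

CREATE INDEX idx_prod_aggr ON a USING btree (name);

-- Refresh planner statistics so row estimates reflect the data.
ANALYZE a;
```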

I got the same explain analyze output as you when the varchar column was not indexed. With the index, however, Postgres uses a bitmap index scan for the NULL condition and another for the IN condition, then combines them with a BitmapOr.

It then applies the boolean condition as a filter (since the indexes are separate).

explain analyze
select * from A where active is true and ((name is null) OR (name in ('1','2','3')  ));

See the output:

"Bitmap Heap Scan on a  (cost=17.34..21.35 rows=1 width=18) (actual time=0.048..0.048 rows=0 loops=1)"
"  Recheck Cond: ((name IS NULL) OR ((name)::text = ANY ('{1,2,3}'::text[])))"
"  Filter: (active IS TRUE)"
"  ->  BitmapOr  (cost=17.34..17.34 rows=1 width=0) (actual time=0.047..0.047 rows=0 loops=1)"
"        ->  Bitmap Index Scan on idx_prod_aggr  (cost=0.00..4.41 rows=1 width=0) (actual time=0.010..0.010 rows=0 loops=1)"
"              Index Cond: (name IS NULL)"
"        ->  Bitmap Index Scan on idx_prod_aggr  (cost=0.00..12.93 rows=1 width=0) (actual time=0.036..0.036 rows=0 loops=1)"
"              Index Cond: ((name)::text = ANY ('{1,2,3}'::text[]))"
"Total runtime: 0.077 ms"

This makes me think that you missed some details, if so please add them to your question.

AdamSkywalker, June 11 '15 at 10:07

Dragan bozanovic · Accepted Answer · 2015-06-11T10:20:59+0000

Since so many rows match the values you filter on in the where clause (78% of all table rows according to your numbers), the database concludes that a full table scan is cheaper than spending extra time reading the index.
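One way to check this is to discourage sequential scans for a single transaction and compare the estimated cost of the index-based plan with the seq scan cost from the question. This is a diagnostic sketch, not a fix: enable_seqscan is a standard Postgres planner setting, and the literal values are taken from the plan output in the question in place of the $1, $2, $3 parameters.

```sql
-- Diagnostic only: with enable_seqscan off the planner avoids seq scans
-- where it can, so the cost of the alternative plan becomes visible.
BEGIN;
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT count(product0_.id)
  FROM Product product0_
 WHERE product0_.active = true
   AND (product0_.aggregatorId IS NULL
        OR product0_.aggregatorId IN ('5109037', '5001015', '70601'));

ROLLBACK;  -- SET LOCAL only lasts until the end of the transaction
```

If the index-based plan shows a higher estimated cost than the 169995.76 of the original plan, the planner's choice of a sequential scan is consistent with its cost model.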

The rule of thumb for most database vendors is that an index will probably not be used unless it can narrow the search down to about 5% of all records in the table.
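The fractions involved can be worked out from the two plans in the question. The sketch below uses only the row counts printed there (rows returned plus rows removed by filter); treating their sum as the table size is an assumption.

```sql
-- Full query: 1851827 rows pass the filter, 542146 are removed.
-- Reduced query: the index scan fetches 12099 + 49130 rows before filtering on active.
SELECT 1851827 + 542146                                        AS total_rows,
       round(100.0 * 1851827 / (1851827 + 542146), 1)          AS pct_full_query,
       round(100.0 * (12099 + 49130) / (1851827 + 542146), 1)  AS pct_index_scan;
```

The full query matches roughly three quarters of the table, while the reduced query reads only a few percent of it through the index, which fits the rule of thumb above.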
