Postgresql - rows deleted by index

The table in question contains a B-tree index on time

testdb=> explain analyze select avg(gl) from cdstest where time between 1407700790 and 1407711590;
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1434716.75..1434716.76 rows=1 width=2) (actual time=20106.951..20106.952 rows=1 loops=1)
   ->  Bitmap Heap Scan on cdstest  (cost=231261.49..1411280.42 rows=9374529 width=2) (actual time=811.495..10871.963 rows=9438824 loops=1)
         Recheck Cond: (("time" >= 1407700790) AND ("time" <= 1407711590))
         Rows Removed by Index Recheck: 204734
         ->  Bitmap Index Scan on timeindex  (cost=0.00..228917.86 rows=9374529 width=0) (actual time=810.108..810.108 rows=9438824 loops=1)
               Index Cond: (("time" >= 1407700790) AND ("time" <= 1407711590))
 Total runtime: 20107.001 ms
(7 rows)

      

Rows Removed by Index Recheck: 204734

- What does it mean? It looks like a pretty arbitrary number.

Number of lines between a given time range:

testdb=> select count(*) from cdstest where time between 1407700790 and 1407711590;
  count  
---------
 9438824
(1 row)

      

The table contains ~ 60 million rows.

+3


source to share


1 answer


The inner Bitmap Index Scan

node creates a bitmap by putting 1

in all the places where records are found matching your search key and 0

otherwise. Since your table is quite large, the size of the bitmap increases, and then the available memory for these operations, configured with work_mem

, becomes small to save the entire bitmap.

In the absence of memory, the internal node will start producing 1

not for records, but rather for blocks known to contain corresponding records. This means that the outer node Bitmap Heap Scan

must read all records from such a block and recheck them. It's a shame, there will be some inappropriate, and their number is what you see as Rows Removed by Index Recheck

.

In 9.4 coming soon, a new function has been added that tells how many pages exact

and / or lossy

where the Bitmap Index Scan

node is returned . lossy

- those that you would like to avoid. You can find out about it here .

Finally, consult the parameter work_mem

and try increasing it, just for that particular session. I guess a 40% increase should be sufficient.


EDIT
I have 9.4beta3 working here, so I prepared a small case with an image:

DROP TABLE IF EXISTS tab;
SELECT id, id%10 mod
  INTO tab
  FROM generate_series(1,(1e7)::int) id;
CREATE INDEX i_tab_mod ON tab(mod);
VACUUM ANALYZE tab;

      

Now I have set work_mem

to the lowest possible value and tested it:



SET work_mem TO '64kB';
EXPLAIN (analyze, buffers)
SELECT * FROM tab WHERE mod=5;

      

EXPLAIN

provides the following 2 lines:

  Rows Removed by Index Recheck: 8896308
  Heap Blocks: exact=510 lossy=43738
  ...
Execution time: 1356.938 ms

      

This means it 64kB

can hold 510 precise blocks. So I calculate the total memory requirement here:

new_mem_in_bytes = (work_mem_in_bytes / exact) * lossy
                 = (( 64.0 * 1024 / 510 ) * 43738) / 1024
                 = 5488.7kB

      

This is not an accurate approach to calculating the required memory, actually, but I think it is good enough for our needs. So I tried with SET work_mem TO '5MB'

:

  Heap Blocks: exact=44248
  ...
Execution time: 283.466 ms

      

+8


source







All Articles