Postgres index-only-scan: is it possible to ignore visibility map or avoid heap fetch?

Sorry for the amount of context before the real question; we have researched this thoroughly and I wanted to give you the full picture.

Some context: Postgres index-only scans rely on the visibility map (VM). If a page is not marked as all-visible in the visibility map, Postgres fetches that page from the heap to check that the row is visible to the current transaction, even though the query could otherwise be answered from the index alone. Unfortunately, this can slow index-only scans down significantly. The index may return 10k rows while spanning only 50 pages (very fast in terms of I/O). However, if the VM bits are not set, the scan also performs up to 10k additional heap fetches (roughly 200x more I/O).
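For reference, a rough way to see how much of a table the visibility map currently covers is to compare `relallvisible` with `relpages` in `pg_class`. This is only a sketch: the table name `events` is hypothetical, and both columns are estimates that are refreshed by VACUUM and ANALYZE, so they can lag behind the real state.

```sql
-- 'events' is a hypothetical table name used for illustration.
-- relpages and relallvisible are planner estimates, not exact counts.
SELECT relname,
       relpages,
       relallvisible,
       round(100.0 * relallvisible / NULLIF(relpages, 0), 1) AS pct_all_visible
FROM pg_class
WHERE relname = 'events';
```

A low percentage suggests that an index-only scan on this table will still need many heap fetches.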

Details: https://wiki.postgresql.org/wiki/Index-only_scans#The_Visibility_Map_.28and_other_relation_forks.29

Try it yourself: run EXPLAIN ANALYZE on a query that can be answered by an index-only scan, before and after VACUUM. You will see the number of heap fetches drop after VACUUM (provided some pages were not marked all-visible in the VM beforehand).
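A minimal sketch of that experiment follows. The table, index, and filter value are hypothetical; on a small or empty table the planner may pick a different plan, so this is best tried on a table with real data and some recent writes.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE events (
    id         bigint PRIMARY KEY,
    account_id int NOT NULL,
    created_at timestamptz NOT NULL
);
CREATE INDEX events_account_created_idx ON events (account_id, created_at);

-- Before VACUUM: look for "Heap Fetches: N" under the Index Only Scan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT account_id, created_at FROM events WHERE account_id = 42;

-- VACUUM sets the all-visible bits for pages that contain only
-- tuples visible to every transaction.
VACUUM (ANALYZE) events;

-- After VACUUM: the same plan should report fewer heap fetches.
EXPLAIN (ANALYZE, BUFFERS)
SELECT account_id, created_at FROM events WHERE account_id = 42;
```

Both columns in the SELECT list are part of the index, which is what makes an index-only scan possible in the first place.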

What we have already tried: autovacuum is set up and we also vacuum regularly. It helps a lot, but we would like it to be even faster.
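For context, the kind of per-table autovacuum tuning meant here looks roughly like the sketch below. The table name `events` and the specific numbers are assumptions for illustration, not our actual settings; suitable values depend heavily on table size and write rate.

```sql
-- Hypothetical table and illustrative values only.
-- Lower thresholds make autovacuum visit this table more often,
-- which keeps more pages marked all-visible in the visibility map.
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);
```

More frequent vacuuming only narrows the window in which pages are not all-visible; it does not remove the heap fetches for recently modified pages.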

Question (finally): is it possible to skip the heap fetches when doing an index-only scan? I know we wouldn't get perfectly MVCC-correct reads, but we're fine with that. The data in the index is close enough, and it's definitely not worth the overhead of thousands of heap fetches just to make sure we're not looking at slightly stale data. To borrow a term from NoSQL, we'd be fine with "eventually consistent" reads.

Thanks!



1 answer


There is no way to do what you want in PostgreSQL. It would be interesting to do, but it would be a fair bit of work, very unlikely to be accepted into core, extremely difficult to do as an extension, and likely to have worse side effects than you probably expect.

You would basically have to add a DIRTY READ isolation level to PostgreSQL, except that it would be even weaker, because it could also return deleted data, old versions of updated rows, and multiple values from unique indexes. This last problem would potentially be very upsetting to the query planner, since it assumes that results from unique indexes are unique.

I see the chances of such a change being accepted into PostgreSQL core as very close to zero. The possible use cases are very limited.



The only way I could justify adding a feature like this is as a way to support dirty reads, to make it easier to recover from data loss or corruption after crashes and from accidental deletions. For that purpose it would have to work for heap sequential scans too, not just index scans.

You would probably be wiser to tackle this another way, such as a loosely consistent caching layer (Redis, etc.) in front of the DB for data that does not need to be completely fresh.
