How can I improve the performance of queries for large amounts of data in a PostgreSQL database?

I have a PostgreSQL database with 1.2 billion rows, and I am building an application that queries about a million rows at a time, with the ability to request large intervals. At first I simply queried the database for rows between the one million and ten million mark.
Now, when I query the database with a large OFFSET, the ResultSet takes a long time to generate.

   // ...
   stmt.setFetchSize(100000);
   ResultSet rs = stmt.executeQuery("SELECT mmsi, report_timestamp, position_geom, ST_X(position_geom) AS Long, "
                        + "ST_Y(position_geom) AS Lat FROM reports4 WHERE position_geom IS NOT NULL ORDER by report_timestamp ASC LIMIT "
                        + limit + " OFFSET " + set); 

      

So the ORDER BY is probably what kills my runtime, but having the results ordered makes things easier later on. Is there a more efficient way to query rows over large intervals?

2 answers


For this query:

SELECT mmsi, report_timestamp, position_geom, ST_X(position_geom) AS Long,
       ST_Y(position_geom) AS Lat
FROM reports4
WHERE position_geom IS NOT NULL
ORDER BY report_timestamp ASC;

      

You should be able to use an index for the expression:



CREATE INDEX idx_reports4_position_ts ON reports4((position_geom IS NOT NULL), report_timestamp);

      

This index should be used directly for the query.
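If the large OFFSET itself is still the bottleneck, a keyset-style (seek) approach can page on report_timestamp instead, so each batch starts where the previous one ended and the ordering can come straight from the index. Below is a minimal JDBC sketch of the idea; the fetchBatch helper, batch size, and connection handling are illustrative and not part of the original code, and if report_timestamp is not unique you would also want a tie-breaker column in the ORDER BY and in the keyset condition.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;

// Fetch one batch of rows strictly after `after`, in timestamp order, and return
// the last timestamp seen so the caller can pass it back in for the next batch.
// For the first batch, pass a timestamp earlier than any row in the table.
static Timestamp fetchBatch(Connection conn, Timestamp after, int batchSize) throws SQLException {
    String sql = "SELECT mmsi, report_timestamp, position_geom, "
               + "ST_X(position_geom) AS lon, ST_Y(position_geom) AS lat "
               + "FROM reports4 "
               + "WHERE position_geom IS NOT NULL AND report_timestamp > ? "
               + "ORDER BY report_timestamp ASC LIMIT ?";
    Timestamp last = after;
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setTimestamp(1, after);
        ps.setInt(2, batchSize);
        ps.setFetchSize(100000); // the PostgreSQL driver only streams in chunks when autocommit is off
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                last = rs.getTimestamp("report_timestamp");
                // process mmsi, lon, lat here ...
            }
        }
    }
    return last;
}

This way the database seeks directly to the next timestamp through the index instead of reading and discarding OFFSET rows on every request.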


You can use a partial index, which is built over just the subset of the table that you query.

CREATE INDEX idx_reports4 ON reports4(position_geom, report_timestamp) where position_geom IS NOT NULL;

      



This can greatly improve performance, since you are only indexing the portion of the table that you actually query.
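To check that the planner actually picks the partial index for your batches, you can run the query through EXPLAIN; note that the ANALYZE option executes the query, so use a small LIMIT while testing (the value below is only an illustrative batch size):

EXPLAIN (ANALYZE, BUFFERS)
SELECT mmsi, report_timestamp, position_geom, ST_X(position_geom) AS Long, ST_Y(position_geom) AS Lat
FROM reports4
WHERE position_geom IS NOT NULL
ORDER BY report_timestamp ASC
LIMIT 100000;

If the plan shows an index scan on idx_reports4 rather than a sequential scan followed by a sort, the index is being used.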
