Postgres is using the wrong index

Question


I have a query:

```
EXPLAIN ANALYZE
SELECT CAST(DATE(associationtime) AS text) AS date,
       cast(SUM(extract(epoch FROM disassociationtime)
              - extract(epoch FROM associationtime)) AS bigint) AS sessionduration,
       cast(SUM(tx) AS bigint) AS tx,
       cast(SUM(rx) AS bigint) AS rx,
       cast(SUM(dataRetries) AS bigint) AS DATA,
       cast(SUM(rtsRetries) AS bigint) AS rts,
       count(*)
FROM SESSION
WHERE ssid_id=42
  AND ap_id=1731
  AND DATE(associationtime)>=DATE('Tue Nov 04 00:00:00 MSK 2014')
  AND DATE(associationtime)<=DATE('Thu Nov 20 00:00:00 MSK 2014')
GROUP BY(DATE(associationtime))
ORDER BY DATE(associationtime);
```

Output:

```
 GroupAggregate  (cost=0.44..17710.66 rows=1 width=32) (actual time=4.501..78.880 rows=17 loops=1)
   ->  Index Scan using session_lim_values_idx on session  (cost=0.44..17538.94 rows=6868 width=32) (actual time=0.074..73.266 rows=7869 loops=1)
         Index Cond: ((date(associationtime) >= '2014-11-04'::date) AND (date(associationtime) <= '2014-11-20'::date))
         Filter: ((ssid_id = 42) AND (ap_id = 1731))
         Rows Removed by Filter: 297425
 Total runtime: 78.932 ms
```

Look at this line:

Index Scan using session_lim_values_idx

As you can see, the query filters on three columns: ssid_id, ap_id and associationtime. I have an index for exactly that. Here are all my indexes:

```
ssid_pkey                  | btree | {id}
ap_pkey                    | btree | {id}
testingshit_pkey           | btree | {one,two,three}
session_date_ssid_idx      | btree | {ssid_id,date(associationtime),"date_trunc('hour'::text, associationtime)"}
session_pkey               | btree | {associationtime,disassociationtime,sessionduration,clientip,clientmac,devicename,tx,rx,protocol,snr,rssi,dataretries,rtsretries}
session_main_idx           | btree | {ssid_id,ap_id,associationtime,disassociationtime,sessionduration,clientip,clientmac,devicename,tx,rx,protocol,snr,rssi,dataretries,rtsretries}
session_date_idx           | btree | {date(associationtime),"date_trunc('hour'::text, associationtime)"}
session_date_apid_idx      | btree | {ap_id,date(associationtime),"date_trunc('hour'::text, associationtime)"}
session_date_ssid_apid_idx | btree | {ssid_id,ap_id,date(associationtime),"date_trunc('hour'::text, associationtime)"}
ap_apname_idx              | btree | {apname}
users_pkey                 | btree | {username}
user_roles_pkey            | btree | {user_role_id}
session_lim_values_idx     | btree | {date(associationtime)}
```

The one I expect to be used is called session_date_ssid_apid_idx. So why does the query use the wrong index?

session_date_ssid_apid_idx:

```
 Column     | Type                        | Definition
------------+-----------------------------+-------------------------------------------
 ssid_id    | integer                     | ssid_id
 ap_id      | integer                     | ap_id
 date       | date                        | date(associationtime)
 date_trunc | timestamp without time zone | date_trunc('hour'::text, associationtime)
```

session_lim_values_idx:

```
 Column | Type | Definition
--------+------+-----------------------
 date   | date | date(associationtime)
```

What kind of index would you create?

UPD: output of \d session:

```
       Column       |            Type             |                      Modifiers
--------------------+-----------------------------+------------------------------------------------------
 id                 | integer                     | NOT NULL DEFAULT nextval('session_id_seq'::regclass)
 ssid_id            | integer                     | NOT NULL
 ap_id              | integer                     | NOT NULL
 associationtime    | timestamp without time zone | NOT NULL
 disassociationtime | timestamp without time zone | NOT NULL
 sessionduration    | character varying(100)      | NOT NULL
 clientip           | character varying(100)      | NOT NULL
 clientmac          | character varying(100)      | NOT NULL
 devicename         | character varying(100)      | NOT NULL
 tx                 | integer                     | NOT NULL
 rx                 | integer                     | NOT NULL
 protocol           | character varying(100)      | NOT NULL
 snr                | integer                     | NOT NULL
 rssi               | integer                     | NOT NULL
 dataretries        | integer                     | NOT NULL
 rtsretries         | integer                     | NOT NULL
Indexes:
    "session_pkey" PRIMARY KEY, btree (associationtime, disassociationtime, sessionduration, clientip, clientmac, devicename, tx, rx, protocol, snr, rssi, dataretries, rtsretries)
    "session_date_ap_ssid_idx" btree (ssid_id, ap_id, associationtime)
    "session_date_apid_idx" btree (ap_id, date(associationtime), date_trunc('hour'::text, associationtime))
    "session_date_idx" btree (date(associationtime), date_trunc('hour'::text, associationtime))
    "session_date_ssid_apid_idx" btree (ssid_id, ap_id, associationtime)
    "session_date_ssid_idx" btree (ssid_id, date(associationtime), date_trunc('hour'::text, associationtime))
    "session_lim_values_idx" btree (date(associationtime))
    "session_main_idx" btree (ssid_id, ap_id, associationtime, disassociationtime, sessionduration, clientip, clientmac, devicename, tx, rx, protocol, snr, rssi, dataretries, rtsretries)
```


indexing postgresql postgresql-performance sql-execution-plan

Tony 20 nov. 14 at 10:37


1 answer

Erwin Brandstetter · Accepted Answer · 2014-11-20T12:47:41+0000

Very common values in the predicates on ssid_id and ap_id might make it cheaper for Postgres to pick the smaller index session_lim_values_idx (only one column, the date) over the seemingly better fit, but larger, index session_date_ssid_apid_idx (4 columns), and then filter out the rest.

In your case, about 4% of the rows have ssid_id=42 AND ap_id=1731. Normally that alone should not make Postgres switch to the smaller index. But several other factors can tip the scale, mainly cost parameters and statistics. Details:

Keep PostgreSQL from choosing a bad query plan

What to do?

Adjust your cost parameters, if you have not done so already, following the guidelines in the answer linked above.
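Cost settings can be tried out per session before touching postgresql.conf. A minimal sketch, assuming the table and column names from the question; the value 1.1 for random_page_cost is only an illustrative setting often used with SSD storage, not a recommendation for this server, and the count(*) query is a reduced version of the question's query.

```sql
-- Session-local experiment: SET only affects the current session.
-- 1.1 is an illustrative value (often used for SSDs); the default is 4.
SET random_page_cost = 1.1;

EXPLAIN ANALYZE
SELECT count(*)
FROM   session
WHERE  ssid_id = 42
AND    ap_id = 1731
AND    date(associationtime) BETWEEN '2014-11-04' AND '2014-11-20';

-- Return to the configured value.
RESET random_page_cost;
```

If the plan switches to the multicolumn index and runs faster, the setting can then be made permanent in the server configuration.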
Increase the statistics target for the involved columns ssid_id and ap_id, and run ANALYZE:
- Check statistics targets in PostgreSQL
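A sketch of that step, assuming the table and column names from the question. The target of 1000 is an illustrative value (the default default_statistics_target is 100), not a tuned number.

```sql
-- Collect more detailed statistics for the two filter columns.
-- 1000 is illustrative; higher targets make ANALYZE slower.
ALTER TABLE session ALTER COLUMN ssid_id SET STATISTICS 1000;
ALTER TABLE session ALTER COLUMN ap_id   SET STATISTICS 1000;

-- The new targets only take effect once the table is analyzed again.
ANALYZE session;
```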
One special factor here: Postgres collects separate statistics for expressions in indexes. Check:
```
SELECT * FROM pg_statistic
WHERE  starelid = 'session_date_ssid_apid_idx'::regclass;
```
You will find a row for the expression date(associationtime). More details:
- An index that is not used affects the query
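The pg_stats view shows the same data in readable form. A sketch, assuming session_lim_values_idx still exists with its date(associationtime) expression as in the \d output above (the current session_date_ssid_apid_idx has only plain columns, so it carries no separate expression statistics). For an expression index, tablename holds the index name and attname the index column name.

```sql
-- Expression statistics only exist after the table has been analyzed.
ANALYZE session;

-- One row per indexed expression of the index.
SELECT tablename, attname, null_frac, n_distinct, most_common_vals
FROM   pg_stats
WHERE  tablename = 'session_lim_values_idx';
```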
Make the index session_date_ssid_apid_idx more attractive (smaller) by removing the 4th column date_trunc('hour'::text, associationtime). Judging by the table definition you added later, you have already done so.
I would prefer the standard syntax for casts, cast(associationtime AS date), over the function syntax date(associationtime). Not that it necessarily matters; I just know this form works correctly. In your queries you can use the shorthand syntax associationtime::date, which is compatible with the expression index, while using the verbose form in the index definition.
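A minimal sketch of the last two points together. The index name session_ssid_ap_date_idx is hypothetical; the columns follow the answer (ssid_id, ap_id and the date, without the extra date_trunc column), and the dates come from the question's plan.

```sql
-- Hypothetical index name. Verbose standard cast in the index definition.
CREATE INDEX session_ssid_ap_date_idx
    ON session (ssid_id, ap_id, cast(associationtime AS date));

-- Shorthand cast in the query; it is the same expression as the
-- indexed one, so the planner can match it to the index.
EXPLAIN ANALYZE
SELECT count(*)
FROM   session
WHERE  ssid_id = 42
AND    ap_id = 1731
AND    associationtime::date BETWEEN '2014-11-04' AND '2014-11-20';
```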

Also, test with EXPLAIN ANALYZE which query plan is actually faster, by dropping and recreating only the index you want to check. That shows whether Postgres picked the best plan after all.
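Since DROP INDEX is transactional in Postgres, the index does not have to be rebuilt afterwards: drop it inside a transaction, measure, then roll back. A sketch, assuming session_lim_values_idx is the index being taken out of the picture; note that a plain DROP INDEX holds an ACCESS EXCLUSIVE lock on the table until the transaction ends, so do this on a test system or in a quiet moment.

```sql
BEGIN;

-- Hide the competing index from the planner for this transaction only.
DROP INDEX session_lim_values_idx;

EXPLAIN ANALYZE
SELECT count(*)
FROM   session
WHERE  ssid_id = 42
AND    ap_id = 1731
AND    date(associationtime) BETWEEN '2014-11-04' AND '2014-11-20';

-- Undo the DROP: the index is back, unchanged.
ROLLBACK;
```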

You have quite a few indexes. I would check whether they are all in use and get rid of the rest. Indexes have a maintenance cost, and it is generally beneficial to concentrate on fewer indexes where possible (they are easier to keep in cache, so they are more likely to be cached when needed). Weigh costs against benefits.
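The cumulative statistics views give a first hint which indexes are never used. A sketch for the session table from the question: idx_scan counts index scans since the statistics were last reset, so a zero is only a candidate for removal, and indexes backing constraints (such as session_pkey) must stay regardless.

```sql
-- Index usage and size for one table, least used first.
SELECT indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname = 'session'
ORDER  BY idx_scan;
```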

Aside

Instead of subtracting two separate extract() calls, I would use:

```
SUM(extract(epoch FROM disassociationtime - associationtime)::int) AS sessionduration
```
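Putting the suggestions together, the question's query might look like the sketch below. It assumes the same table and filter values as the question, with plain date literals in place of the 'Tue Nov 04 00:00:00 MSK 2014' strings. sum() over integer already returns bigint, so the explicit casts to bigint can go. Because each duration is cast to int before summing, the total can differ by rounding from the original, which summed floating-point values.

```sql
SELECT associationtime::date::text AS date,
       -- per-row duration in whole seconds, summed as bigint
       SUM(extract(epoch FROM disassociationtime - associationtime)::int) AS sessionduration,
       SUM(tx)          AS tx,    -- sum(integer) returns bigint
       SUM(rx)          AS rx,
       SUM(dataretries) AS data,
       SUM(rtsretries)  AS rts,
       count(*)
FROM   session
WHERE  ssid_id = 42
AND    ap_id = 1731
AND    associationtime::date BETWEEN '2014-11-04' AND '2014-11-20'
GROUP  BY associationtime::date
ORDER  BY associationtime::date;
```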
