JSONB performance degrades as the number of keys increases

I am testing the performance of the jsonb type in PostgreSQL. Each document will contain about 1500 keys with no hierarchy; the document is flattened. This is what the table and a document look like:

create table ztable0
(
   id serial primary key,
   data jsonb
)

Here's an example doc:

{ "0": 301, "90": 23, "61": 4001, "11": 929, ... }

As you can see, the document contains no hierarchy and all values are integers, although that won't be the case for some of them in the future.

  • Rows: 86,000
  • Columns: 2
  • Keys per document: 1500+

When looking up a specific key's value or running a GROUP BY, performance is noticeably slow. This query:

select (data ->> '1')::integer, count(*) from ztable0
group by (data ->> '1')::integer
limit 100

takes about 2 seconds. Is there any way to improve the performance of jsonb documents?
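
For reference, this is a minimal sketch of how the timing above can be inspected, using nothing beyond the standard EXPLAIN options ANALYZE and BUFFERS on the same query:

explain (analyze, buffers)
select (data ->> '1')::integer, count(*) from ztable0
group by (data ->> '1')::integer
limit 100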

1 answer


This is a known issue in 9.4beta2; please have a look at this blog post, it has some details and pointers to the mailing list discussions.

About the problem

PostgreSQL uses TOAST to store data values, which means that big values (usually roughly 2 kB or more) are stored in a separate, special kind of table. PostgreSQL also tries to compress the data, using its pglz method (it has been there for ages). By "tries" I mean that before data is compressed, the first 1 kB is probed. If the results are not satisfactory, i.e. compression gives no benefit on the probed data, the decision is made not to compress.
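
A quick way to see whether compression actually kicked in is to compare the stored size of a value with its textual size; a rough sketch against the table from the question, using only the standard pg_column_size and octet_length functions:

select id,
       pg_column_size(data)     as stored_bytes, -- bytes on disk (after TOAST/compression)
       octet_length(data::text) as text_bytes    -- rough uncompressed, textual size
from ztable0
limit 5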

Now, the original JSONB format kept the table of offsets at the beginning of the value. For values with a large number of root keys in the JSON, this meant that the first 1 kB (and more) was occupied by offsets. This is a series of distinct data, i.e. it was not possible to find two adjacent 4-byte sequences that were equal, so no compression happened.
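
And when compression is refused, the full-size values all end up in the TOAST table; a sketch of how to observe that, comparing the table's total size (including TOAST) with its main heap alone, using the standard size functions:

select pg_size_pretty(pg_table_size('ztable0'))    as total_including_toast,
       pg_size_pretty(pg_relation_size('ztable0')) as main_heap_only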

Note that if one skips past the offsets table, the rest of the value compresses just fine. So one of the options would be to tell the pglz code explicitly whether compression is applicable and where to probe for it (especially for newly introduced data types), but the existing infrastructure does not support this.



The fix

The decision was made to change the way data is stored inside the JSONB value, making it more suitable for pglz compression. Here's the commit message by Tom Lane with the change that implements the new on-disk format for JSONB. And despite the format changes, a lookup of a random element is still O(1).

However, it took around a month to get this fixed. As I can see, 9.4beta3 has already been tagged, so you'll be able to re-test this soon, right after the official announcement.

One important note: you will have to do a pg_dump / pg_restore cycle or use the pg_upgrade tool to switch to 9.4beta3, as the fix for the issue you have identified required a change in the way data is stored, so beta3 is not binary compatible with beta2.
