Save data as both numeric and double
I have a table with a numeric field, say:
create table data (
id bigserial PRIMARY KEY,
quantity numeric(21,8) NOT NULL
)
I need the numeric type because some queries need a level of precision that cannot be obtained with doubles.
But I also have queries that scan millions of these values, don't care about rounding issues, and should be as fast as possible.
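For example (a standard illustration of binary floating-point rounding; the literals are made up just for demonstration):
SELECT 0.1::float8  + 0.2::float8  = 0.3::float8  AS float8_equal,  -- false: 0.1 + 0.2 is 0.30000000000000004 in double precision
       0.1::numeric + 0.2::numeric = 0.3::numeric AS numeric_equal; -- true: numeric does exact decimal arithmetic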
Is there a standard strategy for this, or should I just duplicate every numeric value:
create table data (
id bigserial PRIMARY KEY,
quantity_exact numeric(21,8) NOT NULL,
quantity double precision NOT NULL
)
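If I do duplicate it, I assume the copy would have to be kept in sync on every write, for example with a trigger along these lines (a rough sketch; sync_quantity is just a name made up for illustration):
CREATE OR REPLACE FUNCTION sync_quantity() RETURNS trigger AS $$
BEGIN
    -- derive the approximate float8 copy from the exact numeric value on every write
    NEW.quantity := NEW.quantity_exact::double precision;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_sync_quantity
    BEFORE INSERT OR UPDATE ON data
    FOR EACH ROW EXECUTE PROCEDURE sync_quantity();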
Please read the updates below for the full picture.
Let's compare both cases that you indicated:
- use only one column (which one, float8 or numeric(21,8)?), or
- keep both of them (keep two).
Some observations.
- If both columns are kept, we are talking about duplicate data, which contradicts normalization and introduces ambiguity into the system, requiring special handling. That makes it -1 for the keep two case.
- Column size:
SELECT 'float8' AS coltyp, pg_column_size(random()::float8)
UNION ALL
SELECT 'numeric(21,8)', pg_column_size(random()::numeric(21,8));
Keeping both columns will require almost twice as much space. So that is another -1 for keep two, and +0.5 for the float8 variant, since it is slightly smaller.
- Speed tests show the following:
SET work_mem TO '2000MB'; -- to avoid usage of temp files
EXPLAIN (analyze, buffers, verbose)
  SELECT ((random()*1234567)::float8 / 2 + 3) * 5
  FROM generate_series(1,(1e7)::int) s;
EXPLAIN (analyze, buffers, verbose)
  SELECT ((random()*1234567)::numeric(21,8) / 2 + 3) * 5
  FROM generate_series(1,(1e7)::int) s;
On my i7 2.3GHz MBP, I got (based on 5 runs):
- no more than 3135.238 ms for float8, and
- no more than 17325.514 ms for numeric(21,8).
So here we have a clear +1 for the float8 case. Keep in mind this is an in-memory-only check; a query against a real table (especially a cold one) will take much longer.
It seems like sticking with float8 is the obvious way to go (+1.5 vs -2) given your performance requirements. And you can create a view on top of this table that exposes both the original float8 value and a cast to numeric(21,8) to suit your precision demands, as sketched below.
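A minimal sketch of such a view, assuming the table keeps only the float8 column (the view name data_both is made up for illustration):
CREATE VIEW data_both AS
SELECT id,
       quantity,                                    -- stored float8, fast to aggregate
       quantity::numeric(21,8) AS quantity_numeric  -- cast exposed alongside, as suggested above
FROM data;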
UPDATE: After a comment from a_horse_with_no_name, I decided to repeat the check, this time using real tables. I went for 9.4beta3, as 9.4 comes with the very nice pg_prewarm module.
This is what I did:
export PGDATA=$HOME/9.4b3
initdb -k -E UTF8 -A peer
pg_ctl start
Then I changed some of the defaults using the new ALTER SYSTEM command:
ALTER SYSTEM SET shared_buffers TO '1280MB';
ALTER SYSTEM SET checkpoint_segments TO '99';
ALTER SYSTEM SET checkpoint_completion_target TO '0.9';
Restart the server via pg_ctl restart
and now test:
SELECT id::int, 1::int AS const, (random()*1234567)::float8 as val
INTO f FROM generate_series(1,(1e7)::int) id;
SELECT id::int, 1::int AS const, (random()*1234567)::numeric(21,8) as val
INTO n FROM generate_series(1,(1e7)::int) id;
CREATE EXTENSION pg_prewarm;
VACUUM ANALYZE;
SELECT pg_prewarm('f');
SELECT pg_prewarm('n');
-- checking table size
SELECT relname,pg_size_pretty(pg_total_relation_size(oid))
FROM pg_class WHERE relname IN ('f','n');
-- checking speed
EXPLAIN (analyze, buffers, verbose) SELECT min(id), max(id), sum(val) FROM f;
EXPLAIN (analyze, buffers, verbose) SELECT min(id), max(id), sum(val) FROM n;
The results are now completely different:
- size: 422 MB vs 498 MB
- average time for float8: 2272.833 ms
- and for numeric(21,8): 3289.542 ms
Now this doesn't exactly reflect the real situation, but in my opinion:
- using numeric adds something (for me, about 20%) to the size of the relation;
- it makes queries somewhat slower (about 44% for me).
I was very surprised by these numbers, to be honest. Both tables are fully cached, so time was spent only on tuple processing and math. I thought it would make a bigger difference.
Personally, I would now go for numeric, given the fairly small difference in performance and the accuracy it offers.
I'll add my five cents. PG 9.4:
CREATE TABLE orders(
  count integer not null,
  ...
  cost character varying(15), -- cost as a string, e.g. '10.22'
  ncost numeric(10,2)         -- the same cost as numeric 10.22
)
~260,000 rows:
explain analyze
select sum(count*ncost) from orders
"Total execution time: 743.259 ms" (hot data after 10 tests) explain the analysis
select sum(count*cost::numeric(10,2)) from orders
"Total execution time: 577.289 ms" So, keep the cost as a string faster for sum ().