Storing data as both numeric and double

I have a table with a numeric field, say:

create table data (
  id bigserial PRIMARY KEY,
  quantity numeric(21,8) NOT NULL
);

I need the numeric type because some queries need a level of precision that cannot be obtained with doubles.

But I also have queries that aggregate millions of these values, don't care about rounding issues, and should be as fast as possible.
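For example, the speed-critical side looks something like this (an illustrative query, not my exact workload):

select sum(quantity) as total
  from data
 where id between 1 and 1000000;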

Is there a standard strategy for this, or should I just store every value twice:

create table data (
  id bigserial PRIMARY KEY,
  quantity_exact numeric(21,8) NOT NULL,
  quantity double precision NOT NULL
);

2 answers


Please read the update below for the full picture.

Let's compare the two cases you outlined:

  • use only one column (which one: float8 or numeric(21,8)?), or
  • keep both of them ("keep two").

Some observations.

  • If both columns are kept, we are storing duplicate data, which contradicts normalization and introduces ambiguity that requires special handling. That makes it -1 for the "keep two" case.

  • Column sizes:

    SELECT 'float8' AS coltyp, pg_column_size(random()::float8) UNION ALL
    SELECT 'numeric(21,8)',    pg_column_size(random()::numeric(21,8));

    Keeping both columns would require almost twice as much space. So that's another -1 for "keep two", and +0.5 for the float8 variant, since it is slightly smaller.

  • Speed tests show the following:

    SET work_mem TO '2000MB'; -- to avoid usage of temp files
    EXPLAIN (analyze,buffers,verbose)
    SELECT ((random()*1234567)::float8 / 2 + 3) * 5
      FROM generate_series(1,(1e7)::int) s;
    EXPLAIN (analyze,buffers,verbose)
    SELECT ((random()*1234567)::numeric(21,8) / 2 + 3) * 5
      FROM generate_series(1,(1e7)::int) s;

    On my i7 2.3GHz MBP I got (based on 5 runs):

    • no more than 3135.238 ms for float8, and
    • no more than 17325.514 ms for numeric(21,8).

    So here we have a clear +1 for the float8 case. Note that this is an in-memory check only; a query against a real table (especially a cold one) will take much longer.

It seems like sticking with float8 is the obvious way to go (+1.5 vs -2) given your performance requirements. And you can create a view on top of this table that exposes both the original float8 column and a numeric(21,8) cast for the queries with exacting demands, as sketched below.
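A minimal sketch of such a view, assuming the table stores only the float8 quantity column (the view and output column names here are illustrative):

    create view data_v as
    select id,
           quantity,                                  -- stored float8, fast to aggregate
           quantity::numeric(21,8) as quantity_exact  -- cast on read for numeric-typed output
      from data;

Fast queries read quantity directly; precision-conscious ones read quantity_exact, and only one copy of the data is stored. (Note that the cast only changes the type; it cannot restore precision already lost in the float8 representation.)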


UPDATE: After a comment from a_horse_with_no_name, I decided to repeat the check, this time using real tables. I went for 9.4beta3, since 9.4 ships with the very nice pg_prewarm module.

This is what I did:

export PGDATA=$HOME/9.4b3
initdb -k -E UTF8 -A peer
pg_ctl start

Then I changed some of the defaults using the new ALTER SYSTEM command:

ALTER SYSTEM SET shared_buffers TO '1280MB';
ALTER SYSTEM SET checkpoint_segments TO '99';
ALTER SYSTEM SET checkpoint_completion_target TO '0.9';

Restart the server via pg_ctl restart, and now the test:

SELECT id::int, 1::int AS const, (random()*1234567)::float8 as val
  INTO f FROM generate_series(1,(1e7)::int) id;
SELECT id::int, 1::int AS const, (random()*1234567)::numeric(21,8) as val
  INTO n FROM generate_series(1,(1e7)::int) id;
CREATE EXTENSION pg_prewarm;
VACUUM ANALYZE;
SELECT pg_prewarm('f');
SELECT pg_prewarm('n');
-- checking table size
SELECT relname,pg_size_pretty(pg_total_relation_size(oid))
  FROM pg_class WHERE relname IN ('f','n');
-- checking speed
EXPLAIN (analyze, buffers, verbose) SELECT min(id), max(id), sum(val) FROM f;
EXPLAIN (analyze, buffers, verbose) SELECT min(id), max(id), sum(val) FROM n;

The results are now completely different:

  • size: 422 MB vs 498 MB;
  • average time for float8: 2272.833 ms;
  • average time for numeric(21,8): 3289.542 ms.

Now this doesn't exactly reflect a real-world situation, but in my opinion:

  • using numeric adds something (around 20% for me) to the size of the relation;
  • it makes queries somewhat slower (44% for me).

I was very surprised by these numbers, to be honest. Both tables are fully cached, so the time was spent only on tuple processing and math. I thought it would make a bigger difference.

Personally, I would now go for numeric, given the not-so-big difference in performance and the exactness it offers.
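If you decide you need both representations stored anyway, here is a sketch (my addition, not part of the original answer; it requires PostgreSQL 12+ for stored generated columns) that keeps numeric as the source of truth and derives the float8 copy automatically:

    create table data (
      id            bigserial PRIMARY KEY,
      quantity      numeric(21,8) NOT NULL,
      -- derived automatically on write; assumes PostgreSQL 12+ generated columns
      quantity_fast double precision GENERATED ALWAYS AS (quantity::double precision) STORED
    );

This still pays the extra storage noted above, but the two columns can never drift out of sync, which removes the normalization concern.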


I'll add my five cents. PG 9.4:

CREATE TABLE orders (
  count integer NOT NULL,
  ...
  cost  character varying(15), -- cost as a string, e.g. '10.22'
  ncost numeric(10,2)          -- the same cost as numeric, 10.22
);

~260,000 rows:

explain analyze
select sum(count*ncost) from orders;

"Total execution time: 743.259 ms" (hot data, after 10 test runs)

explain analyze
select sum(count*cost::numeric(10,2)) from orders;

"Total execution time: 577.289 ms"

So, in this test, keeping the cost as a string and casting it on the fly was faster for sum() than reading the numeric column.
