Add mean to mean

I have a query that adds an average to an existing average, I copied the number of rows inserted for example, but we usually do 5000 rows at the same time many times a day.

INSERT INTO stats (impcount, 
                   average_position, 
                   state, 
                   dir_id, 
                   viz_id, 
                   date_date, 
                   org_id, 
                   order_id) VALUES  
('2', '1', '', '5537', '22189', '2014-11-06', '-1', '15963'), 
('2', '2', '', '5624', '22020', '2014-11-06', '-1', '15963'), 
('2', '3', '', '5624', '18215', '2014-11-06', '-1', '15963'), 
('2', '4', '', '6153', '22071', '2014-11-06', '-1', '15963'), 
('1', '9', '', '5624', '21735', '2014-11-06', '-1', '15634')  
ON DUPLICATE KEY UPDATE impcount = impcount + VALUES(impcount),  
average_position = VALUES(average_position) + ((((average_position * impcount) 
+  ( VALUES(impcount) * VALUES(average_position)) ) / (impcount  + VALUES(impcount) ))
- average_position);

      

The table structure looks like this:

CREATE TABLE stats (
    record_id bigint(10) NOT NULL AUTO_INCREMENT,
    date_year varchar(4) DEFAULT NULL,
    date_month char(2) DEFAULT NULL,
    date_day char(2) DEFAULT NULL,
    impcount int(10) DEFAULT NULL,
    date_hour varchar(4) DEFAULT NULL,
    dir_id bigint(20) unsigned DEFAULT NULL,
    viz_id int(9) NOT NULL DEFAULT '0',
    order_id int(3) DEFAULT '0',
    date_date date NOT NULL DEFAULT '0000-00-00',
    average_position double DEFAULT NULL,
    state varchar(200) DEFAULT NULL,
    org_id int(10) unsigned DEFAULT NULL,
    PRIMARY KEY (record_id),
    UNIQUE KEY viz_id (viz_id,order_id,date_date,org_id),
    KEY viz_counts (viz_id,date_date,impcount,average_position,order_id),
    KEY daily_counts (date_date,impcount,order_id,average_position),
    KEY dir_counts (dir_id,date_date,order_id),
    KEY org_id (org_id)
) ENGINE=InnoDB AUTO_INCREMENT=33499742809 DEFAULT CHARSET=latin1;

      

Does anyone know how to simplify this, I don't really understand the math, but was hoping someone would recognize it and find out a better / easier / faster way to do the same.

impcount is the number of impressions for each line, and average_position is the direct average position of each impression. from what I understand it is a moving average where each of the duplicate key updates keeps the average precision.

+3


source to share


1 answer


Let A1 and A2 be two means with numbers N1 and N2, respectively. Then the combined mean

    N1 × A1 + N2 × A2
A = -----------------
         N1 + N2

      

with a score of N = N1 + N2.




Note that if you store your data as S1 and S2 and count N1 and N2, this is much easier to do. The combined sum is S1 + S2, the combined counter is N1 + N2, and the average is calculated as S / N.

0


source







All Articles