Add mean to mean
I have a query that adds an average to an existing average, I copied the number of rows inserted for example, but we usually do 5000 rows at the same time many times a day.
INSERT INTO stats (impcount,
average_position,
state,
dir_id,
viz_id,
date_date,
org_id,
order_id) VALUES
('2', '1', '', '5537', '22189', '2014-11-06', '-1', '15963'),
('2', '2', '', '5624', '22020', '2014-11-06', '-1', '15963'),
('2', '3', '', '5624', '18215', '2014-11-06', '-1', '15963'),
('2', '4', '', '6153', '22071', '2014-11-06', '-1', '15963'),
('1', '9', '', '5624', '21735', '2014-11-06', '-1', '15634')
ON DUPLICATE KEY UPDATE impcount = impcount + VALUES(impcount),
average_position = VALUES(average_position) + ((((average_position * impcount)
+ ( VALUES(impcount) * VALUES(average_position)) ) / (impcount + VALUES(impcount) ))
- average_position);
The table structure looks like this:
CREATE TABLE stats (
record_id bigint(10) NOT NULL AUTO_INCREMENT,
date_year varchar(4) DEFAULT NULL,
date_month char(2) DEFAULT NULL,
date_day char(2) DEFAULT NULL,
impcount int(10) DEFAULT NULL,
date_hour varchar(4) DEFAULT NULL,
dir_id bigint(20) unsigned DEFAULT NULL,
viz_id int(9) NOT NULL DEFAULT '0',
order_id int(3) DEFAULT '0',
date_date date NOT NULL DEFAULT '0000-00-00',
average_position double DEFAULT NULL,
state varchar(200) DEFAULT NULL,
org_id int(10) unsigned DEFAULT NULL,
PRIMARY KEY (record_id),
UNIQUE KEY viz_id (viz_id,order_id,date_date,org_id),
KEY viz_counts (viz_id,date_date,impcount,average_position,order_id),
KEY daily_counts (date_date,impcount,order_id,average_position),
KEY dir_counts (dir_id,date_date,order_id),
KEY org_id (org_id)
) ENGINE=InnoDB AUTO_INCREMENT=33499742809 DEFAULT CHARSET=latin1;
Does anyone know how to simplify this, I don't really understand the math, but was hoping someone would recognize it and find out a better / easier / faster way to do the same.
impcount is the number of impressions for each line, and average_position is the direct average position of each impression. from what I understand it is a moving average where each of the duplicate key updates keeps the average precision.
source to share
Let A1 and A2 be two means with numbers N1 and N2, respectively. Then the combined mean
N1 × A1 + N2 × A2
A = -----------------
N1 + N2
with a score of N = N1 + N2.
Note that if you store your data as S1 and S2 and count N1 and N2, this is much easier to do. The combined sum is S1 + S2, the combined counter is N1 + N2, and the average is calculated as S / N.
source to share