Mysql update using self-regulation query
I have a survey table that contains (among others) the following columns
survey_id - unique id user_id - the id of the person the survey relates to created - datetime ip_address - of the submission ip_count - the number of duplicates
Due to the large set of records, it is impractical to run this query "on the fly", so it tries to create an update statement that will periodically store the "cached" result in ip_count.
The purpose of ip_count is to show that the number of duplicate survey requests ip_address was received in the same user_id with a 12 month period (+/- 6 months of the generated date).
Using the following dataset, this is the expected result.
survey_id user_id created ip_address ip_count #counted duplicates survey_id 1 1 01-Jan-12 123.132.123 1 # 2 2 1 01-Apr-12 123.132.123 2 # 1, 3 3 2 01-Jul-12 123.132.123 0 # 4 1 01-Aug-12 123.132.123 3 # 2, 6 6 1 01-Dec-12 123.132.123 1 # 4
This is the closest solution I have come up with so far, but this query ignores the date constraint and tries to come up with an alternative method.
UPDATE surveys JOIN( SELECT ip_address, created, user_id, COUNT(*) AS total FROM surveys WHERE surveys.state IN (1, 3) # survey is marked as completed and confirmed GROUP BY ip_address, user_id ) AS ipCount ON ( ipCount.ip_address = surveys.ip_address AND ipCount.user_id = surveys.user_id AND ipCount.created BETWEEN (surveys.created - INTERVAL 6 MONTH) AND (surveys.created + INTERVAL 6 MONTH) ) SET surveys.ip_count = ipCount.total - 1 # minus 1 as this query will match on its own id. WHERE surveys.ip_address IS NOT NULL # ignore surveys where we have no ip_address
Thank you for your help :)
source to share
I don't have your table with me, so it's hard for me to create correct SQL that definitely works, but I can do it, and hopefully can help you.
First I will need to take the Cartesian product of the polls against myself and filter out the rows that I don't want
select s1.survey_id x, s2.survey_id y from surveys s1, surveys s2 where s1.survey_id != s2.survey_id and s1.ip_address = s2.ip_address and (s1.created and s2.created fall 6 months within each other)
The result of this should contain every pair of polls that match (according to your rules) TWICE (once for each ID in the 1st position and once for each ID in the 1st position and once for it to be in the 2nd position)
We can then do
on the output of this to get a table that basically gives me the correct ip_count for each survey_id
(select x, count(*) c from (select s1.survey_id x, s2.survey_id y from surveys s1, surveys s2 where s1.survey_id != s2.survey_id and s1.ip_address = s2.ip_address and (s1.created and s2.created fall 6 months within each other)) group by x)
So now we have a table mapping each survey_id to its correct ip_count. To update the original table, we need to join it and copy the values above
So it should look something like this:
UPDATE surveys SET s.ip_count = n.c from surveys s inner join (ABOVE QUERY) n on s.survey_id = n.x
There is some pseudo code, but I think the general idea should work
I've never had to update a table based on the output of another query before. Tried to guess the correct syntax for this from this question - How do I update a SELECT in SQL Server?
Also, if I needed to do something like this for my own work, I would not try to do it in one request .. That would be a pain to maintain and may have memory / performance issues. It would be better if the script would traverse the table row by row, update one transaction in a transaction, before moving on to the next row. Much slower, but easier to understand and arguably easier on your database.
source to share
Several (very) minor tweaks to the one shown above. Thanks again!
UPDATE surveys AS s INNER JOIN ( SELECT x, count(*) c FROM ( SELECT s1.id AS x, s2.id AS y FROM surveys AS s1, surveys AS s2 WHERE s1.state IN (1, 3) # completed and verified AND s1.id != s2.id # dont self join AND s1.ip_address != "" AND s1.ip_address IS NOT NULL # not interested in blank entries AND s1.ip_address = s2.ip_address AND (s2.created BETWEEN (s1.created - INTERVAL 6 MONTH) AND (s1.created + INTERVAL 6 MONTH)) AND s1.user_id = s2.user_id # where completed for the same user ) AS ipCount GROUP BY x ) n on s.id = n.x SET s.ip_count = n.c
source to share