Mysql update using self-regulation query

I have a survey table that contains (among others) the following columns

survey_id  - unique id
user_id    - the id of the person the survey relates to
created    - datetime
ip_address - of the submission
ip_count   - the number of duplicates

      

Due to the large set of records, it is impractical to run this query "on the fly", so it tries to create an update statement that will periodically store the "cached" result in ip_count.

The purpose of ip_count is to show that the number of duplicate survey requests ip_address was received in the same user_id with a 12 month period (+/- 6 months of the generated date).

Using the following dataset, this is the expected result.

survey_id   user_id    created    ip_address     ip_count  #counted duplicates survey_id
  1            1      01-Jan-12   123.132.123       1      # 2
  2            1      01-Apr-12   123.132.123       2      # 1, 3
  3            2      01-Jul-12   123.132.123       0      # 
  4            1      01-Aug-12   123.132.123       3      # 2, 6
  6            1      01-Dec-12   123.132.123       1      # 4

      

This is the closest solution I have come up with so far, but this query ignores the date constraint and tries to come up with an alternative method.

UPDATE surveys
JOIN(
  SELECT ip_address, created, user_id, COUNT(*) AS total
  FROM surveys  
  WHERE surveys.state IN (1, 3) # survey is marked as completed and confirmed
  GROUP BY ip_address, user_id
) AS ipCount 
  ON (
    ipCount.ip_address = surveys.ip_address
    AND ipCount.user_id = surveys.user_id
    AND ipCount.created BETWEEN (surveys.created - INTERVAL 6 MONTH) AND (surveys.created + INTERVAL 6 MONTH)
  )
SET surveys.ip_count = ipCount.total - 1 # minus 1 as this query will match on its own id.
WHERE surveys.ip_address IS NOT NULL # ignore surveys where we have no ip_address

      

Thank you for your help :)

+3


source to share


2 answers


I don't have your table with me, so it's hard for me to create correct SQL that definitely works, but I can do it, and hopefully can help you.

First I will need to take the Cartesian product of the polls against myself and filter out the rows that I don't want

select s1.survey_id x, s2.survey_id y from surveys s1, surveys s2 where s1.survey_id != s2.survey_id and s1.ip_address = s2.ip_address and (s1.created and s2.created fall 6 months within each other)

      

The result of this should contain every pair of polls that match (according to your rules) TWICE (once for each ID in the 1st position and once for each ID in the 1st position and once for it to be in the 2nd position)

We can then do GROUP BY

on the output of this to get a table that basically gives me the correct ip_count for each survey_id

(select x, count(*) c from (select s1.survey_id x, s2.survey_id y from surveys s1, surveys s2 where s1.survey_id != s2.survey_id and s1.ip_address = s2.ip_address and (s1.created and s2.created fall 6 months within each other)) group by x)

      



So now we have a table mapping each survey_id to its correct ip_count. To update the original table, we need to join it and copy the values ​​above

So it should look something like this:

UPDATE surveys SET s.ip_count = n.c from surveys s inner join (ABOVE QUERY) n on s.survey_id = n.x

      

There is some pseudo code, but I think the general idea should work

I've never had to update a table based on the output of another query before. Tried to guess the correct syntax for this from this question - How do I update a SELECT in SQL Server?

Also, if I needed to do something like this for my own work, I would not try to do it in one request .. That would be a pain to maintain and may have memory / performance issues. It would be better if the script would traverse the table row by row, update one transaction in a transaction, before moving on to the next row. Much slower, but easier to understand and arguably easier on your database.

+2


source


Several (very) minor tweaks to the one shown above. Thanks again!



UPDATE surveys AS s
INNER JOIN (
  SELECT x, count(*) c
  FROM (
    SELECT s1.id AS x, s2.id AS y
    FROM surveys AS s1, surveys AS s2
    WHERE s1.state IN (1, 3) # completed and verified
      AND s1.id != s2.id # dont self join
      AND s1.ip_address != "" AND s1.ip_address IS NOT NULL # not interested in blank entries
      AND s1.ip_address = s2.ip_address
      AND (s2.created BETWEEN (s1.created - INTERVAL 6 MONTH) AND (s1.created + INTERVAL 6 MONTH))
      AND s1.user_id = s2.user_id # where completed for the same user
  ) AS ipCount
  GROUP BY x
) n on s.id = n.x
SET s.ip_count = n.c

      

+2


source







All Articles