Excluding "close" duplicates from a MySQL query

We have an iPhone app that sends in invoice data from each of our employees several times a day. When they are in a low-signal area, tickets can arrive as duplicates. Each one is still assigned a unique "job ID" in the MySQL database, so they are treated as unique rows. I could leave out the job id and make the rest of the columns DISTINCT, which gives me the filtered rows I'm looking for (since literally every data point is identical except for the job id). However, I need the job id, because it is the primary reference for each invoice and is what approvals, changes, etc. point to.

So my question is: how can I filter out "close" duplicate rows in my query while still pulling the job id for each ticket?

The current query is below:

SELECT * FROM jobs, users
WHERE jobs.job_csuper = users.user_id
AND users.user_email = '".$login."'
AND jobs.job_approverid1 = '0'

      

Thanks for looking!

Edit (example added): This is what I mean by a "close duplicate":

Job_ID - Job_title - Job_user - Job_time - Job_date
2345 - Worked on circuits - John Smith - 1.50 - 2013-01-01
2344 - Worked on circuits - John Smith - 1.50 - 2013-01-01
2343 - Worked on circuits - John Smith - 1.50 - 2013-01-01

      

So everything is identical except for the Job_ID column.

+3




3 answers


You want group by:

SELECT *
FROM jobs, users
WHERE jobs.job_csuper = users.user_id
AND users.user_email = '".$login."'
AND jobs.job_approverid1 = '0'
group by <all fields from jobs except jobid>

      

I think the final query should look something like this:



-- users.name is assumed to be the column that holds the employee's name
select min(Job_ID) as JobId, Job_title, users.name as Job_user, Job_time, Job_date
FROM jobs join users
     on jobs.job_csuper = users.user_id
WHERE users.user_email = '".$login."' AND jobs.job_approverid1 = '0'
group by Job_title, users.name, Job_time, Job_date

      

(This uses ANSI join syntax and lists the returned columns explicitly.)
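
If you still need every column of the surviving row (as the original SELECT * returns), one option is to compute the lowest Job_ID per group in a subquery and join back to the table. This is only a sketch: the table name jobs_min_demo is hypothetical and holds just the columns from the question's example.

```sql
-- Hypothetical cut-down table with only the columns shown in the question.
CREATE TABLE jobs_min_demo (
  Job_ID    INT PRIMARY KEY,
  Job_title VARCHAR(100),
  Job_user  VARCHAR(100),
  Job_time  DECIMAL(5,2),
  Job_date  DATE
);

INSERT INTO jobs_min_demo VALUES
  (2345, 'Worked on circuits', 'John Smith', 1.50, '2013-01-01'),
  (2344, 'Worked on circuits', 'John Smith', 1.50, '2013-01-01'),
  (2343, 'Worked on circuits', 'John Smith', 1.50, '2013-01-01');

-- The subquery keeps the lowest Job_ID of each group of otherwise identical rows;
-- the join then fetches the complete row for each kept id.
SELECT j.*
FROM jobs_min_demo j
JOIN (
  SELECT MIN(Job_ID) AS keep_id
  FROM jobs_min_demo
  GROUP BY Job_title, Job_user, Job_time, Job_date
) k ON k.keep_id = j.Job_ID;
-- Returns a single row, the one with Job_ID 2343.
```

Whether to keep MIN(Job_ID) or MAX(Job_ID) depends on which copy your approvals already point to.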

+1




  • It would be better to prevent the double submissions in the first place.
  • Assuming you cannot prevent the double submissions ...

I would query it like this:



select
   min(Job_ID)          as real_job_id
  ,count(Job_ID)        as num_dup_job_ids
  ,group_concat(Job_ID) as all_dup_job_ids
  ,j.Job_title, j.Job_user, j.Job_time, j.Job_date
from
  jobs j
  inner join users u on u.user_id = j.job_csuper
where
  whatever_else
group by
  j.Job_title, j.Job_user, j.Job_time, j.Job_date

      

This returns more than you explicitly asked for. But it's probably good to know how many duplicates you have, and this gives you easy access to the duplicate job ids when you need them.
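
To show what that returns, here is the same idea run against the three example rows from the question. The table name jobs_dup_demo is hypothetical, and the join to users is left out to keep the sketch short. The ORDER BY inside GROUP_CONCAT is my addition so that the id list comes back in a predictable order.

```sql
-- Hypothetical cut-down table with only the columns shown in the question.
CREATE TABLE jobs_dup_demo (
  Job_ID    INT PRIMARY KEY,
  Job_title VARCHAR(100),
  Job_user  VARCHAR(100),
  Job_time  DECIMAL(5,2),
  Job_date  DATE
);

INSERT INTO jobs_dup_demo VALUES
  (2345, 'Worked on circuits', 'John Smith', 1.50, '2013-01-01'),
  (2344, 'Worked on circuits', 'John Smith', 1.50, '2013-01-01'),
  (2343, 'Worked on circuits', 'John Smith', 1.50, '2013-01-01');

SELECT
   MIN(Job_ID)                          AS real_job_id
  ,COUNT(Job_ID)                        AS num_dup_job_ids
  ,GROUP_CONCAT(Job_ID ORDER BY Job_ID) AS all_dup_job_ids
  ,Job_title, Job_user, Job_time, Job_date
FROM jobs_dup_demo
GROUP BY Job_title, Job_user, Job_time, Job_date;
-- One row: real_job_id = 2343, num_dup_job_ids = 3,
-- all_dup_job_ids = '2343,2344,2345'
```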

+1




How about creating a hash for each row and comparing them? Note that the first argument of concat_ws is the separator:

`SHA1(concat_ws('|', field1, field2, field3, ...)) AS jobhash`
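
As a rough sketch of how the hash could be used, the query below groups on it and keeps the lowest Job_ID per hash. The table name jobs_hash_demo, the '|' separator, and the fourth row (2346) are assumptions added for illustration. CONCAT_WS takes the separator as its first argument and skips NULL values, so rows that differ only in which column is NULL could end up with the same hash.

```sql
-- Hypothetical cut-down table with only the columns shown in the question.
CREATE TABLE jobs_hash_demo (
  Job_ID    INT PRIMARY KEY,
  Job_title VARCHAR(100),
  Job_user  VARCHAR(100),
  Job_time  DECIMAL(5,2),
  Job_date  DATE
);

INSERT INTO jobs_hash_demo VALUES
  (2346, 'Replaced breaker',   'John Smith', 2.00, '2013-01-01'),  -- made-up non-duplicate
  (2345, 'Worked on circuits', 'John Smith', 1.50, '2013-01-01'),
  (2344, 'Worked on circuits', 'John Smith', 1.50, '2013-01-01'),
  (2343, 'Worked on circuits', 'John Smith', 1.50, '2013-01-01');

-- Rows whose non-id columns match produce the same hash and fall into one group.
SELECT
   MIN(Job_ID) AS real_job_id
  ,COUNT(*)    AS num_copies
  ,SHA1(CONCAT_WS('|', Job_title, Job_user, Job_time, Job_date)) AS jobhash
FROM jobs_hash_demo
GROUP BY jobhash;
-- Two rows: real_job_id 2343 (num_copies = 3) and real_job_id 2346 (num_copies = 1).
```

For a one-off query this does the same work as grouping on the columns directly. The hash mostly pays off if you store it in its own indexed column.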

      

0

