Add a number to rows based on identical values in selected columns

I have a PostgreSQL database containing traffic tickets written by multiple jurisdictions.

Some jurisdictions do not indicate whether multiple tickets were recorded at one traffic stop. However, this can be inferred by analyzing other fields. Consider the following data:

ticket_id  timestamp            drivers_license
----------------------------------------------
1          2008-08-07 01:51:00  11111111
2          2008-08-07 01:51:00  11111111
3          2008-08-07 02:02:00  22222222
4          2008-08-07 02:25:00  33333333
5          2008-08-07 04:23:00  44444444
6          2008-08-07 04:23:00  55555555
7          2008-08-07 04:23:00  44444444

I can conclude that:

  • Tickets 1 and 2 were recorded at one traffic stop, as they share the driver's license number and the timestamp.
  • The same goes for tickets 5 and 7, but notice how ticket 6 sits between them. Perhaps another officer was writing a ticket somewhere else at the same time, or the data entry operators entered the tickets out of order.

I would like to add another column with a unique ID for each traffic stop. It will not be the primary key for the table as it will have duplicate values. For example:

ticket_id  timestamp            drivers_license  stop_id
--------------------------------------------------------
1          2008-08-07 01:51:00  11111111         1
2          2008-08-07 01:51:00  11111111         1
3          2008-08-07 02:02:00  22222222         2
4          2008-08-07 02:25:00  33333333         3
5          2008-08-07 04:23:00  44444444         4
6          2008-08-07 04:23:00  55555555         5
7          2008-08-07 04:23:00  44444444         4
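The rule behind the example output is: every distinct (timestamp, drivers_license) pair is one stop, numbered in order of first appearance by ticket_id. A minimal Python sketch of that rule as a single pass with a dictionary, assuming the rows fit in memory (`TICKETS` and `assign_stop_ids` are hypothetical names, not part of the original schema):

```python
# Sample rows copied from the question: (ticket_id, timestamp, drivers_license).
TICKETS = [
    (1, "2008-08-07 01:51:00", "11111111"),
    (2, "2008-08-07 01:51:00", "11111111"),
    (3, "2008-08-07 02:02:00", "22222222"),
    (4, "2008-08-07 02:25:00", "33333333"),
    (5, "2008-08-07 04:23:00", "44444444"),
    (6, "2008-08-07 04:23:00", "55555555"),
    (7, "2008-08-07 04:23:00", "44444444"),
]


def assign_stop_ids(tickets):
    """Map each ticket_id to a stop_id. Tickets sharing (timestamp, drivers_license)
    get the same stop_id; pairs are numbered in order of first appearance by ticket_id."""
    stop_of_pair = {}  # (timestamp, drivers_license) -> stop_id
    result = {}        # ticket_id -> stop_id
    for ticket_id, timestamp, license_no in sorted(tickets):
        key = (timestamp, license_no)
        if key not in stop_of_pair:
            stop_of_pair[key] = len(stop_of_pair) + 1  # first time this pair is seen
        result[ticket_id] = stop_of_pair[key]
    return result


if __name__ == "__main__":
    print(assign_stop_ids(TICKETS))
```

This reproduces the stop_id column in the table above: {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 4}.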

I can think of computationally intensive, greedy algorithms to do this in C#, but is there an efficient SQL query that can do it?

+3

4 answers


Efficient SQL query FTW!

I'm not at a computer where I can test this, so there may be some syntax problems; I'll check it in the morning, but it goes something like this:

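ALTER TABLE ticketTable ADD COLUMN stop_id integer;  -- the new column must exist first

WITH uniquez AS (
    -- one row per (timestamp, drivers_license) group; rank() gives tied rows the
    -- same number but leaves gaps (dense_rank() would number the stops 1, 2, 3, ...)
    SELECT DISTINCT "timestamp", drivers_license,
           rank() OVER (ORDER BY "timestamp", drivers_license) AS counterz
    FROM ticketTable
)
UPDATE ticketTable tt
SET    stop_id = uniquez.counterz
FROM   uniquez                      -- PostgreSQL needs the CTE in FROM to join on it
WHERE  uniquez."timestamp"     = tt."timestamp"
AND    uniquez.drivers_license = tt.drivers_license;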

Basically, you group the rows by timestamp and drivers_license and give each group a counter. In the UPDATE, you use that counter from the CTE as your stop_id and write it to every row whose timestamp and driver's license match the group.
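To sanity-check this idea without a PostgreSQL server, here is a sketch using Python's built-in sqlite3 module as a stand-in (assumption: SQLite 3.25 or newer, for window functions). The column is named `ts` here and the helper `stop_ids_via_cte_update` is hypothetical. SQLite releases before 3.33 have no `UPDATE ... FROM`, so a correlated subquery does the join instead; it still copies each group's rank() value onto the matching tickets.

```python
import sqlite3

# Sample rows from the question: (ticket_id, ts, drivers_license).
ROWS = [
    (1, "2008-08-07 01:51:00", "11111111"),
    (2, "2008-08-07 01:51:00", "11111111"),
    (3, "2008-08-07 02:02:00", "22222222"),
    (4, "2008-08-07 02:25:00", "33333333"),
    (5, "2008-08-07 04:23:00", "44444444"),
    (6, "2008-08-07 04:23:00", "55555555"),
    (7, "2008-08-07 04:23:00", "44444444"),
]


def stop_ids_via_cte_update(rows):
    """Run the CTE + UPDATE idea in an in-memory SQLite database and
    return the resulting stop_id of each ticket, ordered by ticket_id."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE ticketTable (ticket_id INTEGER PRIMARY KEY, "
        "ts TEXT, drivers_license TEXT, stop_id INTEGER)"
    )
    con.executemany(
        "INSERT INTO ticketTable (ticket_id, ts, drivers_license) VALUES (?, ?, ?)",
        rows,
    )
    con.execute(
        """
        WITH uniquez AS (
            SELECT ts, drivers_license,
                   rank() OVER (ORDER BY ts, drivers_license) AS counterz
            FROM ticketTable
        )
        UPDATE ticketTable
        SET stop_id = (
            -- every row of a group carries the same counterz, so LIMIT 1 is safe
            SELECT counterz FROM uniquez
            WHERE uniquez.ts = ticketTable.ts
              AND uniquez.drivers_license = ticketTable.drivers_license
            LIMIT 1
        )
        """
    )
    stop_ids = [r[0] for r in con.execute("SELECT stop_id FROM ticketTable ORDER BY ticket_id")]
    con.close()
    return stop_ids


if __name__ == "__main__":
    print(stop_ids_via_cte_update(ROWS))
```

Each stop gets a single number, but because rank() skips values after ties, the IDs come out as 1, 1, 3, 4, 5, 7, 5 rather than consecutively.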

+1

If you use the window function dense_rank(), it becomes surprisingly simple:

SELECT *
      ,dense_rank() OVER (ORDER BY ts, drivers_license) AS stop_id
FROM   tbl
ORDER  BY ticket_id;

This returns exactly what you requested.

I renamed your column timestamp to ts, because timestamp is a type name in PostgreSQL and a reserved word in every SQL standard.
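The choice of window function matters here. A quick check on the question's sample data, sketched with Python's built-in sqlite3 module standing in for PostgreSQL (assumes SQLite 3.25+ for window functions; the table `tbl` and the helper `numbering` are hypothetical): dense_rank() yields 1, 1, 2, 3, 4, 5, 4, matching the requested stop_id column, while plain rank() skips numbers after ties and yields 1, 1, 3, 4, 5, 7, 5.

```python
import sqlite3

# Sample rows from the question: (ticket_id, ts, drivers_license).
ROWS = [
    (1, "2008-08-07 01:51:00", "11111111"),
    (2, "2008-08-07 01:51:00", "11111111"),
    (3, "2008-08-07 02:02:00", "22222222"),
    (4, "2008-08-07 02:25:00", "33333333"),
    (5, "2008-08-07 04:23:00", "44444444"),
    (6, "2008-08-07 04:23:00", "55555555"),
    (7, "2008-08-07 04:23:00", "44444444"),
]

ALLOWED = ("rank", "dense_rank")


def numbering(func):
    """Return func() OVER (ORDER BY ts, drivers_license) for each ticket, by ticket_id."""
    if func not in ALLOWED:  # the name is spliced into the SQL text, so whitelist it
        raise ValueError(func)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (ticket_id INTEGER PRIMARY KEY, ts TEXT, drivers_license TEXT)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    query = (
        f"SELECT {func}() OVER (ORDER BY ts, drivers_license) "
        "FROM tbl ORDER BY ticket_id"
    )
    values = [r[0] for r in con.execute(query)]
    con.close()
    return values


if __name__ == "__main__":
    print("dense_rank:", numbering("dense_rank"))
    print("rank:      ", numbering("rank"))
```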

+3

It is probably best to create a new table (say, "stop") containing the DISTINCT pairs of timestamp and driver's license number, assign them row numbers, and then update the ticket table from this new table.
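A sketch of this two-step approach, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (assumes SQLite 3.25+ for window functions; the `tickets` table, the `ts` column, and the helper name are hypothetical). row_number() over the distinct pairs supplies the row numbers, and a correlated subquery copies them back onto the tickets:

```python
import sqlite3

# Sample rows from the question: (ticket_id, ts, drivers_license).
ROWS = [
    (1, "2008-08-07 01:51:00", "11111111"),
    (2, "2008-08-07 01:51:00", "11111111"),
    (3, "2008-08-07 02:02:00", "22222222"),
    (4, "2008-08-07 02:25:00", "33333333"),
    (5, "2008-08-07 04:23:00", "44444444"),
    (6, "2008-08-07 04:23:00", "55555555"),
    (7, "2008-08-07 04:23:00", "44444444"),
]


def stop_ids_via_stop_table(rows):
    """Build a 'stop' table of distinct pairs, number them, and copy the
    numbers back to the tickets. Returns stop_id per ticket, by ticket_id."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, "
        "ts TEXT, drivers_license TEXT, stop_id INTEGER)"
    )
    con.executemany(
        "INSERT INTO tickets (ticket_id, ts, drivers_license) VALUES (?, ?, ?)", rows
    )
    # Step 1: one row per distinct (ts, drivers_license) pair, numbered 1..n.
    con.execute(
        """
        CREATE TABLE stop AS
        SELECT row_number() OVER (ORDER BY ts, drivers_license) AS stop_id,
               ts, drivers_license
        FROM (SELECT DISTINCT ts, drivers_license FROM tickets)
        """
    )
    # Step 2: copy each pair's number onto every ticket with that pair.
    con.execute(
        """
        UPDATE tickets
        SET stop_id = (SELECT stop.stop_id FROM stop
                       WHERE stop.ts = tickets.ts
                         AND stop.drivers_license = tickets.drivers_license)
        """
    )
    stop_ids = [r[0] for r in con.execute("SELECT stop_id FROM tickets ORDER BY ticket_id")]
    con.close()
    return stop_ids


if __name__ == "__main__":
    print(stop_ids_via_stop_table(ROWS))
```

In PostgreSQL the second step could also be written as UPDATE ... FROM stop.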

+1

SELECT ticket_id,timestamp,drivers_license,substr(drivers_license,1,1) as stop_id
FROM traffic_data;

Hope this works for you :)

-1