SQL Server - replace OR when JOINING on non-indexed field
History:
One of our clients has a system in which they have records of telephone conversations between their agents and their clients, with whom they create various contracts. The records are stored on the server with their locations stored in the records table in the database. Agents can then "attach" the entry to the contract, which creates an entry in the ContractRecordings table. I need to create a report that shows which records are not contract bound, but the way the tables are created makes this more complicated than expected.
Recording
------------------------------
recordingId : INT PK, IDENTITY
agentId : INT, FK
filename : NVARCHAR(255)
ContractRecording
----------------------------------
recordingId : INT PK, IDENTITY
contractNumber : INT
created : DATETIME
username : NVARCHAR(20)
note : NVARCHAR(MAX)
fileLocation : NVARCHAR(max)
It would be easy if ContractRecording.recordingId was a foreign key reference for Recording.recordingId, but it isn't. This is his own identification key. The only link between the tables is the location of the file, but Recording.filename only stores the file name, while ContractRecording.fileLocation stores the full path. Yes, I know, but I did not design these tables. Fortunately, there is a pattern, and the full path is derived from the agent name and record date, which we can learn from the data in the record table. But of course there is another problem: the file path format changed a year ago, and some records are stored in the old format and some in the new format.
Old format: C: \ John-Recordings \ 2015 \ 06 \ 15-0811.wav
New format: C: \ Recordings \ John Smith \ 2015 \ 06 \ 15-0811.wav
Problem:
To link two tables, I have to join them in the full path of records, which must be manually created in the Recording table and can be in one of two formats. I first tried using OR in the JOIN clause, but it takes about 8 minutes to go back about 15k lines, which is not acceptable. Then I tried to use two LEFT OUTER JOINs - one for each condition, but what seemed to be the same took ten minutes. I am assuming that since I am joining a custom field that is not indexed. Splitting it into two SELECTs and using UNION resulted in duplicate rows, each query would return one row for each record. Do I have any other options to get this request up to a few seconds? Here's my original query using the OR clause.
SELECT * FROM
(SELECT
cr.recordingid AS "attachedrecordingid"
,rec.recordingid AS "rawrecordingid"
,cr.contractnumber
,cr.created
,rec.name
,cr.note
,cr.filelocation
,rec.filename
,rec.recordtime
FROM ContractRecording cr
RIGHT OUTER JOIN
(SELECT
recordingid
,a.name
,filename
,retain
,r.recordtime
,'C:\Recordings\' + a.name + '\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename AS "fullpathnew"
,'C:\' + SUBSTRING(a.name, 0, CHARINDEX(' ', a.name, 0)) + '-Recordings\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename AS "fullpathold"
FROM Recording r
JOIN Agents a
ON r.agentid = a.agentid) rec
ON cr.filelocation = rec.fullpathold OR cr.filelocation = rec.fullpathnew) main
ORDER BY main.name, main.recordtime
The report should display one row for all records in the record table (unless only one record is tied to multiple contracts, in which case it should show one row per pair), while data from the Contract record is displayed if there are any rows which correspond to one of the file formats.
If absolutely necessary, I don't mind just fetching all the data from both tables and linking them using code, but that would be the last resort.
UPDATE:
As requested, here is the parse version of the UNION query. As mentioned, it returns two strings for each pairing - one with data and one without. This is because at least one of the two JOINs will not always have a match, but I just want to ignore them when the other JOIN has a match. If neither JOIN matches, I also want to display it once. I'm less sure that I can achieve the desired result with UNION than with the other capabilities, so I didn't pursue this approach.
SELECT * FROM
((SELECT
cr.recordingid AS "attachedrecordingid"
,rec.recordingid AS "rawrecordingid"
,cr.contractnumber
,cr.created
,rec.name
,cr.note
,cr.filelocation
,rec.filename
,rec.recordtime
FROM ContractRecording cr
RIGHT OUTER JOIN
(SELECT
recordingid
,a.name
,filename
,retain
,r.recordtime
,'C:\' + SUBSTRING(a.name, 0, CHARINDEX(' ', a.name, 0)) + '-Recordings\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename AS "fullpathold"
FROM Recording r
JOIN Agents a
ON r.agentid = a.agentid) rec
ON cr.filelocation = rec.fullpathold)
UNION
(SELECT
cr.recordingid AS "attachedrecordingid"
,rec.recordingid AS "rawrecordingid"
,cr.contractnumber
,cr.created
,rec.name
,cr.note
,cr.filelocation
,rec.filename
,rec.recordtime
FROM ContractRecording cr
RIGHT OUTER JOIN
(SELECT
recordingid
,a.name
,filename
,retain
,r.recordtime
,'C:\Recordings\' + a.name + '\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename AS "fullpathnew"
FROM Recording r
JOIN Agents a
ON r.agentid = a.agentid) rec
ON cr.filelocation = rec.fullpathnew)) main
ORDER BY main.name, main.recordtime
source to share
You can try using LIKE
WITH AgentRecordings AS
(
SELECT
a.name,
r.recordingId AS rawrecordingid,
r.filename,
r.recordtime,
CONCAT(
'C:\Recordings\' + a.name + '\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + FILENAME,
'C:\' + SUBSTRING(a.name, 0, CHARINDEX(' ', a.name, 0)) + '-Recordings\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename
) AS filepaths
FROM
Agents a
JOIN Recording r ON a.agentId = r.agentId
)
SELECT
cr.recordingid AS "attachedrecordingid"
,rec.recordingid AS "rawrecordingid"
,cr.contractnumber
,cr.created
,rec.name
,cr.note
,cr.filelocation
,rec.filename
,rec.recordtime
FROM
AgentRecordings rec
LEFT JOIN ContractRecording cr ON rec.filepaths LIKE '%' + cr.filelocation + '%'
If it helps anyone. I will also try to create a temp table instead of using cte and see if that helps more.
You can also try splitting the two OR operators by 2 cte and using concatenation to concatenate the found record id
WITH fullpathnew AS
(
SELECT cr.recordingid AS "attachedrecordingid",
rec.recordingid AS "rawrecordingid",
cr.contractnumber,
cr.created,
cr.note,
cr.filelocation
FROM Agents a
JOIN Recording r ON a.agentId = r.agentId
JOIN ContractRecording cr ON cr.filelocation = 'C:\Recordings\' + a.name + '\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename
),
fullpathold AS
(
SELECT cr.recordingid AS "attachedrecordingid",
rec.recordingid AS "rawrecordingid",
cr.contractnumber,
cr.created,
cr.note,
cr.filelocation
FROM Agents a
JOIN Recording r ON a.agentId = r.agentId
JOIN ContractRecording cr ON cr.filelocation = 'C:\' + SUBSTRING(a.name, 0, CHARINDEX(' ', a.name, 0)) + '-Recordings\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename
)
combinedCtes AS
(
SELECT attachedrecordingid, rawrecordingid, contractnumber, created, note, filelocation FROM fullpathnew
UNION SELECT attachedrecordingid, rawrecordingid, contractnumber, created, note, filelocation FROM fullpathold
)
SELECT cte.attachedrecordingid
,r.recordingid AS "rawrecordingid"
,cte.contractnumber
,cte.created
,a.name
,cte.note
,cte.filelocation
,r.filename
,r.recordtime
FROM Agents a
JOIN Recording r ON r.agentId = a.agentId
LEFT JOIN combinedCtes cte ON r.recordingid = cte.rawrecordingid
Yours UNION
should be in the pick and then you can join that subquery
SELECT j.attachedrecordingid
,r.recordingid AS rawrecordingid
,j.contractnumber
,j.created
,a.NAME
,j.note
,j.filelocation
,r.filename
,r.recordtime
FROM Agents a
JOIN Recording r ON a.agentId = r.agentId
LEFT JOIN(
SELECT cr.recordingid AS "attachedrecordingid"
,rec.recordingid AS "rawrecordingid"
,cr.contractnumber
,cr.created
,cr.note
,cr.filelocation
FROM Agents a
JOIN Recording r
JOIN ContractRecording cr
ON cr1.filelocation = 'C:\' + SUBSTRING(a.name, 0, CHARINDEX(' ', a.name, 0)) + '-Recordings\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename
UNION
SELECT cr.recordingid AS "attachedrecordingid"
,rec.recordingid AS "rawrecordingid"
,cr.contractnumber
,cr.created
,cr.note
,cr.filelocation
FROM Agents a
JOIN Recording r
JOIN ContractRecording cr
ON cr1.filelocation = 'C:\Recordings\' + a.name + '\' + CONVERT(NVARCHAR(4), DATEPART(yyyy, recordtime)) + '\' + CONVERT(NVARCHAR(2), DATEPART(m, recordtime)) + '\' + filename
) j ON r.recordingId = j.rawrecordingid
ORDER BY a.name, r.recordtime
source to share