SQL prioritization
I am trying to do priority mapping on a table in a stored procedure. The requirements are a bit tricky to explain, but hopefully it makes sense. Let's say we have a table called books with fields for id, author, title, date, and pages.
We also have a stored procedure that will match a query with ONE row per table.
Here is the proc signature:
create procedure match
@pAuthor varchar(100)
,@pTitle varchar(100)
,@pDate varchar(100)
,@pPages varchar(100)
as
...
Priority rules:
- Try all 4 parameters first. If we find an answer refund.
- Then try to match any 3 parameters. The first parameter has the highest priority here and the fourth is the lowest. If we find matches, we return a match.
- Next, we check if any two parameters are the same, and finally if anyone does (still following the ordering ordering rules).
I have implemented this on a case-by-case basis. For example:
select @lvId = id
from books
where
author = @pAuthor
,title = @pTitle
,date = @pDate
,pages = @pPages
if @@rowCount = 1 begin
select @lvId
return
end
select @lvId = id
from books
where
author = @pAuthor
,title = @pTitle
,date = @pDate
if @@rowCount = 1 begin
select @lvId
return
end
....
However, for each new column in the table, the number of individual checks increases by an order of magnitude of 2. I would like to generalize this to X number of columns; however I am having problems with schema coming.
Thanks for reading and I can provide any further information.
Added:
Dave and others, I tried implementing your code and it chokes on the first Order by Clause where we add all the counts. This is giving me the wrong column name error. When I comment out the total and only order the individual aliases, the proc is compromised.
Does anyone have any idea?
This is in Microsoft Sql Server 2005
source to share
I believe the answers you are working on are actually simple. But I also believe they will always be full table scans on SQL Server. (In Oracle, you can use Bitmap indexes if there weren't many concurrent DMLs in the table)
A more complex solution, but much more efficient, would be to create your own index. Not a SQL Server index, but your own.
Create a table (hash index) with three columns (lookup-hash, rank, Rowid)
Let's say you have 3 columns to search for. A, B, C
For each row added to Books, you insert 7 rows into the hash_index, either via a trigger or CRUD proc.
You first
insert into hash_index
SELECT HASH(A & B & C), 7 , ROWID
FROM Books
Where is the concatenation operator and HASH is a function
then you insert hashes for A and B, A and C and B and C. Now you have some flexibility, you can give them the same rank, or if A and B are a perfect match for B and C, you can give them a higher rank ...
And then insert the Hashes for A on their own, and B and C with the same choice of rank ... all the same number or all different ... you can even say that a match on A is a higher choice than a match on B and C. This solution gives you a lot of flexibility.
Sure, this will add a lot of INSERT overhead, but if the DML on the books is poor or performance isn't relevant, you're fine.
Now when you go to search, you will create a function that returns the HASH table for your @A, @B, and @C. you will have a small table of 7 values that you join to the lookup hash in the hash index table. This will give you all possible matches and possibly some false matches (this is just the nature of the hashes). You will get this result, describe the order in the rank column. Then return the first rowid in the books table and make sure all @A @B @C values are in that row. In case of randomness it is not, and you got a false result, you will need to check the next rowid.
Each of these operations in this "roll-own" is very fast.
- Hashing your 3 values into a small table variable of 7 rows = very fast.
- concatenating them by index on your Hash_index table = very fast index lookup
- Ending the result set will result in 1 or maybe 2 or 3 accesses to the table by rowid = very fast
Of course, all of them together can be slower than the FTS ... But the FTS will keep going slower and slower. There will be a size that FTS will be slower than this. You will have to play with it.
You are not explaining what should happen if more than one outcome matches any given set of parameters that have been achieved, so you will need to change that to accommodate these business rules. I have now set it up to return books that match more recent parameters than those that don't. For example, matching authors, titles, and pages will appear to match author and title.
Your RDBMS may have a different way of handling "TOP", so you might have to adjust that as well.
SELECT TOP 1
author,
title,
date,
pages
FROM
Books
WHERE
author = @author OR
title = @title OR
date = @date OR
pages = @pages OR
ORDER BY
CASE WHEN author = @author THEN 1 ELSE 0 END +
CASE WHEN title = @title THEN 1 ELSE 0 END +
CASE WHEN date = @date THEN 1 ELSE 0 END +
CASE WHEN pages = @pages THEN 1 ELSE 0 END DESC,
CASE WHEN author = @author THEN 8 ELSE 0 END +
CASE WHEN title = @title THEN 4 ELSE 0 END +
CASE WHEN date = @date THEN 2 ELSE 0 END +
CASE WHEN pages = @pages THEN 1 ELSE 0 END DESC
source to share
I don't have time to write a request, but I think this idea will work.
Use "author = @pAuthor OR title = @ptitle ..." for your predicate, so you get all candidate strings.
Use CASE expressions or whatever to create virtual columns in the result set, for example:
SELECT CASE WHEN author = @pAuthor THEN 1 ELSE 0 END author_match,
...
Then add this order and return the first line:
ORDER BY (author_match+title_match+date_match+page_match) DESC,
author_match DESC,
title_match DESC,
date_match DESC
page_match DESC
You still need to expand it for every new column, but not much.
source to share
select id,
CASE WHEN @pPages = pages
THEN 1 ELSE 0
END
+ Case WHEN @pAuthor=author
THEN 1 ELSE 0
END AS
/* + Do this for each attribute. If each of your
attributes are just as important as the other
for example matching author is jsut as a good as matching title then
leave the values alone, if different matches are more
important then change the values */ as MatchRank
from books
where author = @pAuthor OR
title = @pTitle OR
date = @pDate
ORDER BY MatchRank DESC
Edited
When I run this query (modified only to fit one of my own tables), it works fine in SQL2005.
I would recommend a where clause, but you'll want to play around with it to see the performance impact. You will need to use the OR clause, otherwise you will lose potential matches
source to share
Ok, let me reiterate my understanding of your question: you need a stored procedure that can take a variable number of parameters and pass the top string that matches the parameters in a weighted order of preference passed to SQL Server 2005.
Ideally, it would use WHERE clauses to prevent full table scans, take advantage of indexes, and short-circuit searches — you don't want to search for all possible combinations if you can find them earlier. Perhaps we can also allow other comparators than = such as> = for dates, LIKE for strings, etc.
One possible way is to pass parameters as XML, for example in this article , and use .Net stored procedures, but keep that simple vanilla T -SQL for now.
It looks like a binary search on parameters: Search all parameters, then discard the last one, then discard the second but include the last, etc.
Pass parameters as a delimited string because stored procedures do not allow you to pass arrays as parameters. This will allow us to get a variable number of parameters in our stored procedure without requiring a stored procedure for every parameter change.
To allow any comparison, we pass the entire list of WHERE clauses, for example: title, for example "% something%"
Passing multiple parameters means splitting them into a string. We'll use the tilde character ~ to delimit options, for example: author = 'Chris Latta' ~ title, for example '% something%' ~ pages> = 100
Then just a matter of a binary weighted search for the first row that matches our ordered parameter list (hopefully the commented stored procedure is self-explanatory, and if not, please let me know). Note that you are always guaranteed a result (if your table has at least one row), since the last search was without parameters.
Here is the code for the stored procedure:
CREATE PROCEDURE FirstMatch
@SearchParams VARCHAR(2000)
AS
BEGIN
DECLARE @SQLstmt NVARCHAR(2000)
DECLARE @WhereClause NVARCHAR(2000)
DECLARE @OrderByClause NVARCHAR(500)
DECLARE @NumParams INT
DECLARE @Pos INT
DECLARE @BinarySearch INT
DECLARE @Rows INT
-- Create a temporary table to store our parameters
CREATE TABLE #params
(
BitMask int, -- Uniquely identifying bit mask
FieldName VARCHAR(100), -- The field name for use in the ORDER BY clause
WhereClause VARCHAR(100) -- The bit to use in the WHERE clause
)
-- Temporary table identical to our result set (the books table) so intermediate results arent output
CREATE TABLE #junk
(
id INT,
author VARCHAR(50),
title VARCHAR(50),
printed DATETIME,
pages INT
)
-- Ill use tilde ~ as the delimiter that separates parameters
SET @SearchParams = LTRIM(RTRIM(@SearchParams))+ '~'
SET @Pos = CHARINDEX('~', @SearchParams, 1)
SET @NumParams = 0
-- Populate the #params table with the delimited parameters passed
IF REPLACE(@SearchParams, '~', '') <> ''
BEGIN
WHILE @Pos > 0
BEGIN
SET @NumParams = @NumParams + 1
SET @WhereClause = LTRIM(RTRIM(LEFT(@SearchParams, @Pos - 1)))
IF @WhereClause <> ''
BEGIN
-- This assumes your field names dont have spaces and that you leave a space between the field name and the comparator
INSERT INTO #params (BitMask, FieldName, WhereClause) VALUES (POWER(2, @NumParams - 1), LTRIM(RTRIM(LEFT(@WhereClause, CHARINDEX(' ', @WhereClause, 1) - 1))), @WhereClause)
END
SET @SearchParams = RIGHT(@SearchParams, LEN(@SearchParams) - @Pos)
SET @Pos = CHARINDEX('~', @SearchParams, 1)
END
END
-- Set the binary search to search from all parameters down to one in order of preference
SET @BinarySearch = POWER(2, @NumParams)
SET @Rows = 0
WHILE (@BinarySearch > 0) AND (@Rows = 0)
BEGIN
SET @BinarySearch = @BinarySearch - 1
SET @WhereClause = ' WHERE '
SET @OrderByClause = ' ORDER BY '
SELECT @OrderByClause = @OrderByClause + FieldName + ', ' FROM #params WHERE (@BinarySearch & BitMask) = BitMask ORDER BY BitMask
SET @OrderByClause = LEFT(@OrderByClause, LEN(@OrderByClause) - 1) -- Remove the trailing comma
SELECT @WhereClause = @WhereClause + WhereClause + ' AND ' FROM #params WHERE (@BinarySearch & BitMask) = BitMask ORDER BY BitMask
SET @WhereClause = LEFT(@WhereClause, LEN(@WhereClause) - 4) -- Remove the trailing AND
IF @BinarySearch = 0
BEGIN
-- If nothing found so far, return the top row in the order of the parameters fields
SET @WhereClause = ''
-- Use the full order sequence of fields to return the results
SET @OrderByClause = ' ORDER BY '
SELECT @OrderByClause = @OrderByClause + FieldName + ', ' FROM #params ORDER BY BitMask
SET @OrderByClause = LEFT(@OrderByClause, LEN(@OrderByClause) - 1) -- Remove the trailing comma
END
-- Find out if there are any results for this search
SET @SQLstmt = 'SELECT TOP 1 id, author, title, printed, pages INTO #junk FROM books' + @WhereClause + @OrderByClause
Exec (@SQLstmt)
SET @Rows = @@RowCount
END
-- Stop the result set being eaten by the junk table
SET @SQLstmt = REPLACE(@SQLstmt, 'INTO #junk ', '')
-- Uncomment the next line to see the SQL you are producing
--PRINT @SQLstmt
-- This gives the result set
Exec (@SQLstmt)
END
This stored procedure is named like this:
FirstMatch 'author = ''Chris Latta''~pages > 100~title like ''%something%'''
You've got it - a fully extensible, optimized top result search in a weighted order of preference. This was an interesting problem and shows what you can do with native T-SQL.
Several small problems with this:
- it relies on the caller to know that they have to leave a space after the field name for the parameter to work properly
- you can't have field names with spaces in them - fixed with some effort
- it assumes that the corresponding sort order is always increasing.
- the next programmer to look at this procedure will think you are crazy :)
source to share
Try the following:
ALTER PROCEDURE match
@pAuthor varchar(100)
,@pTitle varchar(100)
,@pDate varchar(100)
,@pPages varchar(100)
-- exec match 'a title', 'b author', '1/1/2007', 15
AS
SELECT id,
CASE WHEN author = @pAuthor THEN 1 ELSE 0 END
+ CASE WHEN title = @pTitle THEN 1 ELSE 0 END
+ CASE WHEN bookdate = @pDate THEN 1 ELSE 0 END
+ CASE WHEN pages = @pPages THEN 1 ELSE 0 END AS matches,
CASE WHEN author = @pAuthor THEN 4 ELSE 0 END
+ CASE WHEN title = @pTitle THEN 3 ELSE 0 END
+ CASE WHEN bookdate = @pDate THEN 2 ELSE 0 END
+ CASE WHEN pages = @pPages THEN 1 ELSE 0 END AS score
FROM books
WHERE author = #pAuthor
OR title = @pTitle
OR bookdate = @PDate
OR pages = @pPages
ORDER BY matches DESC, score DESC
However, this of course triggers a table scan. You can avoid this by making it concatenate the CTE and 4 WHERE clauses, one for each property - there will be duplicates, but you can just take TOP 1 anyway.
EDIT: Added WHERE ... OR clause. I would feel more comfortable if I was
SELECT ... FROM books WHERE author = @pAuthor
UNION
SELECT ... FROM books WHERE title = @pTitle
UNION
...
source to share
As for the Order By clause, which is not possible to compile:
As it is said recursively (in the comment), the alias “cannot be in expressions that are used in the In order clauses. To get around this, I used a subquery that returned rows and then ordered in an outer query. That way I can use alias' in order. Slightly slower, but much cleaner.
source to share