SQL prioritization

I am trying to do priority mapping on a table in a stored procedure. The requirements are a bit tricky to explain, but hopefully it makes sense. Let's say we have a table called books with fields for id, author, title, date, and pages.

We also have a stored procedure that will match a query with ONE row per table.

Here is the proc signature:

create procedure match
  @pAuthor varchar(100)
  ,@pTitle varchar(100)
  ,@pDate varchar(100)
  ,@pPages varchar(100)

 as

 ...

      

Priority rules:

  • Try all 4 parameters first. If we find an answer refund.
  • Then try to match any 3 parameters. The first parameter has the highest priority here and the fourth is the lowest. If we find matches, we return a match.
  • Next, we check if any two parameters are the same, and finally if anyone does (still following the ordering ordering rules).

I have implemented this on a case-by-case basis. For example:

 select @lvId = id 
 from books
 where
  author = @pAuthor 
 ,title = @pTitle 
 ,date = @pDate 
 ,pages = @pPages

if @@rowCount = 1 begin
  select @lvId
  return
end

 select @lvId = id 
  from books
 where
  author = @pAuthor 
 ,title = @pTitle 
 ,date = @pDate 

 if @@rowCount = 1 begin
  select @lvId
  return
end

....

      

However, for each new column in the table, the number of individual checks increases by an order of magnitude of 2. I would like to generalize this to X number of columns; however I am having problems with schema coming.

Thanks for reading and I can provide any further information.


Added:

Dave and others, I tried implementing your code and it chokes on the first Order by Clause where we add all the counts. This is giving me the wrong column name error. When I comment out the total and only order the individual aliases, the proc is compromised.

Does anyone have any idea?

This is in Microsoft Sql Server 2005

+1


source to share


7 replies


I believe the answers you are working on are actually simple. But I also believe they will always be full table scans on SQL Server. (In Oracle, you can use Bitmap indexes if there weren't many concurrent DMLs in the table)

A more complex solution, but much more efficient, would be to create your own index. Not a SQL Server index, but your own.

Create a table (hash index) with three columns (lookup-hash, rank, Rowid)

Let's say you have 3 columns to search for. A, B, C

For each row added to Books, you insert 7 rows into the hash_index, either via a trigger or CRUD proc.

You first

insert into hash_index 
SELECT HASH(A & B & C), 7 , ROWID
FROM Books

      

Where is the concatenation operator and HASH is a function



then you insert hashes for A and B, A and C and B and C. Now you have some flexibility, you can give them the same rank, or if A and B are a perfect match for B and C, you can give them a higher rank ...

And then insert the Hashes for A on their own, and B and C with the same choice of rank ... all the same number or all different ... you can even say that a match on A is a higher choice than a match on B and C. This solution gives you a lot of flexibility.

Sure, this will add a lot of INSERT overhead, but if the DML on the books is poor or performance isn't relevant, you're fine.

Now when you go to search, you will create a function that returns the HASH table for your @A, @B, and @C. you will have a small table of 7 values ​​that you join to the lookup hash in the hash index table. This will give you all possible matches and possibly some false matches (this is just the nature of the hashes). You will get this result, describe the order in the rank column. Then return the first rowid in the books table and make sure all @A @B @C values ​​are in that row. In case of randomness it is not, and you got a false result, you will need to check the next rowid.

Each of these operations in this "roll-own" is very fast.

  • Hashing your 3 values ​​into a small table variable of 7 rows = very fast.
  • concatenating them by index on your Hash_index table = very fast index lookup
  • Ending the result set will result in 1 or maybe 2 or 3 accesses to the table by rowid = very fast

Of course, all of them together can be slower than the FTS ... But the FTS will keep going slower and slower. There will be a size that FTS will be slower than this. You will have to play with it.

+2


source


You are not explaining what should happen if more than one outcome matches any given set of parameters that have been achieved, so you will need to change that to accommodate these business rules. I have now set it up to return books that match more recent parameters than those that don't. For example, matching authors, titles, and pages will appear to match author and title.

Your RDBMS may have a different way of handling "TOP", so you might have to adjust that as well.



SELECT TOP 1
     author,
     title,
     date,
     pages
FROM
     Books
WHERE
     author = @author OR
     title = @title OR
     date = @date OR
     pages = @pages OR
ORDER BY
     CASE WHEN author = @author THEN 1 ELSE 0 END +
     CASE WHEN title = @title THEN 1 ELSE 0 END +
     CASE WHEN date = @date THEN 1 ELSE 0 END +
     CASE WHEN pages = @pages THEN 1 ELSE 0 END DESC,

     CASE WHEN author = @author THEN 8 ELSE 0 END +
     CASE WHEN title = @title THEN 4 ELSE 0 END +
     CASE WHEN date = @date THEN 2 ELSE 0 END +
     CASE WHEN pages = @pages THEN 1 ELSE 0 END DESC

      

+1


source


I don't have time to write a request, but I think this idea will work.

Use "author = @pAuthor OR title = @ptitle ..." for your predicate, so you get all candidate strings.

Use CASE expressions or whatever to create virtual columns in the result set, for example:

SELECT CASE WHEN author = @pAuthor THEN 1 ELSE 0 END author_match,
       ...

      

Then add this order and return the first line:

ORDER BY (author_match+title_match+date_match+page_match) DESC,
         author_match DESC,
         title_match DESC,
         date_match DESC
         page_match DESC

      

You still need to expand it for every new column, but not much.

+1


source


      select id, 
               CASE WHEN @pPages = pages 
                    THEN 1 ELSE 0 
               END
             +  Case WHEN @pAuthor=author 
                    THEN 1 ELSE 0 
                END AS 
             /* +  Do this for each attribute. If each of your 
attributes are just as important as the other 
for example matching author is jsut as a good as matching title then 
leave the values alone, if different matches are more 
important then change the values */ as MatchRank  
        from books 

        where  author = @pAuthor OR
               title = @pTitle OR
               date = @pDate

     ORDER BY  MatchRank DESC

      

Edited

When I run this query (modified only to fit one of my own tables), it works fine in SQL2005.

I would recommend a where clause, but you'll want to play around with it to see the performance impact. You will need to use the OR clause, otherwise you will lose potential matches

0


source


Ok, let me reiterate my understanding of your question: you need a stored procedure that can take a variable number of parameters and pass the top string that matches the parameters in a weighted order of preference passed to SQL Server 2005.

Ideally, it would use WHERE clauses to prevent full table scans, take advantage of indexes, and short-circuit searches — you don't want to search for all possible combinations if you can find them earlier. Perhaps we can also allow other comparators than = such as> = for dates, LIKE for strings, etc.

One possible way is to pass parameters as XML, for example in this article , and use .Net stored procedures, but keep that simple vanilla T -SQL for now.

It looks like a binary search on parameters: Search all parameters, then discard the last one, then discard the second but include the last, etc.

Pass parameters as a delimited string because stored procedures do not allow you to pass arrays as parameters. This will allow us to get a variable number of parameters in our stored procedure without requiring a stored procedure for every parameter change.

To allow any comparison, we pass the entire list of WHERE clauses, for example: title, for example "% something%"

Passing multiple parameters means splitting them into a string. We'll use the tilde character ~ to delimit options, for example: author = 'Chris Latta' ~ title, for example '% something%' ~ pages> = 100

Then just a matter of a binary weighted search for the first row that matches our ordered parameter list (hopefully the commented stored procedure is self-explanatory, and if not, please let me know). Note that you are always guaranteed a result (if your table has at least one row), since the last search was without parameters.

Here is the code for the stored procedure:

CREATE PROCEDURE FirstMatch
@SearchParams VARCHAR(2000)
AS
BEGIN
    DECLARE @SQLstmt NVARCHAR(2000)
    DECLARE @WhereClause NVARCHAR(2000)
    DECLARE @OrderByClause NVARCHAR(500)
    DECLARE @NumParams INT
    DECLARE @Pos INT
    DECLARE @BinarySearch INT
    DECLARE @Rows INT

    -- Create a temporary table to store our parameters
    CREATE TABLE #params 
    (
        BitMask int,             -- Uniquely identifying bit mask
        FieldName VARCHAR(100),  -- The field name for use in the ORDER BY clause
        WhereClause VARCHAR(100) -- The bit to use in the WHERE clause
    )

    -- Temporary table identical to our result set (the books table) so intermediate results arent output
    CREATE TABLE #junk
    (
        id INT,
        author VARCHAR(50),
        title VARCHAR(50),
        printed DATETIME,
        pages INT
    )

    -- Ill use tilde ~ as the delimiter that separates parameters
    SET @SearchParams = LTRIM(RTRIM(@SearchParams))+ '~'
    SET @Pos = CHARINDEX('~', @SearchParams, 1)
    SET @NumParams = 0

    -- Populate the #params table with the delimited parameters passed
    IF REPLACE(@SearchParams, '~', '') <> ''
    BEGIN
        WHILE @Pos > 0
        BEGIN
            SET @NumParams = @NumParams + 1
            SET @WhereClause = LTRIM(RTRIM(LEFT(@SearchParams, @Pos - 1)))
            IF @WhereClause <> ''
            BEGIN
                -- This assumes your field names dont have spaces and that you leave a space between the field name and the comparator
                INSERT INTO #params (BitMask, FieldName, WhereClause) VALUES (POWER(2, @NumParams - 1), LTRIM(RTRIM(LEFT(@WhereClause, CHARINDEX(' ', @WhereClause, 1) - 1))), @WhereClause) 
            END
            SET @SearchParams = RIGHT(@SearchParams, LEN(@SearchParams) - @Pos)
            SET @Pos = CHARINDEX('~', @SearchParams, 1)
        END
    END 

    -- Set the binary search to search from all parameters down to one in order of preference
    SET @BinarySearch = POWER(2, @NumParams) 
    SET @Rows = 0
    WHILE (@BinarySearch > 0) AND (@Rows = 0)
    BEGIN
        SET @BinarySearch = @BinarySearch - 1
        SET @WhereClause = ' WHERE '
        SET @OrderByClause = ' ORDER BY '
        SELECT @OrderByClause = @OrderByClause + FieldName + ', ' FROM #params WHERE (@BinarySearch & BitMask) = BitMask ORDER BY BitMask
        SET @OrderByClause = LEFT(@OrderByClause, LEN(@OrderByClause) - 1) -- Remove the trailing comma
        SELECT @WhereClause = @WhereClause + WhereClause + ' AND ' FROM #params WHERE (@BinarySearch & BitMask) = BitMask ORDER BY BitMask
        SET @WhereClause = LEFT(@WhereClause, LEN(@WhereClause) - 4) -- Remove the trailing AND

        IF @BinarySearch = 0
        BEGIN
            -- If nothing found so far, return the top row in the order of the parameters fields
            SET @WhereClause = ''
            -- Use the full order sequence of fields to return the results
            SET @OrderByClause = ' ORDER BY '
            SELECT @OrderByClause = @OrderByClause + FieldName + ', ' FROM #params ORDER BY BitMask
            SET @OrderByClause = LEFT(@OrderByClause, LEN(@OrderByClause) - 1) -- Remove the trailing comma
        END

        -- Find out if there are any results for this search
        SET @SQLstmt = 'SELECT TOP 1 id, author, title, printed, pages INTO #junk FROM books' + @WhereClause + @OrderByClause
        Exec (@SQLstmt)

        SET @Rows = @@RowCount
    END

    -- Stop the result set being eaten by the junk table
    SET @SQLstmt = REPLACE(@SQLstmt, 'INTO #junk ', '')

    -- Uncomment the next line to see the SQL you are producing
    --PRINT @SQLstmt

    -- This gives the result set
    Exec (@SQLstmt)
END

      

This stored procedure is named like this:

FirstMatch 'author = ''Chris Latta''~pages > 100~title like ''%something%'''

      

You've got it - a fully extensible, optimized top result search in a weighted order of preference. This was an interesting problem and shows what you can do with native T-SQL.

Several small problems with this:

  • it relies on the caller to know that they have to leave a space after the field name for the parameter to work properly
  • you can't have field names with spaces in them - fixed with some effort
  • it assumes that the corresponding sort order is always increasing.
  • the next programmer to look at this procedure will think you are crazy :)
0


source


Try the following:

ALTER PROCEDURE match  
  @pAuthor varchar(100)  
 ,@pTitle varchar(100)  
 ,@pDate varchar(100)  
 ,@pPages varchar(100)  
-- exec match 'a title', 'b author', '1/1/2007', 15  
AS

SELECT  id,

        CASE WHEN author = @pAuthor THEN 1 ELSE 0 END
        + CASE WHEN title = @pTitle THEN 1 ELSE 0 END
        + CASE WHEN bookdate = @pDate THEN 1 ELSE 0 END
        + CASE WHEN pages = @pPages THEN 1 ELSE 0 END AS matches,

        CASE WHEN author = @pAuthor THEN 4 ELSE 0 END
        + CASE WHEN title = @pTitle THEN 3 ELSE 0 END
        + CASE WHEN bookdate = @pDate THEN 2 ELSE 0 END
        + CASE WHEN pages = @pPages THEN 1 ELSE 0 END AS score
FROM books
WHERE author = #pAuthor 
    OR title = @pTitle 
    OR bookdate = @PDate 
    OR pages = @pPages
ORDER BY matches DESC, score DESC

      

However, this of course triggers a table scan. You can avoid this by making it concatenate the CTE and 4 WHERE clauses, one for each property - there will be duplicates, but you can just take TOP 1 anyway.

EDIT: Added WHERE ... OR clause. I would feel more comfortable if I was

SELECT ... FROM books WHERE author = @pAuthor
UNION
SELECT ... FROM books WHERE title = @pTitle
UNION
...

      

0


source


As for the Order By clause, which is not possible to compile:

As it is said recursively (in the comment), the alias “cannot be in expressions that are used in the In order clauses. To get around this, I used a subquery that returned rows and then ordered in an outer query. That way I can use alias' in order. Slightly slower, but much cleaner.

0


source







All Articles