Masking names in a free-form text field

I have a large table of 30 million records with a free-form text field that can contain names in any position, with any salutation or none at all.

My job is to mask names with Xxxxx Xxxxx to maintain privacy.

I have access to a large name database that tells me which words are last names and which are first names.

Using SQL Server 2012, what is the most efficient method I can use for this task?

EDIT

OK, I now have a fairly decent solution that combines a full-text index and search, the name database, and a stored procedure.

However, I ran into a rather peculiar problem. I use the predicate CONTAINS([textvaluefield], @namestring), where SET @namestring = 'NEAR((Dr.,' + @Name + '), 1, TRUE)'.

This works fine except when the salutation in [textvaluefield] is "DR." instead of "Dr.", i.e. "DR. Johnson" is not picked up but "Dr. Johnson" is. I confirmed this: if I change the value in the [textvaluefield] column of that record from "DR." to "Dr." and leave everything else the same, the record is suddenly returned. If I change it back to "DR.", it is no longer returned.

What makes this strange is that I am definitely using a case-insensitive collation (Latin1_General_CI_AS). Does anyone have any idea?
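A minimal reproduction of the query described above might look like the following. It assumes a hypothetical table dbo.Notes(NoteId INT, textvaluefield NVARCHAR(MAX)) that already has a full-text index on [textvaluefield]; the search condition string is built exactly as in the question, using the SQL Server 2012 custom proximity term NEAR((term1, term2), max_distance, match_order).

```sql
-- Hypothetical table: dbo.Notes(NoteId INT, textvaluefield NVARCHAR(MAX)),
-- with a full-text index already defined on [textvaluefield].
DECLARE @Name NVARCHAR(100) = N'Johnson';
DECLARE @namestring NVARCHAR(4000);

-- Custom proximity term: "Dr." and the name at most 1 term apart,
-- in the given order (TRUE = enforce match order).
SET @namestring = N'NEAR((Dr.,' + @Name + N'), 1, TRUE)';

SELECT n.NoteId, n.[textvaluefield]
FROM dbo.Notes AS n
WHERE CONTAINS(n.[textvaluefield], @namestring);
```

Running this once against a row containing "Dr. Johnson" and once against a row containing "DR. Johnson" isolates the behaviour without the stored procedure in the way.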


1 answer


First, check that there are no relevant entries in the stopword tables:

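-- System stoplist and user-defined stoplists: look for any entries starting with "dr"
SELECT * FROM sys.[fulltext_system_stopwords] AS FSS WHERE FSS.[stopword] LIKE 'dr%'
SELECT * FROM sys.[fulltext_stopwords] AS FS WHERE FS.[stopword] LIKE 'dr%'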
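Besides the stoplist catalog views, the sys.dm_fts_parser dynamic management function shows how the word breaker and a stoplist tokenize a string, which makes it possible to compare "DR. Johnson" with "Dr. Johnson" directly. The sketch below assumes the English (US) word breaker (LCID 1033) and the system stoplist (stoplist_id 0); substitute the stoplist_id your index actually uses (see sys.fulltext_indexes). Calling this function requires membership in the sysadmin fixed server role.

```sql
-- Arguments: query string, LCID (1033 = English US, assumed),
-- stoplist_id (0 = system stoplist), accent sensitivity (0 = insensitive).
-- special_term shows e.g. 'Exact Match' or 'Noise Word' for each token.
SELECT special_term, display_term, source_term
FROM sys.dm_fts_parser(N'"DR. Johnson"', 1033, 0, 0);

SELECT special_term, display_term, source_term
FROM sys.dm_fts_parser(N'"Dr. Johnson"', 1033, 0, 0);
```

If the two result sets differ (for example in how the salutation token is reported), that difference is what the CONTAINS predicate sees.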

      

I also ran into a similar issue and resolved it by creating a schema-bound view over the tables and columns you need, with an explicit lowercased column created using the LOWER function:



CREATE VIEW [User].[UserValues]
WITH
 SCHEMABINDING
AS
SELECT
        [UserId]
      , [UserName]
      , LOWER([UserName]) AS [LoweredUsername]  -- lowercased copy of the name for full-text indexing
    FROM
        [User].[Values]

      

Remember to add a unique clustered index to the view: the full-text index needs a unique index to use as its key index.
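The answer does not show that DDL. A minimal sketch, assuming [UserId] is unique and non-nullable and using the hypothetical names [UCX_UserValues_UserId] and [UserValuesCatalog]:

```sql
-- Assumes the [User].[UserValues] view above exists (created WITH SCHEMABINDING).
-- The first index on a view must be a unique clustered index.
CREATE UNIQUE CLUSTERED INDEX [UCX_UserValues_UserId]
    ON [User].[UserValues] ([UserId]);
GO

-- Hypothetical catalog name.
CREATE FULLTEXT CATALOG [UserValuesCatalog];
GO

-- The unique clustered index doubles as the full-text KEY INDEX.
CREATE FULLTEXT INDEX ON [User].[UserValues] ([LoweredUsername])
    KEY INDEX [UCX_UserValues_UserId]
    ON [UserValuesCatalog];
GO
```

Indexed views have additional requirements of their own; for example, ANSI_NULLS and QUOTED_IDENTIFIER must be ON when the view is created.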
