Efficient removal of multiple characters from a string

I would like to know the most efficient way to remove every occurrence of characters such as , ; / and " (comma, semicolon, slash, double quote) from a varchar column.

I have a function that does this, but it is incredibly slow. The table has about 20 million records.

CREATE FUNCTION [dbo].[Udf_getcleanedstring] (@s VARCHAR(255))
returns VARCHAR(255)
AS
  BEGIN
      DECLARE @o VARCHAR(255)

      SET @o = Replace(@s, '/', '')
      SET @o = Replace(@o, '-', '')
      SET @o = Replace(@o, ';', '')
      SET @o = Replace(@o, '"', '')

      RETURN @o
  END 

      



4 answers


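Whichever method you use, it is probably worth adding

WHERE YourCol LIKE '%[-/;"]%'

unless you suspect that a very large proportion of the rows actually contain at least one of the characters to be removed. Put the hyphen first inside the brackets so that it is matched literally; written as [/-;"], it would instead define a range from / to ;.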
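A small self-contained check of that filter follows. The temporary table #FilterDemo, its columns and its data are made up for illustration; the point is only to show which rows the pattern selects, so that the expensive cleaning is applied to those rows alone.

```sql
-- Hypothetical sample data standing in for the real table.
CREATE TABLE #FilterDemo (Id int PRIMARY KEY, YourCol varchar(255));
INSERT INTO #FilterDemo (Id, YourCol)
VALUES (1, 'a/b'), (2, 'plain text'), (3, 'x-y'), (4, 'say "hi"'), (5, 'semi;colon');

-- A leading '-' inside [...] is a literal hyphen, not the start of a range.
-- Only the rows returned here would need to go through REPLACE or the function.
SELECT Id, YourCol
FROM   #FilterDemo
WHERE  YourCol LIKE '%[-/;"]%';
```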

As you are using it in an UPDATE statement, simply adding WITH SCHEMABINDING to the function can greatly improve things: it allows the UPDATE to proceed row by row rather than first caching the entire operation in a spool for Halloween protection.
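For reference, here is a sketch of the question's function with that option added. It assumes SQL Server 2016 SP1 or later for CREATE OR ALTER (on older versions, use ALTER FUNCTION); the body is the question's four REPLACE calls written as one nested expression, and WITH SCHEMABINDING goes between RETURNS and AS.

```sql
CREATE OR ALTER FUNCTION dbo.Udf_getcleanedstring (@s VARCHAR(255))
RETURNS VARCHAR(255)
WITH SCHEMABINDING  -- schema-bound, so SQL Server can determine it reads no user data
AS
BEGIN
    -- Remove / - ; and " in turn.
    RETURN REPLACE(REPLACE(REPLACE(REPLACE(@s, '/', ''), '-', ''), ';', ''), '"', '');
END;
GO
```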




Nested REPLACE calls in TSQL are slow anyway, because they involve multiple passes through the string.

You could write a quick CLR function like the one below (if you haven't worked with them before, they are very easy to deploy from an SSDT project, provided CLR is enabled on the server). The UPDATE plan for this does not contain a spool either.

The regex uses (?:) to denote a non-capturing group containing the characters of interest separated by the alternation character |, as in /|-|;|\" (the " needs to be escaped in a C# string literal, so it is preceded by a backslash).

using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Built once and reused for every call; matches any single / - ; or "
    private static readonly Regex regexStrip = 
                        new Regex("(?:/|-|;|\")", RegexOptions.Compiled);

    [SqlFunction]
    public static SqlString StripChars(SqlString Input)
    {
        // Pass SQL NULL through unchanged; otherwise remove every match.
        return Input.IsNull ? SqlString.Null : new SqlString(regexStrip.Replace(Input.Value, ""));
    }
}
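If you are not deploying through SSDT, the function can also be registered by hand. This is a sketch: the assembly name StripCharsAssembly and the DLL path are hypothetical, and on SQL Server 2017 and later, where 'clr strict security' is on by default, the assembly must be signed (or added with sys.sp_add_trusted_assembly) before CREATE ASSEMBLY will succeed. SqlString parameters map to NVARCHAR on the T-SQL side.

```sql
-- Enable CLR integration (server-wide setting).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO
-- Hypothetical assembly name and file path; point this at your compiled DLL.
CREATE ASSEMBLY StripCharsAssembly
FROM 'C:\clr\StripChars.dll'
WITH PERMISSION_SET = SAFE;
GO
-- EXTERNAL NAME is assembly.class.method (the class above has no namespace).
CREATE FUNCTION dbo.StripChars (@Input NVARCHAR(4000))
RETURNS NVARCHAR(4000)
AS EXTERNAL NAME StripCharsAssembly.UserDefinedFunctions.StripChars;
GO
```

It can then be called in the UPDATE in place of the T-SQL function, for example SET YourCol = dbo.StripChars(YourCol), where YourCol is a placeholder column name.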

      



I want to show the huge performance difference between two types of USER-DEFINED FUNCTION:

  • inline TABLE-valued user function
  • SCALAR user function

Here is an example test:

use AdventureWorks2012
go

-- create table for the test, in its own batch so that GO 1000000 below repeats only the insert
create table dbo.FindString (ColA int identity(1,1) not null primary key,ColB varchar(max) );
go

declare @text varchar(max) =  'A web server can handle a Hypertext Transfer Protocol request either by reading 
a file from its file ; system based on the URL <> path or by handling the request using logic that is specific 
to the type of resource. In the case that special logic is invoked the query string will be available to that logic 
for use in its processing, along with the path component of the URL.';

-- populate: GO 1000000 executes this batch 1,000,000 times, inserting one row each time
insert into dbo.FindString(ColB)
select @text 
go 1000000

-- use one of the scalar functions posted in the answers to this thread
alter function [dbo].[udf_getCleanedString]
( 
@s varchar(max)
)
returns  varchar(max)
as
begin
return replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','')
end
go
--
-- create an inline table-valued function from the function above;
-- varchar(max) matches ColB, so the long test string is not truncated to 255 characters
create function [dbo].[utf_getCleanedString]
( 
@s varchar(max)
)
returns  table 
as return
(
select  replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','') as String
)
go

--
-- flush dirty pages to disk, then clear the buffer cache (DROPCLEANBUFFERS only removes clean pages)
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
go
-- update using the inline TABLE-VALUED FUNCTION
update dest with(rowlock) set
dest.ColB  = D.String
from dbo.FindString dest
cross apply dbo.utf_getCleanedString(dest.ColB) as D
go

CHECKPOINT;
DBCC DROPCLEANBUFFERS;
go
-- update using the SCALAR user function
update dest with(rowlock) set
dest.ColB  =  dbo.udf_getCleanedString(dest.ColB) 
from dbo.FindString dest
go

      



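Comparing the execution plans of the two updates shows that the UTF (table-valued function) is much better than the USF (scalar function), even though both do the same thing, replacing characters in a string; one simply returns a scalar and the other returns a table. An inline table-valued function is expanded into the calling query like a view, whereas a scalar UDF is (prior to SQL Server 2019's scalar UDF inlining) invoked once per row.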

Another important measure to compare is the I/O reported with SET STATISTICS IO ON.



You can combine them into a single call:

 create function [dbo].[udf_getCleanedString]
 ( 
    @s varchar(255)
 )
 returns varchar(255)
 as
 begin
   return replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','')
 end

      

Or you can run the UPDATE on the table itself first. Scalar functions are pretty slow.
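One way to apply this is to put the combined REPLACE expression directly into the UPDATE, so no user-defined function is called per row. A sketch on a made-up temporary table #UpdateDemo (name and data are illustrative only), with the LIKE filter so that rows containing none of the characters, and NULLs, are skipped:

```sql
-- Hypothetical demo table; in practice this is the real 20-million-row table.
CREATE TABLE #UpdateDemo (Id int PRIMARY KEY, YourCol varchar(255));
INSERT INTO #UpdateDemo (Id, YourCol)
VALUES (1, 'a/b-c;d"e'), (2, 'nothing to strip'), (3, NULL);

-- Inline expression instead of a scalar UDF; only rows that need cleaning are updated.
UPDATE #UpdateDemo
SET    YourCol = REPLACE(REPLACE(REPLACE(REPLACE(YourCol, '/', ''), '-', ''), ';', ''), '"', '')
WHERE  YourCol LIKE '%[-/;"]%';
```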



Here is a similar question asked earlier; I like the approach mentioned there.

How do I replace multiple characters in SQL?

declare @badStrings table (item varchar(50))

INSERT INTO @badStrings(item)
SELECT '>' UNION ALL
SELECT '<' UNION ALL
SELECT '(' UNION ALL
SELECT ')' UNION ALL
SELECT '!' UNION ALL
SELECT '?' UNION ALL
SELECT '@'

declare @testString varchar(100), @newString varchar(100)

set @testString = 'Juliet ro><0zs my s0x()rz!!?!one!@!@!@!'
set @newString = @testString

SELECT @newString = Replace(@newString, item, '') FROM @badStrings

select @newString -- returns 'Juliet ro0zs my s0xrzone'
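The single SELECT above relies on a variable being reassigned once per row of @badStrings, which works in practice here but is not a behavior Microsoft guarantees for multi-row variable assignment. A sketch of the same idea with each replacement applied explicitly in a WHILE loop (the id column is added for this illustration):

```sql
-- Same bad-character list, numbered so the loop can visit each entry once.
DECLARE @badStrings TABLE (id int IDENTITY(1,1) PRIMARY KEY, item varchar(50));
INSERT INTO @badStrings (item) VALUES ('>'), ('<'), ('('), (')'), ('!'), ('?'), ('@');

DECLARE @newString varchar(100) = 'Juliet ro><0zs my s0x()rz!!?!one!@!@!@!';
DECLARE @i int = 1, @n int = (SELECT COUNT(*) FROM @badStrings);

WHILE @i <= @n
BEGIN
    -- Exactly one row matches, so exactly one REPLACE is applied per iteration.
    SELECT @newString = REPLACE(@newString, item, '') FROM @badStrings WHERE id = @i;
    SET @i += 1;
END;

SELECT @newString;  -- 'Juliet ro0zs my s0xrzone'
```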

      
