Efficient replacement of many characters from a string
I would like to know the most efficient way to remove every occurrence of symbols such as , ; / " from a varchar column.
I have a function like the one below, but it is incredibly slow. The table has about 20 million records.
CREATE FUNCTION [dbo].[Udf_getcleanedstring] (@s VARCHAR(255))
returns VARCHAR(255)
AS
BEGIN
DECLARE @o VARCHAR(255)
SET @o = Replace(@s, '/', '')
SET @o = Replace(@o, '-', '')
SET @o = Replace(@o, ';', '')
SET @o = Replace(@o, '"', '')
RETURN @o
END
Whichever method you use, it is probably worth adding
WHERE YourCol LIKE '%[-/;"]%'
unless you suspect that a very large proportion of the rows actually contain at least one of the characters that need to be removed.
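As a rough sketch of that filter in context, the update might look like the statement below. The names dbo.YourTable and YourCol are placeholders, and the function is the one from the question.

```sql
-- Sketch only: dbo.YourTable and YourCol are placeholder names.
-- The hyphen is listed first inside the brackets so that LIKE reads it as a
-- literal character; between two other characters it would define a range.
UPDATE dbo.YourTable
SET    YourCol = dbo.Udf_getcleanedstring(YourCol)
WHERE  YourCol LIKE '%[-/;"]%';
```

Rows that contain none of the characters are never touched, so they generate no writes and no log activity.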
Since you are using it in an UPDATE statement, simply adding the WITH SCHEMABINDING option to the function can improve things considerably and allow the UPDATE to proceed row by row, rather than caching the entire operation in a spool first for Halloween protection.
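A minimal sketch of the question's function with the option added is shown below; the body is unchanged. The OBJECTPROPERTYEX check at the end is an optional extra, included on the assumption that you want to confirm the function is now marked as doing no data access.

```sql
-- Sketch: the function from the question, altered to add SCHEMABINDING.
ALTER FUNCTION [dbo].[Udf_getcleanedstring] (@s VARCHAR(255))
RETURNS VARCHAR(255)
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @o VARCHAR(255)
    SET @o = Replace(@s, '/', '')
    SET @o = Replace(@o, '-', '')
    SET @o = Replace(@o, ';', '')
    SET @o = Replace(@o, '"', '')
    RETURN @o
END
GO

-- Optional check: both properties should report 0 for a schema-bound
-- function that reads no tables.
SELECT OBJECTPROPERTYEX(OBJECT_ID('dbo.Udf_getcleanedstring'), 'UserDataAccess')   AS UserDataAccess,
       OBJECTPROPERTYEX(OBJECT_ID('dbo.Udf_getcleanedstring'), 'SystemDataAccess') AS SystemDataAccess;
```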
Nested REPLACE calls in T-SQL may well still be slow, though, because they involve multiple passes through the string.
You could knock up a CLR function like the one below (if you haven't worked with these before, they are very easy to deploy from an SSDT project, as long as CLR execution is enabled on the server). The UPDATE plan for this does not contain a spool either.
The regular expression uses (?:) to denote a non-capturing group, with the characters of interest separated by the alternation character |, as in /|-|;|\" (the " must be escaped in the string literal, so it is preceded by a backslash).
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
private static readonly Regex regexStrip =
new Regex("(?:/|-|;|\")", RegexOptions.Compiled);
[SqlFunction]
public static SqlString StripChars(SqlString Input)
{
return Input.IsNull ? null : regexStrip.Replace((string)Input, "");
}
}
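On the T-SQL side, using the function might look roughly like the sketch below. It assumes the assembly has been deployed so that the function is exposed as dbo.StripChars; dbo.YourTable and YourCol are placeholder names. Newer versions of SQL Server may additionally require the assembly to be signed or trusted before it will load.

```sql
-- Enable CLR integration on the instance (needs ALTER SETTINGS permission).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Assumption: the SSDT deployment created the function as dbo.StripChars.
-- dbo.YourTable and YourCol are placeholder names.
UPDATE dbo.YourTable
SET    YourCol = dbo.StripChars(YourCol)
WHERE  YourCol LIKE '%[-/;"]%';
```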
I want to show the huge performance difference between two types of user-defined function:
- table-valued user function
- scalar user function
See the example test:
use AdventureWorks2012
go
-- create the table for the test
create table dbo.FindString (ColA int identity(1,1) not null primary key,ColB varchar(max) );
go
declare @text varchar(max) = 'A web server can handle a Hypertext Transfer Protocol request either by reading
a file from its file ; system based on the URL <> path or by handling the request using logic that is specific
to the type of resource. In the case that special logic is invoked the query string will be available to that logic
for use in its processing, along with the path component of the URL.';
-- insert the test row 1,000,000 times (GO n repeats the batch n times)
insert into dbo.FindString(ColB)
select @text
go 1000000
-- use one of the scalar functions posted in the answers in this thread
-- (alter assumes the function already exists; use create otherwise)
alter function [dbo].[udf_getCleanedString]
(
@s varchar(max)
)
returns varchar(max)
as
begin
return replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','')
end
go
--
-- create an inline table-valued function based on the scalar function above
-- (the parameter is varchar(max), matching ColB, so the test text is not truncated)
create function [dbo].[utf_getCleanedString]
(
@s varchar(max)
)
returns table
as return
(
select replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','') as String
)
go
--
-- clear the buffer cache
DBCC DROPCLEANBUFFERS ;
go
-- update using the table-valued user function
update Dest with(rowlock) set
dest.ColB = D.String
from dbo.FindString dest
cross apply dbo.utf_getCleanedString(dest.ColB) as D
go
DBCC DROPCLEANBUFFERS ;
go
-- update using the scalar user function
update Dest with(rowlock) set
dest.ColB = dbo.udf_getCleanedString(dest.ColB)
from dbo.FindString dest
go
And this is the execution plan: as you can see, the UTF (table-valued function) performs much better than the USF (scalar function). The two do the same thing, replacing characters in a string, but one returns a scalar and the other returns a table.
Another important measure to look at is the I/O statistics (SET STATISTICS IO ON;).
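To capture those figures for one of the test updates, something like the sketch below could be used. It reuses the table and function names from the test above; SET STATISTICS TIME is an extra that the test itself does not mention.

```sql
-- Report logical/physical reads (and, as an extra, CPU and elapsed time)
-- in the Messages output for every statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

update dest set
    dest.ColB = D.String
from dbo.FindString dest
cross apply dbo.utf_getCleanedString(dest.ColB) as D;
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

Running the same wrapper around the scalar-function update gives a second set of numbers to compare against.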
How about nesting them together in a single call:
create function [dbo].[udf_getCleanedString]
(
@s varchar(255)
)
returns varchar(255)
as
begin
return replace(replace(replace(replace(@s,'/',''),'-',''),';',''),'"','')
end
Or you could run an UPDATE directly on the table itself. Scalar functions are pretty slow.
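A sketch of that direct update, with the nested REPLACE calls written inline so that no scalar function is invoked per row (dbo.YourTable and YourCol are placeholder names):

```sql
-- Sketch only: dbo.YourTable and YourCol are placeholder names.
-- The same four replacements as the function, applied inline.
UPDATE dbo.YourTable
SET    YourCol = Replace(Replace(Replace(Replace(YourCol, '/', ''), '-', ''), ';', ''), '"', '');
```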
Here is a similar question that was asked previously; I like the approach mentioned there:
"How do I replace multiple characters in SQL?"
declare @badStrings table (item varchar(50))
INSERT INTO @badStrings(item)
SELECT '>' UNION ALL
SELECT '<' UNION ALL
SELECT '(' UNION ALL
SELECT ')' UNION ALL
SELECT '!' UNION ALL
SELECT '?' UNION ALL
SELECT '@'
declare @testString varchar(100), @newString varchar(100)
set @testString = 'Juliet ro><0zs my s0x()rz!!?!one!@!@!@!'
set @newString = @testString
SELECT @newString = Replace(@newString, item, '') FROM @badStrings
select @newString -- returns 'Juliet ro0zs my s0xrzone'