Split Column Data and Insert - SQL Server Stored Procedures

I have a table with several hundred thousand rows. Each row has an index (int) and a words column (nvarchar(1000)). A words string is a set of words separated by spaces, for example: word1 word2 word3. I want to read the table of words and create a dictionary. In pseudocode, this is what I want:

INSERT INTO dictionary (dictionaryword) 
SELECT splitBySpace(words) FROM word;

      

This is easy enough to do in Java or C# code, but I found that it takes a long time to process the data that way. In other processing, running the work in SQL (i.e. not processing the data in C# or Java) has been far cheaper.

I want to create a stored procedure that reads the words, splits them, and then creates the dictionary. I've seen various splitting routines that are a little tricky, like https://dba.stackexchange.com/questions/21078/t-sql-table-valued-function-to-split-a-column-on-commas , but I couldn't figure out how to adapt them to the task of reading a whole table, splitting the words, and inserting them.

Does anyone have some sample code for splitting column data and then inserting it that can be fully implemented in SQL for efficiency reasons?

+3




2 answers


Here is the solution.

DDL:

create table sentence(t varchar(100))

insert into sentence values
('Once upon a time in America'),
('Eyes wide shut')

      

DML:



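-- Wrap each word in <x>...</x>, cast the string to xml, then shred the nodes back into rows.
-- Note: the cast fails if the text contains XML special characters such as & or <.
select distinct ca.d as words from sentence s
cross apply(select split.a.value('.', 'varchar(100)') as d 
            from 
            (select cast('<x>' + REPLACE(s.t, ' ', '</x><x>') + '</x>' as xml) as d) as a 
             cross apply d.nodes ('/x') as split(a)) ca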

      

Output:

words

a
America
Eyes
in
Once
shut
time
upon
wide

      

Fiddle http://sqlfiddle.com/#!6/54dff/4
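
To go from this SELECT to the insert the question asks for, the same split can feed an INSERT directly. This is only a sketch and has not been run against the asker's data: the table and column names (word.words, dictionary.dictionaryword) are assumed from the question's pseudocode, and the nested REPLACE calls are an addition that escapes &, < and > so the cast to xml does not fail on them.

```sql
-- Assumed schema, taken from the question's pseudocode:
--   word(words nvarchar(1000)), dictionary(dictionaryword)
INSERT INTO dictionary (dictionaryword)
SELECT DISTINCT ca.d
FROM word AS w
CROSS APPLY (
    SELECT n.item.value('.', 'nvarchar(1000)') AS d
    FROM (
        -- Escape XML special characters first, then turn each space into an element boundary.
        SELECT CAST(
            '<x>'
            + REPLACE(
                  REPLACE(REPLACE(REPLACE(w.words, '&', '&amp;'), '<', '&lt;'), '>', '&gt;'),
                  ' ', '</x><x>')
            + '</x>' AS xml) AS doc
    ) AS src
    CROSS APPLY src.doc.nodes('/x') AS n(item)
) AS ca
-- Consecutive spaces produce empty elements; skip them.
WHERE ca.d <> '';
```

On SQL Server 2016 or later (database compatibility level 130 or higher), the built-in STRING_SPLIT function should do the same job with less code, again assuming the same table names:

```sql
INSERT INTO dictionary (dictionaryword)
SELECT DISTINCT s.value
FROM word AS w
CROSS APPLY STRING_SPLIT(w.words, N' ') AS s
-- Consecutive spaces produce empty strings; skip them.
WHERE s.value <> N'';
```

Run one or the other, not both, otherwise every word is inserted twice.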

+2




I suggest using a stored procedure like this:

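CREATE PROCEDURE spSplit 
    @words nvarchar(max),
    @delimiter varchar(1) = ' '
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @sql nvarchar(max)
    -- Double any single quotes first so they cannot break out of the string literals
    SET @words = REPLACE(@words, '''', '''''')
    -- Turns 'a b c' into: SELECT N'a' As res UNION ALL SELECT N'b' As res UNION ALL SELECT N'c'
    SELECT @sql = 'SELECT N''' + REPLACE(@words, @delimiter, ''' As res UNION ALL SELECT N''') + ''''
    -- To remove duplicates, use UNION instead of UNION ALL:
    --SELECT @sql = 'SELECT N''' + REPLACE(@words, @delimiter, ''' As res UNION SELECT N''') + ''''
    EXEC(@sql)
END
GO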

      

This stored procedure returns a result set that you can use in an INSERT INTO statement. Alternatively, this version does the insert itself:



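-- Same name as the procedure above: drop that one first, or rename this one.
CREATE PROCEDURE spSplit 
    @words nvarchar(max) = 'a bc lkj weu 234 , sdsd 3 and 3 & test',
    @delimiter varchar(1) = ' ',
    @destTable sysname, 
    @destColumn sysname
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @sql nvarchar(max)
    -- Double any single quotes so they cannot break out of the string literals
    SET @words = REPLACE(@words, '''', '''''')
    -- QUOTENAME brackets the identifiers and escapes any ] inside them
    SELECT @sql = 'INSERT INTO ' + QUOTENAME(@destTable) + ' (' + QUOTENAME(@destColumn) + ') SELECT res FROM ('
    -- UNION (not UNION ALL) so that a word repeated in @words is inserted only once
    SELECT @sql = @sql + 'SELECT N''' + REPLACE(@words, @delimiter, ''' As res UNION SELECT N''') + ''''
    SELECT @sql = @sql + ') DT WHERE res NOT IN (SELECT ' + QUOTENAME(@destColumn) + ' FROM ' + QUOTENAME(@destTable) + ')'
    EXEC(@sql)
END
GO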

      

This stored procedure will do inserts without inserting duplicates.
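
For reference, calling either version could look like the sketch below. The target names (dictionary, dictionaryword) are assumed from the question, and the sample string is made up. Both procedures are named spSplit, so only one of them can exist at a time; run whichever call matches the version you created.

```sql
-- Option 1 (first version): capture the returned result set with INSERT ... EXEC.
INSERT INTO dictionary (dictionaryword)
EXEC spSplit @words = N'word1 word2 word3', @delimiter = ' ';

-- Option 2 (second version): the procedure performs the insert itself.
EXEC spSplit
    @words = N'word1 word2 word3',
    @delimiter = ' ',
    @destTable = N'dictionary',
    @destColumn = N'dictionaryword';
```

Each call handles a single string, so covering the whole word table would need a loop or cursor over its rows. With several hundred thousand rows, that many dynamic SQL executions is likely to be slower than a set-based split.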

0

