Parsing text for multiple columns

I have a feed that populates one text box in a table with statistics. I need to transfer this data to several fields in another table, but the strange format makes it difficult to import automatically.

The file format is flat text, but below is an example:

08:34:52 Checksum=180957248,TicketType=6,InitialUserType=G,InitialUserID=520,CommunicationType=Incoming,Date=26-03-2012,Time=08:35:00,Service=ST,Duration=00:00:14,Cost=0.12

      

Effectively, it consists of:

[timestamp] [Field1 name]=[Field1 value],[Field2 name]=[Field2 value],[Field4 name]=[Field4 value]...[CR]

      

All fields are always in the same order, but not always present . There can be from 5 to 30 columns in total.

I tried the function below to translate it, which seems to work mostly, but seems to accidentally skip fields:

Data analysis:

(SELECT [Data].[dbo].[GetFromTextString] ( 'Checksum=' ,',' ,RAWTEXT)) AS RowCheckSum,
(SELECT [Data].[dbo].[GetFromTextString] ( 'TicketType=' ,',' ,RAWTEXT)) AS TicketType,

      

And the function:

CREATE FUNCTION [dbo].[GetFromTextString]
-- Input start and end and return value.
   (@uniqueprefix VARCHAR(100),
    @commonsuffix VARCHAR(100),
    @datastring VARCHAR(MAX) )
RETURNS VARCHAR(MAX) -- Picked Value.
AS
BEGIN

    DECLARE @ADJLEN INT = LEN(@uniqueprefix)

    SET @datastring = @datastring + @commonsuffix

   RETURN ( 
    CASE WHEN (CHARINDEX(@uniqueprefix,@datastring) > 0) 
         AND (CHARINDEX(@uniqueprefix + @commonsuffix,@datastring) = 0)
    THEN SUBSTRING(@datastring, PATINDEX('%' + @uniqueprefix + '%',@datastring)+@ADJLEN, CHARINDEX(@commonsuffix,@datastring,PATINDEX('%' + @uniqueprefix + '%',@datastring))- PATINDEX('%' + @uniqueprefix + '%',@datastring)-@ADJLEN) ELSE NULL END
)
END

      

Can anyone suggest a better / cleaner way to trim the data, or can someone understand why this formula is skipping rows?

Any help is really appreciated.

+3


source to share


1 answer


NOTE. FIRST SOLUTION - RESIBER. I AM LEFT IN THE STORY OF HISTORICAL REASONS BUT THE BEST SOLUTION IS BELOW

I'm not even sure if it will be faster than your current method, but I approach the problem the same way (if I was forced into a SQL solution). The first thing that is required is a table-valued function that will perform the shared function:

CREATE FUNCTION dbo.Split (@TextToSplit VARCHAR(MAX), @Delimiter VARCHAR(MAX))
RETURNS @Values TABLE (Position INT IDENTITY(1, 1) NOT NULL, TextValues VARCHAR(MAX) NOT NULL)
AS
BEGIN
    WHILE CHARINDEX(@Delimiter, @TextToSplit) > 0
        BEGIN
            INSERT @Values 
            SELECT  LEFT(@TextToSplit, CHARINDEX(@Delimiter, @TextToSplit) - 1)
            SET @TextToSplit = SUBSTRING(@TextToSplit, CHARINDEX(@Delimiter, @TextToSplit) + 1, LEN(@TextToSplit))

        END
        INSERT @Values VALUES (@TextToSplit) 
    RETURN
END

      

In my example I am working from a @Worklist temp table, you may need to adapt your file accordingly, or you can just insert the relevant data into @Worklist where I used dummy data:

DECLARE @WorkList TABLE (ID INT IDENTITY(1, 1) NOT NULL, TextField VARCHAR(MAX))
INSERT @WorkList
SELECT  '08:34:52 Checksum=180957248,TicketType=6,InitialUserType=G,InitialUserID=520,CommunicationType=Incoming,Date=26-03-2012,Time=08:35:00,Service=ST,Duration=00:00:14,Cost=0.12'
UNION
SELECT  '08:34:52 Checksum=180957249,TicketType=5,InitialUserType=H,InitialUserID=521,CommunicationType=Outgoing,Date=27-03-2012,Time=14:27:00,Service=ST,Duration=00:15:12,Cost=0.37'

      

The main bit of the request is done here. It's quite long, so I tried to comment on it as soon as possible. If further clarification is required, I can add additional comments.

DECLARE @Output TABLE (ID INT IDENTITY(1, 1) NOT NULL, TextField VARCHAR(MAX))
DECLARE @KeyPairs TABLE (WorkListID INT NOT NULL, KeyField VARCHAR(MAX), ValueField VARCHAR(MAX))

-- STORE TIMESTAMP DATA - THIS ASSUMES THE FIRST SPACE IS THE END OF THE TIMESTAMP
INSERT @KeyPairs 
SELECT  ID, 'TimeStamp', LEFT(TextField, CHARINDEX(' ', TextField))
FROM    @WorkList

-- CLEAR THE TIMESTAMP FROM THE WORKLIST
UPDATE  @WorkList
SET     TextField = SUBSTRING(TextField, CHARINDEX(' ', TextField) + 1, LEN(TextField))

DECLARE @ID INT = (SELECT MIN(ID) FROM @WorkList)
WHILE @ID IS NOT NULL 
    BEGIN
        -- SPLIT THE STRING FIRST INTO ALL THE PAIRS (e.g. Checksum=180957248)
        INSERT @Output
        SELECT  TextValues
        FROM    dbo.Split((SELECT TextField FROM @WorkList WHERE ID = @ID), ',')

        DECLARE @ID2 INT = (SELECT MIN(ID) FROM @Output)

        -- FOR ALL THE PAIRS SPLIT THEM INTO A KEY AND A VALUE (USING THE POSITION OF THE SPLIT FUNCTION)
        WHILE @ID2 IS NOT NULL
            BEGIN
                INSERT @KeyPairs
                SELECT  @ID, 
                        MAX(CASE WHEN Position = 1 THEN TextValues ELSE '' END),
                        MAX(CASE WHEN Position = 2 THEN TextValues ELSE '' END)
                FROM    dbo.Split((SELECT TextField FROM @Output WHERE ID = @ID2), '=')

                DELETE  @Output
                WHERE   ID = @ID2

                SET @ID2 = (SELECT MIN(ID) FROM @Output)
            END

        DELETE  @WorkList
        WHERE   ID = @ID

        SET @ID = (SELECT MIN(ID) FROM @WorkList)
    END

-- WE NOW HAVE A TABLE CONTAINING EAV MODEL STYLE DATA. THIS NEEDS TO BE PIVOTED INTO THE CORRECT FORMAT
-- ENSURE COLUMNS ARE LISTED IN THE ORDER YOU WANT THEM TO APPEAR
SELECT  *
FROM    @KeyPairs p
        PIVOT
        (   MAX(ValueField)
            FOR KeyField IN 
                (   [TimeStamp], [Checksum], [TicketType], [InitialUserType], 
                    [InitialUserID], [CommunicationType], [Date], [Time],
                    [Service], [Duration], [Cost]
                )
        ) AS PivotTable;

      




EDIT (4 YEARS LATER)

A recent upvote brought this to my attention, and I hate myself a little forever by posting this answer in its current form.

Much better split function:

CREATE FUNCTION dbo.Split
(
   @List       NVARCHAR(MAX),
   @Delimiter  NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
(   WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1), (1)) n (N)),
    N2(N) AS (SELECT 1 FROM N1 a CROSS JOIN N1 b),
    N3(N) AS (SELECT 1 FROM N2 a CROSS JOIN N2 b),
    N4(N) AS (SELECT 1 FROM N3 a CROSS JOIN N3 b),
    cteTally(N) AS 
    (   SELECT 0 UNION ALL 
        SELECT TOP (DATALENGTH(ISNULL(@List,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 
        FROM n4
    ),
    cteStart(N1) AS 
    (   SELECT t.N+1 
        FROM cteTally t
        WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0)
    )
    SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000)), 
            Position = s.N1,
            ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1)
    FROM cteStart s
);

      

Then there is no need for a loop at all, you only have a suitable set based solution by calling split functions twice to get an EAV style dataset:

DECLARE @WorkList TABLE (ID INT IDENTITY(1, 1) NOT NULL, TextField VARCHAR(MAX))
INSERT @WorkList
SELECT  '08:34:52 Checksum=180957248,TicketType=6,InitialUserType=G,InitialUserID=520,CommunicationType=Incoming,Date=26-03-2012,Time=08:35:00,Service=ST,Duration=00:00:14,Cost=0.12'
UNION
SELECT  '08:34:52 Checksum=180957249,TicketType=5,InitialUserType=H,InitialUserID=521,CommunicationType=Outgoing,Date=27-03-2012,Time=14:27:00,Service=ST,Duration=00:15:12,Cost=0.37';

WITH KeyPairs AS
(   SELECT  w.ID, 
            [Timestamp] = LEFT(w.TextField, CHARINDEX(' ', w.TextField)),
            KeyField = MAX(CASE WHEN v.ItemNumber = 1 THEN v.Item END),
            ValueField = MAX(CASE WHEN v.ItemNumber = 2 THEN v.Item END)
    FROM    @WorkList AS w
            CROSS APPLY dbo.Split(SUBSTRING(TextField, CHARINDEX(' ', TextField) + 1, LEN(TextField)), ',') AS kp
            CROSS APPLY dbo.Split(kp.Item, '=') AS v
    GROUP BY w.ID, kp.ItemNumber,w.TextField
)
SELECT  *
FROM   KeyPairs AS kp
        PIVOT
        (   MAX(ValueField)
            FOR KeyField IN 
                (   [Checksum], [TicketType], [InitialUserType], 
                    [InitialUserID], [CommunicationType], [Date], [Time],
                    [Service], [Duration], [Cost]
                )
        ) AS pvt;

      

+3


source







All Articles