BULK INSERT with two row delimiters

I'm trying to import a text file so that the result is just words, each on a separate row of the same column. For example, the text:

'Hello, Mom,
we meet again'

should contain 5 entries:

'Hello,'
'Mom,'
'we' 
'meet' 
'again'

      

I tried to accomplish this with BULK INSERT with ROWTERMINATOR = ' ', but then there is the problem of handling the new line, which is also a terminator, and I end up with 'Mom,we' as one of the results.

From what I know, there is no way to add a second ROWTERMINATOR to BULK INSERT (true?). What is the best way to achieve the result above?
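
Roughly what I tried (the table and file names are simplified placeholders):

-- Hypothetical staging table and file path, only to illustrate the problem:
-- with ROWTERMINATOR = ' ' the words separated by spaces become rows, but the
-- line break between "Mom," and "we" is not treated as a separator.
CREATE TABLE dbo.Words (Word NVARCHAR(4000));

BULK INSERT dbo.Words
FROM 'C:\path\to\file.txt'
WITH (ROWTERMINATOR = ' ');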

The file cannot be preprocessed outside of SQL Server, and this method should work for hundreds of files with thousands of strings of words, imported at different times, not just once.

1 answer


Given:

The file cannot be preprocessed outside of SQL Server.

Option 1

Why not use OPENROWSET (BULK ...)? This allows you to import/insert (which takes care of the row terminator) and split at the same time (which takes care of the field terminator). Depending on whether or not a Format File can be created, it would look something like this:

Format File = split each row

INSERT INTO dbo.TableName (ColumnName)
  SELECT split.SplitVal
  FROM   OPENROWSET(BULK 'path\to\file.txt',
                    FORMATFILE='Path\to\FormatFile.XML') data(eachrow)
  CROSS APPLY SQL#.String_Split(data.eachrow, N' ', 2) split; -- 2 = remove empty entries
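
A Format File for this single-column layout just defines one field terminated by the line break. A minimal sketch (the column name, length, and CR/LF terminator are assumptions, so adjust them to the actual files):

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <!-- one field per line of the file, ended by CR/LF -->
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="4000"/>
  </RECORD>
  <ROW>
    <COLUMN SOURCE="1" NAME="eachrow" xsi:type="SQLNVARCHAR"/>
  </ROW>
</BCPFORMAT>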

      

No Format File = split the entire file as a single row



INSERT INTO dbo.TableName (ColumnName)
  SELECT split.SplitVal
  FROM   OPENROWSET(BULK 'path\to\file.txt', SINGLE_CLOB) data(allrows)
  CROSS APPLY SQL#.String_Split(
                                REPLACE(data.allrows, NCHAR(10), N' '),
                                N' ',
                                2 -- remove empty entries
                               ) split;

      

Notes:

  • For both methods, you need a string splitter. SQLCLR-based splitters are the fastest, and in the examples above I used one from the SQL# library (which I created, though the String_Split function is available in the free version). You can also write your own. If you write your own and aren't using a Format File, it might be a good idea to allow for multiple delimiters so that you could pass in both ' ' and '\n' and get rid of the REPLACE() (a built-in alternative for newer versions of SQL Server is sketched just after this list).

  • If you can write your own SQLCLR string splitter, then it might be even better to write an SQLCLR stored procedure that accepts an input parameter for @FilePath, reads the file, does the splitting, and spits out the words as many rows of a single column:

    INSERT INTO dbo.TableName(ColumnName)
      EXEC dbo.MySQLCLRproc(N'C:\path\to\file.txt');
    
          

  • If you are not using (or cannot use) a Format File, be sure to use the correct "SINGLE_" option: either SINGLE_CLOB (returns VARCHAR(MAX), for a standard ASCII / extended-ASCII file) or SINGLE_NCLOB (returns NVARCHAR(MAX), for a Unicode file).

  • Even if you can create a Format File, it might be more efficient to pull in the entire file as a single string, depending on the size of the files: splitting one large string can be done quite quickly and is a single function call, whereas a file of thousands of short rows means thousands of function calls, which are also fast but probably not 1000 times faster than the single call. Still, if the files are 1 MB or larger, I would probably prefer the Format File and handling the many short rows.
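
As a side note to the first point above: on SQL Server 2016 and newer, the built-in STRING_SPLIT function could stand in for the SQLCLR splitter in the no-Format-File example. This is only a sketch of that substitution (same hypothetical table and file names as above); STRING_SPLIT accepts a single-character delimiter and has no option to remove empty entries, so they must be filtered out manually:

INSERT INTO dbo.TableName (ColumnName)
  SELECT split.[value]
  FROM   OPENROWSET(BULK 'path\to\file.txt', SINGLE_CLOB) data(allrows)
  CROSS APPLY STRING_SPLIT(
                  -- turn CR and LF into spaces so everything splits on ' '
                  REPLACE(REPLACE(data.allrows, CHAR(13), ' '), CHAR(10), ' '),
                  ' '
              ) split
  WHERE  split.[value] <> ''; -- STRING_SPLIT keeps empty entries, so filter them here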

Option 2

If by "preprocessed" you mean altered, but there is no restriction on simply reading the data from something external to SQL Server and inserting it, then you should write a small .NET application that reads the rows, splits them, and inserts the data by calling a stored procedure that accepts a table-valued parameter (TVP). I have detailed this approach in another answer here on SO:

How can I insert 10 million records in the shortest time possible?

This can be compiled as a console application, used in scripts (i.e. .CMD / .BAT files), and even scheduled as a Windows Task.
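
The SQL Server side of that approach is just a table type and a stored procedure that accepts it; a minimal sketch with hypothetical object names (the .NET application would pass the split words as the TVP, e.g. as a structured SqlParameter):

-- Hypothetical names; the .NET app streams the split words into @Words.
CREATE TYPE dbo.WordList AS TABLE (Word NVARCHAR(4000));
GO

CREATE PROCEDURE dbo.ImportWords
    @Words dbo.WordList READONLY
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.TableName (ColumnName)
    SELECT Word
    FROM   @Words;
END;
GO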
