How does a Windows batch file correctly read data from a delimited text file when some fields are null?

I have a comma delimited text file with three fields. The first always contains a string, but the second, third, or both can be empty. When everything contains strings, when only the third is emppty, and when the second and third are empty, I get the expected result when read with the FOR command, the expected result is that the variables read from the fields containing the strings are equal to these lines, and variables read from empty fields have zero values. However, when the second fielkd is empty and the third field contains a string, I get the unexpected result that the second variable that should have been read from the second field is equal to the content of the third field, and the third variable is null.

How can I get around this problem?

+3


source to share


2 answers


This information is copied in a text editor from my DosTips post: Safely Parse Almost Any CSV with parseCSV.bat

Quite often it happens that someone wants to parse CSV using FOR / F. This is an easy task if you know that all columns are full and there are no commas, newlines, or quotes in the values. Let's assume there are 4 columns:

@echo off
for /f "tokens=1-4 delims=," %%A in (test.csv) do (
  echo ----------------------
  echo A=%%~A
  echo B=%%~B
  echo C=%%~C
  echo D=%%~D
  echo(
)

      

But it becomes more difficult if one of the following conditions occurs:

1) Values ​​can be empty with sequential commas. FOR / F treats consecutive delimiters as one, so it will discard the column assignment.

2) Quoted values ​​can contain commas. FOR / F will incorrectly handle the quoted comma as a column separator.

3) Quoted values ​​can contain newlines. FOR / F breaks a line on a new line and incorrectly treats one line as two.

4) Quoted values ​​can contain double quotes that represent a single quote.
Eg "He said, ""Hello there""

. To convert ""

to, "

you need a method.

Then there are secondary issues that can occur if slow expansion is enabled.

5) Variable FOR% %% A will be corrupted if it contains !

(or sometimes ^

) if delayed expansion is enabled when the variable is expanded.

There are fairly simple solutions to some of these problems, but solving all of them is extremely difficult (and slow) with a clean batch.

I wrote a hybrid JScript / batch utility called parseCSV.bat that makes it easy and relatively efficient to parse almost any CSV file correctly using FOR / F.

parseCSV.bat

@if (@X)==(@Y) @end /* harmless hybrid line that begins a JScrpt comment

::************ Documentation ***********
::parseCSV.bat version 1.0
:::
:::parseCSV  [/option]...
:::
:::  Parse stdin as CSV and write it to stdout in a way that can be safely
:::  parsed by FOR /F. All columns will be enclosed by quotes so that empty
:::  columns may be preserved. It also supports delimiters, newlines, and
:::  quotes within quoted values. Two consecutive quotes within a quoted value
:::  are converted into one quote.
:::
:::  Available options:
:::
:::    /I:string = Input delimiter. Default is a comma.
:::
:::    /O:string = Output delimiter. Default is a comma.
:::
:::    /E = Encode output delimiter in value as \D
:::         Encode newline in value as \N
:::         Encode backslash in value as \S
:::
:::    /D = Escape exclamation point and caret for delayed expansion
:::         ! becomes ^!
:::         ^ becomes ^^
:::
:::parseCSV  /?
:::
:::  Display this help
:::
:::parseCSV  /V
:::
:::  Display the version of parseCSV.bat
:::
:::parseCSV.bat was written by Dave Benham. Updates are available at the original
:::posting site: http://www.dostips.com/forum/viewtopic.php?f=3&t=5702
:::

::************ Batch portion ***********
@echo off
if "%~1" equ "/?" (
  setlocal disableDelayedExpansion
  for /f "delims=: tokens=*" %%A in ('findstr "^:::" "%~f0"') do echo(%%A
  exit /b 0
)
if /i "%~1" equ "/V" (
  for /f "delims=:" %%A in ('findstr /bc:"::%~nx0 version " "%~f0"') do echo %%A
  exit /b 0
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0


************ JScript portion ***********/
var args     = WScript.Arguments.Named,
    stdin    = WScript.Stdin,
    stdout   = WScript.Stdout,
    escape   = args.Exists("E"),
    delayed  = args.Exists("D"),
    inDelim  = args.Exists("I") ? args.Item("I") : ",",
    outDelim = args.Exists("O") ? args.Item("O") : ",",
    quote    = false,
    ln, c, n;
while (!stdin.AtEndOfStream) {
  ln=stdin.ReadLine();
  if (!quote) stdout.Write('"');
  for (n=0; n<ln.length; n++ ) {
    c=ln.charAt(n);
    if (c == '"') {
      if (quote && ln.charAt(n+1) == '"') {
        n++;
      } else {
        quote=!quote;
        continue;
      }
    }
    if (c == inDelim && !quote) c='"'+outDelim+'"';
    if (escape) {
      if (c == outDelim) c="\\D";
      if (c == "\\") c="\\S";
    }
    if (delayed) {
      if (c == "!") c="^!";
      if (c == "^") c="^^";
    }
    stdout.Write(c);
  }
  stdout.Write( (quote) ? ((escape) ? "\\N" : "\n") : '"\n' );
}

      

I also wrote a script that defines a macro to help parse the most problematic CSV files. See http://www.dostips.com/forum/viewtopic.php?f=3&t=1827 for background information on batch macros with arguments.

define_csvGetCol.bat

::define_csvGetCol.bat version 1.1
::
:: Defines variable LF and macro csvGetCol to be used with
:: parseCSV.bat to parse nearly any CSV file.
::
:: This script must be called with delayedExpansion disabled.
::
:: The %csvGetCol% macro must be used with delayedExpansion enabled.
::
:: Example usage:
::
::   @echo off
::   setlocal disableDelayedExpansion
::   call define_csvGetCol
::   setlocal enableDelayedExpansion
::   for /f "tokens=1-3 delims=," %%A in ('parseCSV /d /e ^<test.csv') do (
::     %== Load and decode column values ==%
::     %csvGetCol% A "," %%A
::     %csvGetCol% B "," %%B
::     %csvGetCol% C "," %%C
::     %== Display the result ==%
::     echo ----------------------
::     for %%V in (A B C) do echo %%V=!%%V!
::     echo(
::   )
::
:: Written by Dave Benham
::

:: Delayed expansion must be disabled during macro definition

:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

:: define a newline with line continuation
set ^"\n=^^^%LF%%LF%^%LF%%LF%^^"

:: Define csvGetCol
:: %csvGetCol%  envVarName  "Delimiter"  FORvar
set csvGetCol=for %%# in (1 2) do if %%#==2 (%\n%
setlocal enableDelayedExpansion^&for /f "tokens=1,2*" %%1 in ("!args!") do (%\n%
  endlocal^&endlocal%\n%
  set "%%1=%%~3"!%\n%
  if defined %%1 (%\n%
    for %%L in ("!LF!") do set "%%1=!%%1:\N=%%~L!"%\n%
    set "%%1=!%%1:\D=%%~2!"%\n%
    set "%%1=!%%1:\S=\!"%\n%
  )%\n%
)) else setlocal disableDelayedExpansion ^& set args=

      


Usage is extremely easy if you know that there are no commas or newlines in any values, and deferred expansion is not required:

test1.csv

"value1 with ""quotes""",value2: No problem!,value3: 2^3=8,value4: (2^2)!=16
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4

      

test1.bat - no expansion delay, no commas or newlines in values

@echo off
for /f "tokens=1-4 delims=," %%A in ('parseCSV ^<test1.csv') do (
  echo -------------
  echo(A=%%~A
  echo(B=%%~B
  echo(C=%%~C
  echo(D=%%~D
  echo(
)

      

- OUTPUT 1 -



-------------
A=value1 with "quotes"
B=value2: No problem!
C=value3: 2^3=8
D=value4: (2^2)!=16

-------------
A=value1
B=
C=value3
D=value4

-------------
A=value1
B=
C=
D=value4

-------------
A=value1
B=
C=
D=

-------------
A=
B=
C=
D=value4

      


It's also pretty easy when commas are in values, if you know a character that doesn't exist in any meaning. Just provide a unique character for the output delimiter.

test2.csv

"value1 with ""quotes""","value2, No problem!","value3, 2^3=8","value4, (2^2)!=16"
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4

      

test2.bat - no delay with expansion, no strings or lines in values. Note that the entire option must be specified if the delimiter is a poison character

@echo off
for /f "tokens=1-4 delims=|" %%A in ('parseCSV "/o:|" ^<test2.csv') do (
  echo -------------
  echo(A=%%~A
  echo(B=%%~B
  echo(C=%%~C
  echo(D=%%~D
  echo(
)

      

- OUTPUT2 -

-------------
A=value1 with "quotes"
B=value2, No problem!
C=value3, 2^3=8
D=value4, (2^2)!=16

-------------
A=value1
B=
C=value3
D=value4

-------------
A=value1
B=
C=
D=value4

-------------
A=value1
B=
C=
D=

-------------
A=
B=
C=
D=value4

      


It takes a little more code if the values ​​can contain newlines or if you don't know the character that doesn't appear in any value. This solution encodes newlines, separators and forward slashes as \N

, \D

and \S

. Delayed expansion is necessary in the loop to decode the values, !

and therefore ^

must be escaped like ^!

and ^^

.

test3.csv

"2^3=8","(2^2)!=16","Success!",Value4
value1,value2,value3,value4
,,,value4
"value1","value2","value3","value4"
"He said, ""Hey cutie.""","She said, ""Drop dead!""","value3 line1
value3 line2",c:\Windows

      

test3.bat - Allow almost any valid CSV without using a macro.

@echo off
setlocal enableDelayedExpansion

:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
  %== Load columns with encoded values. The trailing ! is important ==%
  set "A=%%~A"!
  set "B=%%~B"!
  set "C=%%~C"!
  set "D=%%~D"!
  %== Decode values ==%
  for %%L in ("!LF!") do for %%V in (A B C D) do if defined %%V (
    set "%%V=!%%V:\N=%%~L!"
    set "%%V=!%%V:\D=,!"
    set "%%V=!%%V:\S=\!"
  )
  %== Print results ==%
  echo ---------------------
  for %%V in (A B C D) do echo(%%V=!%%V!
  echo(
)

      

- OUTPUT3 -

---------------------
A=2^3=8
B=(2^2)!=16
C=Success!
D=Value4

---------------------
A=value1
B=value2
C=value3
D=value4

---------------------
A=
B=
C=
D=value4

---------------------
A=value1
B=value2
C=value3
D=value4

---------------------
A=He said, "Hey cutie."
B=She said, "Drop dead!"
C=value3 line1
value3 line2
D=c:\Windows

      


test4.bat - Allow almost any valid CSV, but now use a macro %csvGetCol%

.

@echo off

:: Delayed expansion must be disabled during macro definition
setlocal disableDelayedExpansion
call define_csvGetCol

:: Delayed expansion must be enabled when using %csvGetCol%
setlocal enableDelayedExpansion
for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
  %== Load and decode column values ==%
  %csvGetCol% A "," %%A
  %csvGetCol% B "," %%B
  %csvGetCol% C "," %%C
  %csvGetCol% D "," %%D
  %== Print results ==%
  echo ---------------------
  for %%V in (A B C D) do echo(%%V=!%%V!
  echo(
)

      

The output is identical to test3.bat


If the CSV file is very large, it is much more efficient to save the output of parseCSV.bat to a temporary file and then use a FOR / F loop to read the temporary file.


There are still a couple of limitations inherent in all FOR / F:

1) Single FOR / F cannot parse more than 32 columns.

2) Limiting the string length to 8191 characters can be a problem.

+2


source


No sample data, so the solution is incomplete.

@ECHO OFF
SETLOCAL enabledelayedexpansion
(
 FOR /f "delims=" %%a IN (q27830845.txt) DO (
  SET "line=%%a"
  SET "line=!line:,,,= , , ,!"
  SET "line=!line:,,= , ,!"
  FOR /f "tokens=1-4delims=," %%b IN ("!LINE!") DO (
   ECHO(%%a--^>^>%%b++%%c++%%d++%%e++
  )
 )
)>newfile.txt

      

GOTO: EOF

I used a file named q27830845.txt

containing this data for my testing.



col1,col 2,col 3,col4
one,two,three,four
ONE,,THREE,FOUR - no two
ONE,,,FOUR - 3 and 2 missing
,,,Only FOUR

      

Creates a file newfile.txt with content

col1,col 2,col 3,col4-->>col1++col 2++col 3++col4++
one,two,three,four-->>one++two++three++four++
ONE,,THREE,FOUR - no two-->>ONE ++ ++THREE++FOUR - no two++
ONE,,,FOUR - 3 and 2 missing-->>ONE ++ ++ ++FOUR - 3 and 2 missing++
,,,Only FOUR-->> ++ ++ ++Only FOUR++

      

Please note that %%a

etc. can be added Space. Undoubtedly, there will be sensitivity to symbols that have meaning cmd

, such as !

and %

. ++

just used as an obvious visual separator between fields.

+1


source







All Articles