PowerShell Import-CSV to Export-CSV with Type Conversion

I am getting a bunch of tab-delimited files and I want to convert them to comma-delimited files. I also only need two of the columns, Date and Value1, so I drop the rest of the columns.

Date    Value1  Value2  Value3  Value4  Sensor
08.07.2010  115,28  115,45  115,45  115,28  100
07.07.2010  115,34  115,32  115,34  115,25  85
06.07.2010  115,23  115,74  115,74  115,20  203

      

It works like this:

# Convert each tab-delimited .txt file to a comma-delimited CSV containing only the two needed columns.
(Get-ChildItem -Path '*.txt').Name | ForEach-Object { 
    Import-Csv -Path $_ -Delimiter "`t" | 
    Select-Object -Property Date,Value1 | 
    Export-Csv "out\$_" 
}

      

Unfortunately, the source files use a comma as the decimal separator, so I get files where Value1 is interpreted as a string:

"Date","Value1"
"24.02.2017","30,18"
"23.02.2017","30,20"
"22.02.2017","30,18"
"21.02.2017","30,18"
"20.02.2017","30,17"

      

How do I set the datatype of a column to numeric during Import-Csv?



2 answers


Export-Csv always adding "..." around the output field values is not (in itself) the problem:

  • The CSV data format has no built-in concept of data types, and double-quoting is used solely to enclose a field value (a syntactic requirement for field values with embedded , characters, for example).
  • Excel, for example, infers a column's data type solely from its content, whether or not that content is enclosed in double quotes.
    • However, the interpretation of that content is culture-sensitive (locale-dependent).
  • By contrast, PowerShell's Import-Csv never interprets the data and returns all field values as strings.
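
A quick way to see this for yourself (the file name out\data.txt below is just an illustrative placeholder for one of the exported files):

# Import-Csv always yields [string] property values, no matter what the text looks like.
$first = Import-Csv -Path 'out\data.txt' | Select-Object -First 1
$first.Value1.GetType().Name   # -> String, not Double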

Depending on the culture in effect in the environment in which the CSV will be processed, you have two options:

  • For a culture such as de-DE (Germany), where , (comma) rather than . (period) is used as the decimal mark, and ; rather than , is used as the list separator:

    • If you are running in the target culture (check with Get-Culture), use Export-Csv -UseCulture (-UseCulture has been available since (at least) v2 [1]) - thanks, LotPings.
    • Alternatively, use something like Export-Csv -Delimiter ([cultureinfo]::GetCultureInfo('de-DE').TextInfo.ListSeparator)

  • For an English-speaking culture (or any other culture that uses . as the decimal mark):

    • Custom-process the rows and replace the , instances in the numeric fields with . - see below.

To replace the , instances with . , the following command, which uses plain-text manipulation only, will do the job:



# Rewrite each tab-delimited file as plain text: keep only the first two columns
# (Date and Value1) and turn the decimal comma into a period along the way.
Get-ChildItem -Path *.txt | ForEach-Object {
  Get-Content -LiteralPath $_.FullName | ForEach-Object {
    ($_ -split '\t')[0,1] -replace ',', '.' -join ','
  } |
    Set-Content -Encoding utf8 "out/$($_.Name)"
}

      

Note: for simplicity's sake, the command assumes that the Date column values contain no , instances and that none of the values require "..."-enclosing - which is a reasonable assumption in this particular scenario.

  • I chose UTF-8 as the output encoding above because Set-Content defaults to the legacy, culture-specific ANSI code page. Adjust as needed.
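
As an aside, if what you are really after is having Value1 typed as a number in memory during Import-Csv (as the question title asks), one option is a calculated property that parses the value using the source culture. A minimal sketch, assuming the de-DE-formatted tab-delimited input from the question (data.txt is a placeholder name):

# Parse Value1 with the de-DE culture while importing, so "30,18" becomes the number 30.18.
$deDE = [cultureinfo]::GetCultureInfo('de-DE')
Import-Csv -Path 'data.txt' -Delimiter "`t" |
  Select-Object -Property Date,
    @{ Name = 'Value1'; Expression = { [double]::Parse($_.Value1, $deDE) } }

The resulting objects then carry Value1 as a [double], which is what you want if you need to sort or calculate with the values before writing them back out.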

[1] Generally, if a parameter's description in a cmdlet's help topic does not mention the specific version in which it was introduced, it has existed since v2. You can now browse older versions of the documentation - back to v3 - on GitHub: just press T or click Find file and start typing the name of the cmdlet or conceptual help topic.



Export-Csv adds the quotes regardless of type. Probably the only way around that is to read the exported file back in and use a regex to remove those quotes:

(Get-ChildItem -Path '*.txt').Name | ForEach-Object { 
    Import-Csv -Path $_ -Delimiter "`t" | 
    Select-Object -Property Date,Value1 | 
    Export-Csv "out\$_" 
    # Read the exported file back in and strip the quotes around the numeric field,
    # converting the decimal comma to a period at the same time.
    (Get-Content "out\$_") -replace ',"(\d+),(\d+)"$', ',$1.$2' | Set-Content "out\$_"
}

      



Note: you can also specify -Encoding for the Set-Content cmdlet.
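
For example, mirroring the first answer's UTF-8 choice (out\example.csv is just a placeholder here; adjust the encoding as needed):

# Same cleanup as above, but writing the result with an explicit encoding.
(Get-Content 'out\example.csv') -replace ',"(\d+),(\d+)"$', ',$1.$2' |
  Set-Content -Encoding UTF8 'out\example.csv'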
