PowerShell Import-CSV to Export-CSV with Type Conversion
I am getting a bunch of tab-delimited files and I want to convert them to comma-delimited files. I also only need two columns from each file, Date and Value1, so I drop the rest of the columns.
Date Value1 Value2 Value3 Value4 Sensor
08.07.2010 115,28 115,45 115,45 115,28 100
07.07.2010 115,34 115,32 115,34 115,25 85
06.07.2010 115,23 115,74 115,74 115,20 203
It works like this:
(Get-ChildItem -Path '*.txt').Name | ForEach-Object {
    Import-Csv -Path $_ -Delimiter "`t" |
        Select-Object -Property Date, Value1 |
        Export-Csv "out\$_"
}
Unfortunately, the source file uses a comma as the decimal separator. So I get these files, where Value1 is interpreted as a string:
"Date","Value1"
"24.02.2017","30,18"
"23.02.2017","30,20"
"22.02.2017","30,18"
"21.02.2017","30,18"
"20.02.2017","30,17"
How do I set the datatype of a column to numeric during Import-Csv?
Export-Csv always adding "..." around the output field values is not (in itself) a problem:

- The CSV data format has no built-in concept of data type, and double-quoting is used solely to enclose a field value (which is a syntactic requirement for field values with embedded , characters, for example).
- Excel, for example, infers the data type of a column solely from its content, whether or not that content is enclosed in double quotes. However, the interpretation of that content is culture-sensitive (locale-dependent).
- By contrast, PowerShell's Import-Csv never interprets the data and returns all field values as strings.
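Since Import-Csv returns strings, any numeric conversion has to be done explicitly, telling .NET which culture's decimal separator to expect. A minimal sketch (the literal '30,18' is taken from the sample data above):

```powershell
# Import-Csv hands us strings such as '30,18'; parse it as a German-
# formatted number, where ',' is the decimal separator:
$deDE  = [cultureinfo]::GetCultureInfo('de-DE')
$value = [double]::Parse('30,18', $deDE)            # -> 30.18 as a [double]

# Render it back with an invariant-culture '.' for an English-style CSV:
$value.ToString([cultureinfo]::InvariantCulture)    # -> '30.18'
```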
Depending on the active culture of the environment in which the CSV will be processed, you have two options:

- For a culture such as de-DE (Germany), where , (comma) rather than . (period) is used as the decimal separator, and ; rather than , is used as the list separator:
  - When targeting the current culture (check with Get-Culture), use Export-Csv -UseCulture (-UseCulture has been available since (at least) v2 [1]; thanks, LotPings).
  - Alternatively, use something like Export-Csv -Delimiter ([cultureinfo]::GetCultureInfo('de-DE').TextInfo.ListSeparator)
- For an English-speaking culture (or any other culture that uses . as the decimal point):
  - Process the file manually and replace the , instances in numeric fields with . - see below.
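For the first option, the original pipeline can be kept almost as-is, with only the export changed; a sketch, assuming it runs on a machine whose current culture is de-DE (the out\ directory layout is carried over from the question):

```powershell
# On a de-DE machine, -UseCulture makes Export-Csv emit ';' as the field
# separator, so the ',' decimals inside values no longer clash with the
# delimiter. -NoTypeInformation suppresses the '#TYPE ...' header line.
Get-ChildItem -Path *.txt | ForEach-Object {
    Import-Csv -LiteralPath $_.FullName -Delimiter "`t" |
        Select-Object -Property Date, Value1 |
        Export-Csv -Path "out\$($_.Name)" -UseCulture -NoTypeInformation
}
```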
- Custom process and replace instances
To replace the , instances with ., the following command, which uses plain-text processing only, will do:
Get-ChildItem -Path *.txt | ForEach-Object {
    Get-Content -LiteralPath $_.FullName | ForEach-Object {
        ($_ -split '\t')[0,1] -replace ',', '.' -join ','
    } | Set-Content -Encoding utf8 "out/$($_.Name)"
}
Note: For simplicity's sake, the command assumes that the Date column values contain no , instances, and that none of the values require "..."-enclosing - which is reasonable in this particular scenario.
- I chose UTF-8 as the output encoding above, because Set-Content uses the legacy, culture-specific ANSI code page by default. Adjust as needed.
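Instead of line-level string replacement, the decimal commas could also be converted per record via culture-aware parsing before export, which avoids touching the Date column at all. A sketch, assuming the two-column layout from the question:

```powershell
$deDE = [cultureinfo]::GetCultureInfo('de-DE')
$inv  = [cultureinfo]::InvariantCulture

Get-ChildItem -Path *.txt | ForEach-Object {
    Import-Csv -LiteralPath $_.FullName -Delimiter "`t" |
        Select-Object -Property Date, Value1 |
        ForEach-Object {
            # Reinterpret the German-formatted number, then re-emit it
            # with an invariant-culture '.' as the decimal separator:
            $_.Value1 = [double]::Parse($_.Value1, $deDE).ToString($inv)
            $_
        } |
        Export-Csv -Path "out\$($_.Name)" -NoTypeInformation
}
```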
[1] Generally, if a parameter description in a cmdlet's help topic does not mention the specific version in which it was introduced, the parameter has been present since v2. You can now browse older versions of the documentation - back to v3 - on GitHub: just press T or click Find file and start typing the name of the cmdlet / conceptual help topic.
Export-Csv adds quotes regardless of type. Probably the simplest way around this is to read the exported file back in and use a regex to remove those quotes:
(Get-ChildItem -Path '*.txt').Name | ForEach-Object {
    Import-Csv -Path $_ -Delimiter "`t" |
        Select-Object -Property Date, Value1 |
        Export-Csv "out\$_"
    # Read the exported file back, strip the quotes around the number,
    # and convert the decimal comma to a period in the process:
    (Get-Content "out\$_") -replace ',"(\d+),(\d+)"$', ',$1.$2' | Set-Content "out\$_"
}
Note: You can set -Encoding for the Set-Content cmdlet as well!