Can I replace Get-Content and ForEach-Object with -match by the Select-String cmdlet?
I have a fixed-width file with entries in the following format:
DDEDM2018890 19960730015000010000
DDETPL015000 20150515015005010000
DDETPL015010 20150515015003010000
DDETPL015020 20150515015002010000
DDETPL015030 20150515015005010000
DDETPL015040 20150515015000010000
The first 3 characters identify the record type. In the above example all records are of type DDE, but the file also contains lines of a different type.
The following regex with named capture groups parses the relevant information from each record for my purpose (note that it also filters down to DDE record types):
DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})
You can play with this regex in this great online parser.
I wrote the script below, which uses the cmdlets Get-Content, ForEach-Object and Select-Object to convert the fixed-width file to a CSV file.
I wonder if I can replace the cmdlets Get-Content and ForEach-Object with the single cmdlet Select-String?
# This PowerShell script reads a fixed-width file and generates a CSV file of the relevant & converted values

# Prepare a hashtable (calculated property) for Select-Object to convert CategoryCode and append CategoryId
$Category = @{
    Name = "Category"
    Expression = {
        $cat = switch($_.CategoryCode) {
            "50"{"A"}
            "54"{"C"}
            "60"{"F"}
            "66"{"I"}
            "74"{"M"}
            "88"{"T"}
        }
        $cat+$_.CategoryId
    }
}

gc "C:\Path\To\File.txt" | % {
    if($_ -match "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3}).*$") {
        # $matches is a hashtable of named capture groups; convert it to an object
        # so that Select-Object can handle its entries as object properties
        [PSCustomObject]$matches
    }
} | select Database, $Category, Length #| export-csv "AnalysisLengths.csv" -NoTypeInformation
Before I finished the script, I tried to use the cmdlet Select-String but could not figure out how to use it. I believe it can achieve the same result in a more elegant way ... this is what I had:
## Could this be completed with just the Select-String cmdlet instead of Get-Content + ForEach + Select-Object?
Select-String -Path "C:\Path\To\File.txt" `
    -Pattern "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})" `
    | Select-Object -ExpandProperty Matches
Here -ExpandProperty is used to expand the Matches property of each Microsoft.PowerShell.Commands.MatchInfo into the actual System.Text.RegularExpressions.Match objects for each row ...
See also: PowerShell Select-Object vs ForEach in Select-String results
While Select-String can combine Get-Content and pattern matching, you still need a loop to create your custom objects. You can stick with what you have, although I would suggest a couple of modifications: replace the switch statement with a hashtable, and replace the nested if with a Where-Object filter:
$categories = @{
'50' = 'A'
'54' = 'C'
'60' = 'F'
'66' = 'I'
'74' = 'M'
'88' = 'T'
}
$category = @{
Name = 'Category'
Expression = { $categories[$_.CategoryCode] + $_.CategoryId }
}
$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'
Get-Content 'C:\path\to\file.txt' |
? { $_ -match $pattern } |
% { [PSCustomObject]$matches } |
select Database, $category, Length |
Export-Csv 'C:\path\to\output.csv' -NoType
Or you can go with @JPBlanc's suggestion (again with a few minor changes):
$category = @{
'50' = 'A'
'54' = 'C'
'60' = 'F'
'66' = 'I'
'74' = 'M'
'88' = 'T'
}
$pattern = "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})"
Select-String -Path 'C:\path\to\file.txt' -Pattern $pattern | % {
New-Object -TypeName PSObject -Property @{
Database = $_.Matches.Groups[1].Value
Category = $category[$_.Matches.Groups[2].Value] + $_.Matches.Groups[3].Value
Length = $_.Matches.Groups[4].Value
}
} | Export-Csv 'C:\path\to\output.csv' -NoType
The latter will give you slightly better performance, although not by much (runtime was 2:35 versus 2:50 for a 120 MB input file in my test).
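As a side note, the capture groups can also be addressed by name instead of by index, which keeps the code in step with the pattern if groups are ever added or reordered. Below is a minimal sketch of that idea, not a tested replacement for either variant above. It assumes PowerShell 3.0 or later (for [PSCustomObject]); the temp file name and the non-DDE sample line are made up for illustration, and the two DDE records are copied from the question.

```powershell
# Write a few sample records to a temp file so the sketch is self-contained.
# The file name and the 'XYZ...' line are hypothetical; the DDE lines come from the question.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'fixedwidth-sample.txt'
Set-Content -Path $path -Value @(
    'DDEDM2018890 19960730015000010000'
    'DDETPL015000 20150515015005010000'
    'XYZTPL015000 20150515015005010000'
)

$categories = @{ '50' = 'A'; '54' = 'C'; '60' = 'F'; '66' = 'I'; '74' = 'M'; '88' = 'T' }
$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

$rows = Select-String -Path $path -Pattern $pattern | ForEach-Object {
    # Each MatchInfo carries the regex matches; take the first one and read its groups by name.
    $g = $_.Matches[0].Groups
    [PSCustomObject]@{
        Database = $g['Database'].Value
        Category = $categories[$g['CategoryCode'].Value] + $g['CategoryId'].Value
        Length   = $g['Length'].Value
    }
}
$rows
```

Piping $rows to Export-Csv instead of printing it should give the same kind of CSV as the variants above.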
Here's one way (I'm not very proud of it):
Select-String -Path "C:\Path\To\File.txt" -Pattern "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})" |
    %{ New-Object -TypeName PSObject -Property @{
        Database = $_.matches.groups[1]
        CategoryCode = $_.matches.groups[2]
        CategoryId = $_.matches.groups[3]
        Length = $_.matches.groups[4]
    } } |
    export-csv "C:\Path\To\File.csv"
I don't know why you limited your question to the Select-String cmdlet. If you had included the switch statement, then I would answer: YES! It's possible!
And I would introduce you to this simple and short PowerShell code:
$(switch -Regex -File $fileIN { $patt { [pscustomobject]$matches | select * -ExcludeProperty 0 } }) | epcsv $fileCSV
where $fileIN is the input file, $fileCSV is the CSV file you want to create, and $patt is the pattern you have in the OP:
$patt='DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'
The switch statement is very powerful.
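The one-liner exports the raw capture groups, so the CategoryCode to letter conversion from the question is not applied. Here is a rough sketch of how the same switch approach could include it. It assumes PowerShell 3.0 or later; the temp file name and the non-DDE sample line are hypothetical, while the DDE records and the category table are taken from the question.

```powershell
# Self-contained sample input; the file name and the 'XYZ...' line are made up for illustration.
$fileIN = Join-Path ([System.IO.Path]::GetTempPath()) 'fixedwidth-switch-sample.txt'
Set-Content -Path $fileIN -Value @(
    'DDEDM2018890 19960730015000010000'
    'DDETPL015010 20150515015003010000'
    'XYZTPL015010 20150515015003010000'
)

$categories = @{ '50' = 'A'; '54' = 'C'; '60' = 'F'; '66' = 'I'; '74' = 'M'; '88' = 'T' }
$patt = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

# switch -Regex -File reads the file line by line; for each line matching $patt,
# the automatic $matches hashtable holds the named capture groups.
$result = switch -Regex -File $fileIN {
    $patt {
        [PSCustomObject]@{
            Database = $matches['Database']
            Category = $categories[$matches['CategoryCode']] + $matches['CategoryId']
            Length   = $matches['Length']
        }
    }
}
$result
```

From here, $result | Export-Csv $fileCSV -NoTypeInformation would write the CSV, with $fileCSV being the output path as in the one-liner.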