Can I replace Get-Content and ForEach-Object with -match by the Select-String cmdlet?

I have a fixed-width file with entries in the following format:

DDEDM2018890                                                                 19960730015000010000
DDETPL015000                                                                 20150515015005010000
DDETPL015010                                                                 20150515015003010000
DDETPL015020                                                                 20150515015002010000
DDETPL015030                                                                 20150515015005010000
DDETPL015040                                                                 20150515015000010000

      

The first 3 characters identify the record type. In the example above all records are of type DDE, but the file also contains lines of other types.

The following regex with named capture groups parses the relevant information from each record for my purpose (note that it also filters down to DDE record types):

DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})

      

You can play with this regex in this great online parser.
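To make the captures concrete, here is a small sketch that applies the pattern to the first sample record with the -match operator. The padding between the two fields is shortened here, which the pattern tolerates because it uses \s+; the variable names `$pattern` and `$line` are only illustrative.

```powershell
$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

# First sample record from above, with the whitespace run shortened
$line = 'DDEDM2018890   19960730015000010000'

if ($line -match $pattern) {
    # $Matches is a hashtable: key 0 holds the whole match,
    # the named keys hold the individual capture groups
    $Matches['Database']      # DM2
    $Matches['CategoryCode']  # 88
    $Matches['CategoryId']    # 9
    $Matches['Length']        # 000
}
```

Lines of another record type simply fail the -match test, so nothing is emitted for them.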

I wrote the script below, which uses the cmdlets Get-Content, ForEach-Object and Select-Object to convert the fixed-width file to a CSV file.

I wonder whether I can replace the Get-Content and ForEach-Object cmdlets with the single cmdlet Select-String?

#this powershell script reads fixed width file and generates a csv file of the relevant & converted values

#Prepare a hashtable (calculated property) for Select-Object to convert CategoryCode and append CategoryId
$Category = @{
    Name = "Category"
    Expression = {
        $cat = switch($_.CategoryCode) 
        {
            "50"{"A"}
            "54"{"C"}
            "60"{"F"}
            "66"{"I"}
            "74"{"M"}
            "88"{"T"}
        } 
        $cat+$_.CategoryId
    }
}

gc "C:\Path\To\File.txt" | % { 
        if($_ -match "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3}).*$")
        {
            #$matches is a hashtable of named capture groups; convert it to an object so Select-Object can treat its entries as properties
            [PSCustomObject]$matches
        }
    } | select Database, $Category, Length #| export-csv "AnalysisLengths.csv" -NoTypeInformation

      

Before I finished the script, I tried to use the Select-String cmdlet but could not figure out how to use it. I believe it can achieve the same result in a more elegant way. This is what I had:

##Could this be completed with just the Select-String cmdlet instead of Get-Content + ForEach-Object + Select-Object?
Select-String -Path "C:\Path\To\File.txt" `
    -Pattern "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})" `
    | Select-Object -ExpandProperty Matches 

      

The -ExpandProperty parameter is used to expand the Matches property of each Microsoft.PowerShell.Commands.MatchInfo object into the actual System.Text.RegularExpressions.Match objects for each row ...

see also Powershell Select-Object vs ForEach in Select-String results



3 answers


While Select-String can combine Get-Content and the pattern match, you still need a loop to create your custom objects. You can stick with what you have, although I would suggest a couple of modifications: replace the switch statement with a hashtable, and replace the nested if with a Where-Object filter:

$categories = @{
  '50' = 'A'
  '54' = 'C'
  '60' = 'F'
  '66' = 'I'
  '74' = 'M'
  '88' = 'T'
}

$category = @{
  Name       = 'Category'
  Expression = { $categories[$_.CategoryCode] + $_.CategoryId }
}

$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

Get-Content 'C:\path\to\file.txt' |
  ? { $_ -match $pattern } |
  % { [PSCustomObject]$matches } |
  select Database, $category, Length |
  Export-Csv 'C:\path\to\output.csv' -NoType

      

Or you can go with @JPBlanc's suggestion (again with a few minor changes):



$category = @{
  '50' = 'A'
  '54' = 'C'
  '60' = 'F'
  '66' = 'I'
  '74' = 'M'
  '88' = 'T'
}

$pattern = "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})"

Select-String -Path 'C:\path\to\file.txt' -Pattern $pattern | % {
  New-Object -TypeName PSObject -Property @{
    Database = $_.Matches.Groups[1].Value
    Category = $category[$_.Matches.Groups[2].Value] + $_.Matches.Groups[3].Value
    Length   = $_.Matches.Groups[4].Value
  }
} | Export-Csv 'C:\path\to\output.csv' -NoType

      

The latter gives slightly better performance, though not by much (runtime was 2:35 versus 2:50 for a 120 MB input file on my test machine).
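The numeric group indexes above work because every group in the pattern is named, so .NET numbers them 1 to 4 in order of appearance. As a hedged variation, the groups can also be read by name, which keeps working if an unnamed group is later added to the pattern. The sketch below runs against two in-memory sample lines from the question (padding shortened) rather than a file; `$sampleLines` and `$rows` are illustrative names, not part of the original scripts.

```powershell
$categories = @{ '50' = 'A'; '54' = 'C'; '60' = 'F'; '66' = 'I'; '74' = 'M'; '88' = 'T' }

$pattern = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

# Two sample records; Select-String also accepts strings from the pipeline
$sampleLines = @(
  'DDEDM2018890   19960730015000010000'
  'DDETPL015000   20150515015005010000'
)

$rows = $sampleLines | Select-String -Pattern $pattern | ForEach-Object {
  $g = $_.Matches[0].Groups   # groups of the first (and only) match on the line
  [PSCustomObject]@{
    Database = $g['Database'].Value
    Category = $categories[$g['CategoryCode'].Value] + $g['CategoryId'].Value
    Length   = $g['Length'].Value
  }
}

# Show the result as CSV text instead of writing a file
$rows | ConvertTo-Csv -NoTypeInformation
```

Using [PSCustomObject] with a hashtable literal also keeps the columns in the order they are written, which New-Object -Property with a plain hashtable does not guarantee.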



Here's one way (I'm not very proud of it)



Select-String -Path "C:\Path\To\File.txt" -Pattern "DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})" | %{New-Object -TypeName PSObject -Property @{Database=$_.matches.groups[1];CategoryCode=$_.matches.groups[2];CategoryId=$_.matches.groups[3];Length=$_.matches.groups[4]}} | export-csv "C:\Path\To\File.csv"

      



I don't know why you limited your question to the Select-String cmdlet. If you had included the switch statement, I would answer: YES! It's possible!

And I would suggest this short and simple PowerShell code:

$(switch -Regex -File $fileIN { $patt { [pscustomobject]$matches | select * -ExcludeProperty 0 } }) | epcsv $fileCSV

      

where $fileIN is the input file, $fileCSV is the CSV file you want to create, and $patt is the pattern you have in the OP:

$patt='DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

      

The switch statement is very powerful.
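The one-liner above exports CategoryCode and CategoryId as separate columns. If you want the combined Category column from the question, a more spelled-out sketch could look like the following. It assumes the category mapping from the question and writes a small temporary input file so that it runs on its own; the temporary file and the `$rows` variable are only for illustration.

```powershell
$categories = @{ '50' = 'A'; '54' = 'C'; '60' = 'F'; '66' = 'I'; '74' = 'M'; '88' = 'T' }

$patt = 'DDE(?<Database>\w{3})\d{2}(?<CategoryCode>\d{2})(?<CategoryId>\d{1})\d\s+\d{8}\d{3}(?<Length>\d{3})'

# Hypothetical temporary input file: one DDE record and one line of another type
$fileIN = [System.IO.Path]::GetTempFileName()
Set-Content -Path $fileIN -Value @(
  'DDETPL015010   20150515015003010000'
  'XYZ this line is a different record type and is skipped'
)

# switch -Regex -File reads the file line by line and fills $Matches on each hit
$rows = switch -Regex -File $fileIN {
  $patt {
    [PSCustomObject]@{
      Database = $Matches['Database']
      Category = $categories[$Matches['CategoryCode']] + $Matches['CategoryId']
      Length   = $Matches['Length']
    }
  }
}

$rows | ConvertTo-Csv -NoTypeInformation
Remove-Item $fileIN
```

The keys are read with the indexer ($Matches['Length']) rather than dot notation to avoid any confusion with members of the hashtable itself.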
