Reading over multiple lines with regex?
I have a file that looks like this:
SPECIMEN: Procedure: xxxx1 A) Location: yyyy2
Major zzz B) Location: something
text here C) more
CLINICAL DIAGNOSIS: xyz
The lines end with CR, then LF (CRLF).
I'm trying to write a regex that captures everything from the end of Procedure: to the beginning of CLINICAL DIAGNOSIS, but I'm having trouble getting it to read across multiple lines.
Here's what I have:
$input_file = 'c:\Path\0240188.txt'
$regex = '(?m)^SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:'
select-string -Path $input_file -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value }
Which returns nothing.
If I change the line to:
$regex = '(?m)^SPECIMEN: Procedure: (.*)'
It grabs the first line, but not the rest. I assumed (?m) was supposed to make it grab multiple lines for me.
Any advice?
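A minimal sketch of why the first pattern finds nothing, assuming the sample text is already in memory as one string (built here with -join "`r`n" to mimic the CRLF line endings; $text, $multiline and $singleline are hypothetical names). In .NET regex, (?m) only changes where ^ and $ can match; it is (?s) that lets . match a newline. Separately, Select-String -Path hands the file to the regex one line at a time, so no pattern passed that way can span lines.

```powershell
# Sample text from the question, joined with CRLF line endings (hypothetical variable)
$text = @(
    'SPECIMEN: Procedure: xxxx1 A) Location: yyyy2'
    'Major zzz B) Location: something'
    'text here C) more'
    'CLINICAL DIAGNOSIS: xyz'
) -join "`r`n"

# (?m): ^ may match at any line start, but . still cannot match `n
$multiline = [regex]::Match($text, '(?m)^SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:')

# (?s): . also matches `n, so the capture can run across lines
$singleline = [regex]::Match($text, '(?s)^SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:')

"(?m) matched: $($multiline.Success)"   # (?m) matched: False
"(?s) matched: $($singleline.Success)"  # (?s) matched: True

# Trim drops the CRLF left in front of CLINICAL DIAGNOSIS:
$singleline.Groups[1].Value.Trim()
```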
Try the following:
$regex = '(?ms).*SPECIMEN: Procedure:(.+)CLINICAL DIAGNOSIS: '
Get-Content $input_file -Delimiter 'CLINICAL DIAGNOSIS: '|
foreach {@($_) -match 'CLINICAL DIAGNOSIS: ' -replace $regex,'$1'}
Using 'CLINICAL DIAGNOSIS: ' as the delimiter means you don't have to read the whole file into memory at once, and because each chunk holds at most one record, you don't have to capture multiple matches from a single string either.
Try the following:
$input_file = gc 'c:\Path\0240188.txt' | out-string
# or: gc c:\path\xxxxx.txt -raw #with v3+
$regex = '(?s)\bSPECIMEN: Procedure: (.*?)CLINICAL DIAGNOSIS:'
$input_file | select-string -Pattern $regex -AllMatches | % { $_.Matches }
# or: [regex]::matches($input_file, $regex) # much faster
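A self-contained sketch of this approach, assuming PowerShell v3 or later for -Raw. It writes the sample text to a temporary file (a hypothetical stand-in for c:\Path\0240188.txt, with a second made-up record added), reads it back as a single string, and collects capture group 1 of every match via [regex]::Matches. The lazy (.*?) stops at the first CLINICAL DIAGNOSIS: after each SPECIMEN: Procedure:, so each record yields its own match.

```powershell
# Hypothetical stand-in for the input file: two records, CRLF line endings
$path = [System.IO.Path]::GetTempFileName()
$sample = @(
    'SPECIMEN: Procedure: xxxx1 A) Location: yyyy2'
    'Major zzz B) Location: something'
    'text here C) more'
    'CLINICAL DIAGNOSIS: xyz'
    'SPECIMEN: Procedure: xxxx2'
    'second record'
    'CLINICAL DIAGNOSIS: abc'
) -join "`r`n"
Set-Content -Path $path -Value $sample

# -Raw returns the whole file as one string, so the regex can see across lines
$content = Get-Content -Path $path -Raw
Remove-Item -Path $path

# (?s) lets . match newlines; the lazy .*? stops at the nearest CLINICAL DIAGNOSIS:
$regex = '(?s)\bSPECIMEN: Procedure: (.*?)CLINICAL DIAGNOSIS:'
$procedures = foreach ($m in [regex]::Matches($content, $regex)) {
    # Group 1 is the text between the two markers; Trim drops the trailing CRLF
    $m.Groups[1].Value.Trim()
}

$procedures.Count   # 2
```

Using [regex]::Matches directly skips the MatchInfo objects that Select-String wraps around each result, which is why the answer suggests it as the faster option.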
You can use a little regex trick like:
Procedure:([\S\s]+)CLINICAL DIAGNOSIS
Since . matches everything except newlines, you can use [\S\s]+ to match everything, newlines included, and capture it with a capture group (...). This trick is useful if you want to avoid using the single-line (?s) flag.
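A small sketch of the character-class trick, assuming the file has already been read into one string (for example with Get-Content -Raw); fed through Select-String -Path, the pattern would still only ever see one line at a time. [\S\s] means "non-whitespace or whitespace", i.e. any character including a newline, with no inline flag needed. The sketch also contrasts the answer's greedy + with a lazy +? (an addition, not part of the original answer): with two hypothetical records in the text, the greedy form runs on to the last CLINICAL DIAGNOSIS.

```powershell
# Two hypothetical records, already joined into a single string
$text = @(
    'SPECIMEN: Procedure: xxxx1'
    'first record'
    'CLINICAL DIAGNOSIS: xyz'
    'SPECIMEN: Procedure: xxxx2'
    'second record'
    'CLINICAL DIAGNOSIS: abc'
) -join "`n"

# [\S\s] matches any character, including `n, without (?s)
$greedy = [regex]::Matches($text, 'Procedure:([\S\s]+)CLINICAL DIAGNOSIS')
$lazy   = [regex]::Matches($text, 'Procedure:([\S\s]+?)CLINICAL DIAGNOSIS')

$greedy.Count   # 1: from the first Procedure: to the last CLINICAL DIAGNOSIS
$lazy.Count     # 2: one match per record
```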