Reading over multiple lines with regex?

I have a file that looks like this:

SPECIMEN: Procedure: xxxx1 A) Location: yyyy2
Major zzz B) Location: something
text here C) more


CLINICAL DIAGNOSIS: xyz

      

The line endings are CR followed by LF (CRLF).

I'm trying to write a regex that captures everything from the end of "Procedure:" to the beginning of "CLINICAL DIAGNOSIS", but I'm having trouble matching across multiple lines.

Here's what I have:

$input_file = 'c:\Path\0240188.txt'
$regex = '(?m)^SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:'
select-string -Path $input_file -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value }

      

Which returns nothing.

If I change the line to:

$regex = '(?m)^SPECIMEN: Procedure: (.*)'

It grabs the first line, but not the rest. I assumed (?m) was meant to match across multiple lines for me.

Any advice?

+3




5 answers


Try the following:

$regex = '(?ms).*SPECIMEN: Procedure:(.+)CLINICAL DIAGNOSIS: '

Get-Content $input_file -Delimiter 'CLINICAL DIAGNOSIS: '|
 foreach {@($_) -match 'CLINICAL DIAGNOSIS: ' -replace $regex,'$1'}

      



Using 'CLINICAL DIAGNOSIS: ' as the delimiter eliminates the need to read all the data in at once and to resolve and capture multiple matches in a single pass.

+1




(?m) makes the anchors ^ and $ match at the start and end of each line; it does not let a match span lines. What you want is the inline modifier (?s), which makes the dot match all characters, including line breaks.



$regex = '(?s)SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:'
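
To see the difference between the two modifiers without touching a file, here is a minimal sketch run against an in-memory string. The sample text is an assumption modeled on the question's file (CRLF line endings), and the variable names are illustrative only.

```powershell
# Hypothetical sample text with CRLF line endings, modeled on the question's file.
$text = "SPECIMEN: Procedure: xxxx1 A) Location: yyyy2`r`nMajor zzz B) Location: something`r`ntext here C) more`r`n`r`nCLINICAL DIAGNOSIS: xyz"

# (?m) only changes what ^ and $ mean; the dot still stops at the line feed,
# so the pattern can never reach CLINICAL DIAGNOSIS on a later line.
$multiline = [regex]::Match($text, '(?m)^SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:')

# (?s) lets the dot cross line breaks, so the capture spans all the lines.
$singleline = [regex]::Match($text, '(?s)SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:')

$multiline.Success            # False
$singleline.Success           # True
$singleline.Groups[1].Value   # everything between the two markers
```

Note that this only works because the whole text is one string. Select-String -Path still feeds the pattern one line at a time, so the modifier alone is not enough there.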
      

+1




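It seems the contents of $input_file are only being read line by line (Select-String -Path tests each line separately), which won't help you.

Try reading the whole file into a single string: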

$fileContent = [io.file]::ReadAllText("C:\file.txt")

      

or

$fileContent = Get-Content c:\file.txt -Raw

      

Taken from another post here.
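
As a rough end-to-end sketch of this approach: the temp file name and its contents below are assumptions standing in for the real report, and the pattern is the (?s) one suggested elsewhere on this page. Get-Content -Raw needs PowerShell 3 or later.

```powershell
# Hypothetical temp file standing in for the report from the question.
$path = Join-Path ([IO.Path]::GetTempPath()) 'specimen-sample.txt'
$sample = "SPECIMEN: Procedure: xxxx1 A) Location: yyyy2`r`nMajor zzz B) Location: something`r`ntext here C) more`r`n`r`nCLINICAL DIAGNOSIS: xyz`r`n"
[IO.File]::WriteAllText($path, $sample)

# Without -Raw, Get-Content returns an array with one string per line.
$lines = Get-Content $path

# With -Raw, the whole file comes back as a single string, line breaks included.
$fileContent = Get-Content $path -Raw

# The regex can now see across line breaks (given the (?s) modifier).
$match = [regex]::Match($fileContent, '(?s)SPECIMEN: Procedure: (.*?)CLINICAL DIAGNOSIS:')
$captured = $match.Groups[1].Value.Trim()
$captured

# Clean up the temp file.
Remove-Item $path
```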

+1




Try the following:

$input_file = gc 'c:\Path\0240188.txt' | out-string
# or: gc c:\path\xxxxx.txt -raw  #with v3+
$regex = '(?s)\bSPECIMEN: Procedure: (.*?)CLINICAL DIAGNOSIS:'
$input_file | select-string -Pattern $regex -AllMatches | % { $_.Matches }
# or: [regex]::matches($input_file, $regex) # much faster
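
The lazy quantifier (.*?) in that pattern matters as soon as a file holds more than one record. A small sketch, assuming a made-up two-record string (the question's file only shows one record):

```powershell
# Hypothetical text containing two records back to back.
$text = "SPECIMEN: Procedure: first`r`nmore`r`nCLINICAL DIAGNOSIS: a`r`nSPECIMEN: Procedure: second`r`nCLINICAL DIAGNOSIS: b`r`n"

# Greedy: .* runs to the LAST 'CLINICAL DIAGNOSIS:', swallowing both records.
$greedy = [regex]::Matches($text, '(?s)SPECIMEN: Procedure: (.*)CLINICAL DIAGNOSIS:')

# Lazy: .*? stops at the FIRST 'CLINICAL DIAGNOSIS:' after each start marker.
$lazy = [regex]::Matches($text, '(?s)SPECIMEN: Procedure: (.*?)CLINICAL DIAGNOSIS:')

$greedy.Count   # 1
$lazy.Count     # 2

# Collect the captured text of each record, trimmed of surrounding whitespace.
$records = @($lazy | ForEach-Object { $_.Groups[1].Value.Trim() })
$records
```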

      

0




You can use a little regex trick like:

Procedure:([\S\s]+)CLINICAL DIAGNOSIS

      


Since . matches everything except newlines, you can use [\S\s]+ to match everything, line breaks included, and capture it with a capture group (...). This trick works if you want to avoid using the single-line flag.
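
A quick way to check the [\S\s] trick in PowerShell, again assuming a made-up in-memory string rather than the real file:

```powershell
# Hypothetical sample text with CRLF line endings.
$text = "SPECIMEN: Procedure: xxxx1`r`nMajor zzz`r`n`r`nCLINICAL DIAGNOSIS: xyz"

# No (?s) here: [\S\s] means "whitespace or non-whitespace", i.e. any character,
# so it crosses line breaks on its own.
$m = [regex]::Match($text, 'Procedure:([\S\s]+)CLINICAL DIAGNOSIS')

$m.Success
$captured = $m.Groups[1].Value.Trim()
$captured
```

As with the other patterns, this only helps once the text is a single string; it does not change how Select-String -Path reads the file.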

0








