Emacs regex: any characters spanning multiple lines between a matching pattern

I want to find I - <characters> I - and replace it with I - <characters>, I -.

<characters> can be anything, including Tab, Newline, Whitespace, *, etc.

For example, I - John M. Smith I - should be replaced with I - John M. Smith, I -.

I've tried something like:

M-x query-replace-regexp
\(I - \)\([a-z]+\) \(I - \)
\1\2, \3

      

It does not work. Could you help me?



1 answer


This can be made to work with a few adjustments to the regexp.

Input

I - abc I - 
I - defgh I - 
I - John M. Smith I - 
I - 1234567 I - 
I - 12345
67 I - 
I - 12345
6789ABC
DE F G H IJK
LM N O P I - 

      

Command

M-x query-replace-regexp
\(I - \)\(\(.*?
\)*?.*?\)\( I - \)
\1\2,\4

      

Note that the match regexp in the example above really looks like this ...

\(I - \)\(\(.*?\n\)*?.*?\)\( I - \)

      

... with \n representing a newline character. In the minibuffer, you need to enter \n as C-q C-j.

Output

I - abc, I - 
I - defgh, I - 
I - John M. Smith, I - 
I - 1234567, I - 
I - 12345
67, I - 
I - 12345
6789ABC
DE F G H IJK
LM N O P, I - 
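
For readers who want to check the input and output above outside Emacs, here is a minimal sketch of the same idea translated into Python's `re` syntax. This is an assumption-laden translation, not the Emacs command itself: Python uses unescaped parentheses, the inner group is written as non-capturing, so the trailing marker becomes group 3 instead of group 4, and the helper name `add_comma` is hypothetical.

```python
import re

# Python translation of the Emacs pattern \(I - \)\(\(.*?\n\)*?.*?\)\( I - \).
# The inner group is non-capturing here, so the trailing " I - " is group 3
# (it is group 4 in the Emacs replacement string \1\2,\4).
PATTERN = re.compile(r"(I - )((?:.*?\n)*?.*?)( I - )")


def add_comma(text: str) -> str:
    """Insert a comma before the trailing ' I - ' of every record (hypothetical helper)."""
    return PATTERN.sub(r"\1\2,\3", text)


# A shortened version of the sample input, including one record that spans two lines.
sample = (
    "I - abc I - \n"
    "I - John M. Smith I - \n"
    "I - 12345\n"
    "67 I - \n"
)

result = add_comma(sample)
print(result)
```

As in Emacs, `.` in Python does not match a newline by default, which is why the explicit `\n` inside the repeated group is still needed.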

      



Description

The original regexp matches only the character class [a-z]+ in the middle. However, you also said:

Can be anything including Tab, Newline, Whitespace, *, etc.

To support this, we can change it to .* to match any character except a newline. However, a greedy .* can consume too much of the input, so we add ? for lazy matching. The last tricky bit is the multiline match, since you said there could be newlines. To support this, we add explicit handling of \n.
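
The difference between the narrow character class, the greedy match, and the lazy match can be sketched on a single line. The snippet below uses Python's `re` module as a stand-in for the Emacs regexp engine, which is an assumption: the syntax differs (no backslashes before parentheses), and Python does not fold case by default, but the greedy and lazy behaviour illustrated is the same idea. The variable names are made up for this example.

```python
import re

# One line containing two records back to back.
line = "I - abc I - def I - "

# Greedy .* runs to the last " I - " it can find on the line.
greedy = re.search(r"(I - )(.*)( I - )", line)
# Lazy .*? stops at the first " I - " it can find.
lazy = re.search(r"(I - )(.*?)( I - )", line)

print(repr(greedy.group(2)))  # 'abc I - def'
print(repr(lazy.group(2)))    # 'abc'

# The original attempt only allows lowercase letters in the middle, so a
# name with capitals, spaces, and a period is not matched at all.
narrow = re.search(r"(I - )([a-z]+) (I - )", "I - John M. Smith I - ")
print(narrow)  # None
```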

Looking only at the middle part, we have ...

\(\(.*?\n\)*?.*?\)

      

... and you can read it as "match any number of characters (lazily), followed by a newline, any number of times (lazily), followed by any number of characters (lazily, so as not to consume the trailing I - part of the line)".
