Emacs regex: any characters spanning multiple lines between a matching pattern
I want to find I - <characters> I - and replace it with I - <characters>, I - (that is, insert a comma before the closing I -).
<characters> can be anything, including Tab, Newline, Whitespace, *, etc.
For example:
I - John M. Smith I -
should be replaced with:
I - John M. Smith, I -
I've tried something like:
M-x query-replace-regexp
\(I - \)\([a-z]+\) \(I - \)
\1\2, \3
It does not work. Could you help me?
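To see why this attempt fails, here is a minimal sketch that runs the same pattern through Python's re module. This is an illustrative translation, not Emacs itself: Python writes groups as ( ) where Emacs writes \( \), and re.IGNORECASE stands in for Emacs's default case-fold-search. The [a-z]+ group cannot match the space or the period in John M. Smith, so nothing is found.

```python
import re

# Illustrative Python translation of the Emacs regexp from the question:
#   Emacs:  \(I - \)\([a-z]+\) \(I - \)
# re.IGNORECASE mimics Emacs's default case-fold-search, so [a-z]
# also matches the capitals in "John M. Smith".
ORIGINAL = re.compile(r"(I - )([a-z]+) (I - )", re.IGNORECASE)

line = "I - John M. Smith I -"

# [a-z]+ can only consume "John"; after the space the pattern needs "I - "
# but finds "M. Smith", and [a-z] never matches the space or the period.
result = ORIGINAL.search(line)
print(result)  # None
```

The same pattern does match a single-word value followed by "I - ", which is why it can look almost right on simple test data.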
This can be done with a few changes to the regexp.
Input
I - abc I -
I - defgh I -
I - John M. Smith I -
I - 1234567 I -
I - 12345
67 I -
I - 12345
6789ABC
DE F G H IJK
LM N O P I -
Command
M-x query-replace-regexp
\(I - \)\(\(.*?
\)*?.*?\)\( I -\)
\1\2,\4
Note that the match regexp above really reads like this ...
\(I - \)\(\(.*?\n\)*?.*?\)\( I -\)
... where \n stands for a newline character. In the minibuffer, you need to enter that newline as C-q C-j.
The replacement skips \3 because group 3 is the inner, per-line group; the whole middle text is captured by \2, and \4 is the closing I -.
Output
I - abc, I -
I - defgh, I -
I - John M. Smith, I -
I - 1234567, I -
I - 12345
67, I -
I - 12345
6789ABC
DE F G H IJK
LM N O P, I -
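To check the pattern outside Emacs, the sketch below reproduces the Input/Output example with Python's re module. It is an illustrative translation, not part of the Emacs workflow: Python writes groups as ( ) instead of \( \), supports the same lazy *? quantifier, and, like Emacs, its . does not match a newline unless re.DOTALL is set. The closing group is written as " I -" with no trailing space, on the assumption that the example lines end right after the dash; the function name add_commas is hypothetical.

```python
import re

# Illustrative Python translation of the answer's Emacs regexp:
#   Emacs:  \(I - \)\(\(.*?\n\)*?.*?\)\( I -\)
# Group 1: opening "I - "; group 2: the (possibly multi-line) middle text;
# group 3: the inner per-line group; group 4: the closing " I -".
PATTERN = re.compile(r"(I - )((.*?\n)*?.*?)( I -)")

TEXT = """I - abc I -
I - defgh I -
I - John M. Smith I -
I - 1234567 I -
I - 12345
67 I -
I - 12345
6789ABC
DE F G H IJK
LM N O P I -
"""


def add_commas(text: str) -> str:
    # Same replacement as \1\2,\4 in Emacs: keep the opening "I - " and the
    # middle text, insert a comma, keep the closing " I -"; group 3 is skipped.
    return PATTERN.sub(r"\1\2,\4", text)


if __name__ == "__main__":
    print(add_commas(TEXT))
```

Running it prints the same six records as the Output above, each with a comma before its closing I -.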
Description
The original regexp matches only the character class [a-z]+ in the middle, which cannot match the spaces or the period in John M. Smith. However, you also said:
Can be anything including Tab, Newline, Whitespace, *, etc.
To support this, we can change to .*
to match any character. However, this can lead to excessive consumption of input data, so we use ?
for lazy matching. The last tricky bit is the multiline match, since you said there could be newlines. To support this, let's add processing \n
.
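The difference between greedy and lazy matching, and the reason the newline needs its own handling, can be seen in a small Python sketch. Python's re is used here only as a stand-in for the Emacs regexp engine (both treat .*? as lazy, and neither lets . match a newline by default); the sample strings are made up for illustration.

```python
import re

# Two records on one line: greedy .* runs to the LAST " I -",
# lazy .*? stops at the FIRST one.
text = "I - abc I - I - def I -"
greedy = re.search(r"(I - )(.*)( I -)", text)
lazy = re.search(r"(I - )(.*?)( I -)", text)
print(greedy.group(2))  # abc I - I - def
print(lazy.group(2))    # abc

# Without re.DOTALL, "." does not match a newline, so the lazy pattern
# alone cannot bridge a record that is split over two lines.
split = re.search(r"(I - )(.*?)( I -)", "I - 12345\n67 I -")
print(split)  # None
```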
Looking only at the middle part, we have ...
\(\(.*?\n\)*?.*?\)
... and you can read it as "match any number of characters (lazily) followed by a newline, any number of times (lazily), followed by any number of further characters (lazily, so as not to consume the I - that belongs to the trailing group)".
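A small Python sketch makes that reading concrete (again an illustrative translation of the Emacs syntax, with the closing group written as " I -" without a trailing space). On the last record from the example, group 2 collects all four lines up to the closing I -, while group 3, the inner repeated group, only keeps its final repetition.

```python
import re

# Middle part of the answer's regexp, translated from Emacs syntax:
#   Emacs:  \(\(.*?\n\)*?.*?\)
MIDDLE = r"((.*?\n)*?.*?)"
PATTERN = re.compile(r"(I - )" + MIDDLE + r"( I -)")

record = "I - 12345\n6789ABC\nDE F G H IJK\nLM N O P I -"
m = PATTERN.search(record)

# Group 2: every line between the opening "I - " and the closing " I -".
print(repr(m.group(2)))  # '12345\n6789ABC\nDE F G H IJK\nLM N O P'
# Group 3: a repeated group only remembers its last repetition.
print(repr(m.group(3)))  # 'DE F G H IJK\n'
```

Note that " H IJK" does not end the match early: " I" is followed by "J", not by " -", so the lazy repetition keeps going until "LM N O P I -".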