"Multiple pass" RegEx to correct gaps in space

I searched and found some seemingly similar questions, but none of them quite matched.

I often find myself replacing leading 4-space padding with tabs. I always do this with the RegEx `^(\t*) {4}`, replacing with `$1\t`, and then I just do a few passes to catch the nested padding. It works, it's easy. But I'm wondering if it's possible to write a RegEx that can do this in a single pass (to handle nested indentation)?
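For reference, the multi-pass approach described above could be sketched roughly like this in Python. The language is an assumption (the question names none), and `retab_multipass` is a hypothetical helper name. Note that Python's `re` writes the backreference as `\1` rather than `$1`.

```python
import re

# The pattern from the question: existing leading tabs, then four spaces.
ONE_LEVEL = re.compile(r"^(\t*) {4}", re.MULTILINE)

def retab_multipass(text):
    """Hypothetical helper: repeat the substitution until nothing changes.

    Returns the converted text and the number of passes that changed it.
    """
    passes = 0
    while True:
        # subn returns the new text and how many replacements were made.
        text, replaced = ONE_LEVEL.subn(r"\1\t", text)
        if replaced == 0:
            return text, passes
        passes += 1

print(retab_multipass("a\n    b\n        c"))
```

Each pass converts one more level per line, so the number of productive passes equals the deepest nesting level in the input.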

EDIT

Sorry for the lack of I/O examples, I was in a hurry. Here's an example, where `s` means a space and `t` means a tab:

SMA
ssssRTP
ssssssssATR
ssssssssOLN
ssssOWH
ssssERE
TOGO

      

Output:

SMA
tRTP
ttATR
ttOLN
tOWH
tERE
TOGO

      

Essentially, the RegEx has to handle arbitrarily deeply nested chunks of 4 spaces. It does not need to handle tabs that follow spaces in the original input.

PCRE



2 answers


Use `(^\t*|\G) {4}` and replace with `$1\t`, or use `(^|\G)( {4}|\t)` and replace with `\t`. You must use multiline mode.

How it works:



`^\t*` matches the beginning of a line followed by any number of tabs.
`\G` matches at the position where the previous match ended.
` {4}` matches four spaces.

So this regex matches four spaces at the beginning of a line (after any leading tabs), or four spaces immediately following four spaces that this regex has already matched.
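Not every engine offers `\G`; Python's built-in `re` module, for instance, does not. A minimal sketch of the same single-pass idea there is to consume the whole leading run of tabs and four-space chunks in one match and rebuild it in a callback. This swaps `\G` for a replacement function, and `leading_spaces_to_tabs` is a hypothetical helper name, not something from the answer.

```python
import re

# A leading run made only of tabs and complete four-space chunks.
LEADING_RUN = re.compile(r"^(?:\t| {4})+", re.MULTILINE)

def leading_spaces_to_tabs(text):
    """Hypothetical helper: convert leading indentation in one pass."""
    def to_tabs(match):
        run = match.group(0)
        # Each tab is one level; every four spaces is one level.
        levels = run.count("\t") + run.count(" ") // 4
        return "\t" * levels
    return LEADING_RUN.sub(to_tabs, text)

sample = "SMA\n    RTP\n        ATR\n        OLN\n    OWH\n    ERE\nTOGO"
print(repr(leading_spaces_to_tabs(sample)))
```

Leftover spaces that do not fill a complete chunk of four stay as they are, which mirrors what `(^|\G)( {4}|\t)` would leave behind.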



Tested with the .NET regular expression engine. It won't work in JavaScript (at least Mozilla's), because it relies on lookbehind, which is not available there. PCRE requires a fixed-length lookbehind, so unfortunately it won't work there either.

(?<=^( {4}|\t)*) {4}

      

The basic idea is to match every run of four spaces that is preceded only by the beginning of the line and whole indentation units (four-space chunks or tabs), that is, every point at which a previous match would naturally have taken place. Since the lookbehind is evaluated against the original text rather than the partially replaced result, there is no chance of missing such a place; all such matches are found in one go. Then make sure you use the `Multiline` flag and replace with a single tab character, and you're good to go.



The test data, which is just random pseudocode in a vaguely Pythonic style:

    def a:
        return true
          # comment     with embedded        spaces etc.

      
