Regex - why does a zero-width word boundary make an alternation pattern match correctly?

Referring to: perl string binding and replacement in one line?

Given the input:

home/////test/tmp/

      

And the desired conversion to:

/home/test/tmp/

      

(and other file paths that need exactly one leading and one trailing slash, never doubled. For example, /home/test/tmp/ passes through unchanged, but /home/test/tmp gets a trailing slash, etc.)

Using three separate substitutions:

s,^/*,/,;  #prefix
s,/*$,/,; #suffix
s,/+,/,g; #double slashes anywhere else. 

      

Gives us the correct result:

#!/usr/bin/env perl

use strict;
use warnings;

my $str = 'home/////teledyne/tmp/';
$str =~ s,^/*,/,;    #prefix
$str =~ s,/*$,/,;    #suffix
$str =~ s,/+,/,g;    #double slashes anywhere else.
print $str; 

      

But if I try to combine these patterns into a single alternation, as in:

s,(^/*|/+|/*$),/,g 

      

This looks like it should work, but it doesn't: I end up with a double trailing slash.

But by adding a zero-width match, it works great:

s,(^/*|/+|\b/*$),/,g;
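
For comparison, the two combined patterns can be run side by side. This is only a minimal sketch; the helper names normalize_plain and normalize_wb are made up for illustration and do not appear in the original code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: the combined alternation without \b.
sub normalize_plain {
    my ($path) = @_;
    $path =~ s,(^/*|/+|/*$),/,g;
    return $path;
}

# Hypothetical helper: the same alternation with \b in front of the suffix branch.
sub normalize_wb {
    my ($path) = @_;
    $path =~ s,(^/*|/+|\b/*$),/,g;
    return $path;
}

# Print both results for an input with and without a trailing slash.
for my $in ('home/////test/tmp/', 'home/test/tmp') {
    printf "%-20s plain: %-18s with \\b: %s\n",
        $in, normalize_plain($in), normalize_wb($in);
}
```

For the first input the plain version gives /home/test/tmp// while the \b version gives /home/test/tmp/; for the second input both give /home/test/tmp/.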

      

Can anyone help me understand what is going on in the alternation group, and whether there is any downside to simply leaving the \b there?



2 answers


The reason is that, under /g, the alternative /+ matches the last slash, and then the search continues because the pattern can still match at the end-of-string anchor. It resumes from the position after the last substitution, that is, after the last slash. There, /*$ matches zero slashes in front of $ and the substitution adds another /.

We can see this with

perl -wE'
    $_ = "home/dir///end/"; 
    while (m{( ^/* | /+ | /*$ )}gx) { say "Got |$1| at ", pos }
'

      

which prints (with the "at ..." column aligned for readability)

Got ||    at 0
Got |/|   at 5
Got |///| at 11
Got |/|   at 15
Got ||    at 15

With the actual substitution

s{( ^/* | /+ | /*$ )}{ say "Got |$1| at ", pos; q(/) }egx

      



the numbers differ, because they refer to positions in the intermediate strings; the last two are

...
Got |/|   at 14
Got ||    at 15

...

I do not see what could go wrong with using \b, either as in the question or as /*\b$.
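
One way to see why \b changes the outcome, offered as a sketch rather than a definitive account: \b only matches between a word character (\w) and either a non-word character or a string edge. After a final slash there is no word character on either side, so the \b/*$ branch cannot produce the extra empty match there. The same property suggests one case that may be worth checking: a path whose last character is neither a slash nor a word character. The helper names match_list and normalize_wb and the input 'home/test/tmp-' are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: describe every /g match of $re in $str
# as "|matched text| at start-offset".
sub match_list {
    my ($re, $str) = @_;
    my @found;
    while ($str =~ /$re/g) {
        push @found, sprintf('|%s| at %d', $1, $-[0]);
    }
    return @found;
}

# Hypothetical helper: the substitution from the question, with \b.
sub normalize_wb {
    my ($path) = @_;
    $path =~ s,(^/*|/+|\b/*$),/,g;
    return $path;
}

my $plain = qr{( ^/* | /+ | /*$ )}x;
my $wb    = qr{( ^/* | /+ | \b/*$ )}x;
my $s     = 'home/dir///end/';

print "without \\b:\n";
print "  $_\n" for match_list($plain, $s);
print "with \\b:\n";
print "  $_\n" for match_list($wb, $s);

# The last character here is '-', which is not a word character,
# so \b fails at the end of the string and no slash is appended.
print normalize_wb('home/test/tmp-'), "\n";
```

On 'home/dir///end/' the pattern without \b reports five matches, the last being an empty one at offset 15, while the \b variant reports only the first four. The final line prints /home/test/tmp- with no trailing slash, so whether \b is safe seems to depend on the paths always ending in a slash or a word character.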


This is an interesting question, but I would like to add that all these details are avoided with

$_ = '/' . (join '/', grep { /./ } split '/', $_) . '/'  for @paths;
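
The one-liner above can be wrapped in a small function to try it on a few inputs. This is a sketch under assumptions: the name normalize_split and the sample @paths list are invented for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical wrapper around the split/grep/join approach.
sub normalize_split {
    my ($path) = @_;
    # Split on '/', drop the empty pieces produced by leading or
    # repeated slashes, then re-join with single slashes.
    return '/' . join('/', grep { /./ } split '/', $path) . '/';
}

my @paths = ('home/////test/tmp/', 'home/test/tmp', '/home/test/tmp/');
print normalize_split($_), "\n" for @paths;
```

All three sample inputs come out as /home/test/tmp/. An input with no non-slash characters at all (an empty string, or only slashes) would come out as //, so that case may need separate handling.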

      



Here is a single regex that does it:

# -l chomps the input newline, so $ is tested only against the path itself
s='home/////test/tmp/'
perl -lpe 's~^(?!/)|(?<!/)$|/{2,}~/~g' <<< "$s"
/home/test/tmp/

s='home/test/tmp'
perl -lpe 's~^(?!/)|(?<!/)$|/{2,}~/~g' <<< "$s"
/home/test/tmp/

      



Regex breakdown:

^(?!/) # Line start if not followed by /
|
(?<!/)$ # Line end if not preceded by /
|
/{2,} # 2 or more /
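
The same pattern can also be used inside a script with the /x flag, so that the breakdown above becomes inline comments. This is a sketch; the name normalize_look is made up, and it assumes the input string has already been chomped.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper applying the lookaround-based pattern.
# Assumes $path carries no trailing newline.
sub normalize_look {
    my ($path) = @_;
    $path =~ s{
        ^(?!/)     # start of string, if not followed by a slash
      | (?<!/)$    # end of string, if not preceded by a slash
      | /{2,}      # a run of two or more slashes
    }{/}gx;
    return $path;
}

for my $in ('home/////test/tmp/', 'home/test/tmp', '/home/test/tmp/') {
    print "$in -> ", normalize_look($in), "\n";
}
```

Each branch only matches where a slash is missing or duplicated, so a path that is already in the desired form is left untouched. Because $ can match both just before a final newline and at the very end of the string, the sketch assumes chomped input.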

      
