How do I split a string into groups of 4 characters?

I have a string 1234567890 and I want to format it as 1234 5678 90

I am writing this regex:

$str =~ s/(.{4})/$1 /g;

      

But it doesn't work for 12345678. I get an extra space at the end:

>>1234 5678 <<

      

I tried rewriting the regex with a lookahead:

s/((?:.{4})?=.)/$1 /g;

      

How can I rewrite the regular expression to fix this case?

+3




5 answers


Just use unpack

use strict;
use warnings 'all';

for ( qw/ 12345678 1234567890 / ) {
    printf ">>%s<<\n", join ' ', unpack '(A4)*';
}

      



Output

>>1234 5678<<
>>1234 5678 90<<
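If you need this in more than one place, the same idea can be wrapped in a small sub. This is only a sketch: the name group_unpack and the optional chunk-size argument are my own, not part of the answer above. It uses the a template instead of A, on the assumption that you want each chunk kept exactly as it is (A also strips trailing spaces from each chunk, which makes no difference for digits).

```perl
use strict;
use warnings;

# Hypothetical helper: split $str into chunks of $n characters (default 4)
# and join the chunks with single spaces.
sub group_unpack {
    my ($str, $n) = @_;
    $n //= 4;
    # "a" returns each chunk unchanged; "A" would strip trailing spaces.
    return join ' ', unpack "(a$n)*", $str;
}

printf ">>%s<<\n", group_unpack($_) for qw/ 12345678 1234567890 /;
```

An empty string yields an empty list from unpack, so the result is an empty string rather than a lone space.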

      

+8




Context is your friend:

join(' ', $str =~ /(.{1,4})/g)

      

In list context, the match returns every four-character chunk (and, because the quantifier is greedy, one shorter chunk at the end of the string if the length is not a multiple of four). join makes sure the chunks are separated by spaces, with no trailing space at the end.
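A minimal sketch of the list-context approach with a few edge cases, in case it helps to see it run. The sub name group_join and the extra inputs (a string shorter than four characters and an empty string) are my additions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the one-liner above.
sub group_join {
    my ($str) = @_;
    # In list context, a //g match returns every captured chunk;
    # only the last chunk can be shorter than four characters.
    my @chunks = $str =~ /(.{1,4})/g;
    return join ' ', @chunks;
}

for my $s ('12345678', '1234567890', '123', '') {
    printf ">>%s<<\n", group_join($s);
}
```

When the match fails (the empty string), the list is empty and join returns an empty string, so there is no stray space in that case either.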



If $str is huge and the temporary list uses too much memory, you may just want to do s///g and chop off the final space.

I prefer to use the simplest patterns in regular expressions. Also, I haven't measured, but with long strings, a single chop could be cheaper than a conditional pattern in s///g:

$ echo $'12345678\n123456789' | perl -lnE 's/(.{1,4})/$1 /g; chop; say ">>$_<<"'
>>1234 5678<<
>>1234 5678 9<<
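For anyone who wants to measure it, here is a rough sketch using the core Benchmark module. The sub names and the 400,000-character test string are made up for illustration, and the numbers will depend on your Perl version and machine, so treat the result as a hint rather than a verdict.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test input: 400,000 digits (not from the original post).
my $input = '1234567890' x 40_000;

# Variant 1: unconditional substitution, then remove the one trailing space.
sub group_with_chop {
    my ($s) = @_;
    $s =~ s/(.{1,4})/$1 /g;
    chop $s;
    return $s;
}

# Variant 2: lookahead, so a space is only added when more text follows.
sub group_with_lookahead {
    my ($s) = @_;
    $s =~ s/(.{4})(?=.)/$1 /g;
    return $s;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    chop      => sub { group_with_chop($input) },
    lookahead => sub { group_with_lookahead($input) },
});
```

Both subs work on a copy of the string, so the copying cost is included in each timing equally.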

      

+6




You almost had the syntax right. Instead of ?=. you need (?=.) (the parentheses are part of the lookahead syntax). So:

s/((?:.{4})(?=.))/$1 /g

      

But you don't need the non-capturing group:

s/(.{4}(?=.))/$1 /g

      

And I think it's clearer if the capture does not include the lookahead:

s/(.{4})(?=.)/$1 /g

      

And given your example data, a non-word-boundary assertion also works:

s/(.{4})\B/$1 /g

      

Or use \K (Perl 5.10 and later) to automatically keep the matched part:

s/.{4}\B\K/ /g
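The variants agree on all-digit input, but \B depends on word characters, so they can diverge on other data. A small sketch to compare them side by side; the hash keys and the abcd-efgh input are my own choices for illustration.

```perl
use strict;
use warnings;

# The substitutions from this answer, wrapped in hypothetical named subs.
# Each sub works on a copy and returns the modified string.
my %variant = (
    lookahead => sub { my ($s) = @_; $s =~ s/(.{4})(?=.)/$1 /g; $s },
    not_wb    => sub { my ($s) = @_; $s =~ s/(.{4})\B/$1 /g;    $s },
    keep      => sub { my ($s) = @_; $s =~ s/.{4}\B\K/ /g;      $s },
);

for my $input ('12345678', '1234567890', 'abcd-efgh') {
    for my $name (sort keys %variant) {
        printf "%-10s %-12s >>%s<<\n", $name, $input, $variant{$name}->($input);
    }
}
```

With abcd-efgh, the lookahead version splits strictly every four characters, while the \B versions only split where both neighbouring characters are word characters. That is why \B is only a safe shortcut when the string contains nothing but word characters.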

      

+4




To fix the regex, I would write:

$str =~ s/(.{4}(?=.))/$1 /g;

      

You just need to add parentheses around ?=. Without them, the ? is parsed as a quantifier that makes the preceding group optional, and it is followed by a literal = and then any character.

So we match four characters, then look ahead to check that at least one more character follows, and only then add a space after the four. For example, the regex will not match the string 1234, so no trailing space is added.
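To see what the pattern from the question actually matches, it can help to run both versions on a string that contains an = sign. This is just an illustrative sketch; the variable names and the 1234=5678 input are mine.

```perl
use strict;
use warnings;

my $broken = qr/((?:.{4})?=.)/;   # the pattern from the question
my $fixed  = qr/(.{4}(?=.))/;     # with the lookahead in parentheses

for my $s ('12345678', '1234=5678') {
    # Substitute on copies so the original input stays untouched.
    (my $with_broken = $s) =~ s/$broken/$1 /g;
    (my $with_fixed  = $s) =~ s/$fixed/$1 /g;
    print "input:  >>$s<<\n";
    print "broken: >>$with_broken<<\n";
    print "fixed:  >>$with_fixed<<\n";
}
```

On 12345678 the broken pattern never matches because there is no literal = in the string, so the input comes back unchanged. On 1234=5678 it matches 1234=5 as a whole, which shows that = is being treated as an ordinary character rather than as part of a lookahead.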

+3




Just use a lookahead to check that at least one more character follows:

$ echo $'12345678\n123456789' | perl -lnE 's/.{4}\K(?=.{1})/ /g; say ">>$_<<"'
>>1234 5678<<
>>1234 5678 9<<

      

+1








