Perl splitting on multiple occurrences of the same pattern

I wrote the following Perl script to split a line at every occurrence of the same pattern.

Pattern: (some text)

Here's what I've tried:

foreach my $line (@input) {
  if ($line =~ /(\(.*\))+/g) {
    my @splitted = split(/(\(.*\))/, $line);
    foreach my $data (@splitted) {
      print $data, "\n";
    }
  }
}

      

For a given input text:

Non-rapid eye movement sleep (NREMS).
Cytokines such as interleukin-1 (IL-1), tumor necrosis factor, acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).

      

I am getting the following output:

Non-rapid eye movement sleep
(NREMS).
Cytokines such as interleukin-1
(IL-1), tumor necrosis factor, acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).

      

The code does not split the text at the second and third occurrences of the pattern on line 2 of the input. I cannot figure out what I am doing wrong.



3 answers


(\([^(]*\))

      

Split on this. Your regex is greedy. Alternatively, make it non-greedy: (\(.*?\))

...

See demo: https://regex101.com/r/dU7oN5/14

The problem with your regex can be seen here: https://regex101.com/r/dU7oN5/15

Your regex matches ( and then greedily searches for the last ) on the line, not the first ) it encounters. So on the last line, everything from the first ( to the final ) is captured as a single match.
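To make the difference concrete, here is a minimal sketch that runs the three patterns discussed above (the asker's greedy one, the non-greedy one, and the negated character class) against a shortened version of the sample line. The helper name paren_matches and the shortened text are only illustrative, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: return every capture that $re produces in $text.
# In list context, m//g returns the captured groups of all matches.
sub paren_matches {
    my ($re, $text) = @_;
    my @found = $text =~ /$re/g;
    return @found;
}

# Shortened sample line (assumption: any line with several groups behaves the same).
my $text = 'interleukin-1 (IL-1), acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).';

my @greedy     = paren_matches(qr/(\(.*\))/,    $text);  # .* runs to the last ")"
my @non_greedy = paren_matches(qr/(\(.*?\))/,   $text);  # .*? stops at the first ")"
my @negated    = paren_matches(qr/(\([^(]*\))/, $text);  # cannot cross the next "("

print "greedy:     $_\n" for @greedy;
print "non-greedy: $_\n" for @non_greedy;
print "negated:    $_\n" for @negated;
```

The greedy pattern yields a single match spanning from (IL-1 to IFN-alpha), while the other two each yield the three separate groups.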



You haven't said what your purpose is, but I suggest you use a regex match instead of split. Note, though, that you appear to be processing free-form text, which will never work as expected in general.

This program finds each run of text in the input together with the parenthesized value that follows it.

use strict;
use warnings;

while (<DATA>) {
  while ( / ( [^()]* ) \( ( [^()]* ) \) /xg ) {
    my ($defn, $abbr) = ($1, $2);
    print "$defn\n";
    print "-- $abbr\n\n";
  }
}

__DATA__
Non-rapid eye movement sleep (NREMS).
Cytokines such as interleukin-1 (IL-1), tumor necrosis factor, acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).

      



Output

Non-rapid eye movement sleep 
-- NREMS

Cytokines such as interleukin-1 
-- IL-1

, tumor necrosis factor, acidic fibroblast growth factor 
-- FGF

, and interferon-alpha 
-- IFN-alpha
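If the pairs are needed later rather than printed immediately, the same match loop can collect them into a list. This is only a sketch under assumptions: the function name extract_pairs and the cleanup rule (strip leading commas and surrounding whitespace) are illustrative choices, and the caveat about free-form text still applies.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [text, abbreviation] pairs from one line.
sub extract_pairs {
    my ($line) = @_;
    my @pairs;
    while ( $line =~ / ( [^()]* ) \( ( [^()]* ) \) /xg ) {
        my ($defn, $abbr) = ($1, $2);
        # Assumed cleanup: drop leading commas/whitespace and trailing whitespace.
        $defn =~ s/^[\s,]+|\s+$//g;
        push @pairs, [ $defn, $abbr ];
    }
    return @pairs;
}

my @pairs = extract_pairs(
    'Cytokines such as interleukin-1 (IL-1), tumor necrosis factor, '
  . 'acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).'
);

# Print each abbreviation next to the text that preceded it.
print "$_->[1]: $_->[0]\n" for @pairs;
```

Note that the text before each group is whatever lies between the previous group and this one, so the FGF entry still carries "tumor necrosis factor, acidic fibroblast growth factor" as a whole.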

      



Try:

foreach my $line (@input) {
    if($line =~/\(.*\)/) { # modifier g can be removed here
        my @splitted = split(/(\(.+?\))/, $line); # make the match non greedy
        foreach my $data (@splitted) { 
            print $data, "\n"; 
        }
    }
}
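Because the split pattern contains capturing parentheses, split returns the matched groups as well as the text between them. A small sketch of that behavior, with whitespace trimmed and empty pieces dropped; the helper name split_on_parens and the trimming step are assumptions added for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split a line around each parenthesized group,
# keeping the groups themselves in the result.
sub split_on_parens {
    my ($line) = @_;
    my @parts = split /(\(.+?\))/, $line;
    # Trim surrounding whitespace in place, then drop empty pieces.
    s/^\s+|\s+$//g for @parts;
    return grep { length } @parts;
}

my $line = 'Cytokines such as interleukin-1 (IL-1), tumor necrosis factor, '
         . 'acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).';

my @parts = split_on_parens($line);
print "$_\n" for @parts;
```

For the sample line this gives seven pieces: the three parenthesized groups, the three stretches of text before them, and the trailing period.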

      
