Perl splitting on multiple occurrences of the same pattern
I wrote the following Perl script to split a line on multiple occurrences of the same pattern.
Pattern: (some text)
Here's what I've tried:
foreach my $line (@input) {
    if ($line =~ /(\(.*\))+/g) {
        my @splitted = split(/(\(.*\))/, $line);
        foreach my $data (@splitted) {
            print $data, "\n";
        }
    }
}
For a given input text:
Non-rapid eye movement sleep (NREMS).
Cytokines such as interleukin-1 (IL-1), tumor necrosis factor, acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).
I am getting the following output:
Non-rapid eye movement sleep
(NREMS).
Cytokines such as interleukin-1
(IL-1), tumor necrosis factor, acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).
The code does not split the text at the second and third occurrences of the pattern on line 2 of the input. I cannot figure out what I am doing wrong.
Split by this:
(\([^(]*\))
Your regex is greedy. Alternatively, make it non-greedy:
(\(.*?\))
...
See demo.
https://regex101.com/r/dU7oN5/14
The problem with your regex can be seen here https://regex101.com/r/dU7oN5/15
Your regex matches ( and then greedily searches for the last ) on the line, not the first ) it encounters. So everything from the first ( to the last ) is captured as a single match.
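To make the difference concrete, here is a minimal sketch that runs the greedy pattern and a negated-character-class pattern against the second input line from the question. It uses `[^()]*` (a slightly stricter variant of the class suggested above, chosen here as an assumption) and relies on the fact that `split` returns captured delimiters when the pattern contains a capture group.

```perl
use strict;
use warnings;

# Sample line taken from the question.
my $line = 'Cytokines such as interleukin-1 (IL-1), tumor necrosis factor, '
         . 'acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).';

# Greedy: .* runs to the last ")" on the line, so there is only one match.
my @greedy = $line =~ /(\(.*\))/g;

# Negated character class: each match stops at the first ")" it reaches.
my @groups = $line =~ /(\([^()]*\))/g;

# split keeps the delimiters because the pattern has a capture group.
my @pieces = split /(\([^()]*\))/, $line;

print "greedy matches: ", scalar(@greedy), "\n";
print "groups: @groups\n";
print "$_\n" for @pieces;
```

With this pattern the line is broken at all three parenthesized groups, and the text between them (including the trailing period) comes back as separate pieces.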
You haven't specified your purpose, but I suggest you use a regex match instead of split. Note, though, that you appear to be processing free-form text, and no pattern will work as expected on that in general.
This program finds each run of text in the input, together with the value in parentheses that follows it.
use strict;
use warnings;
while (<DATA>) {
    while ( / ( [^()]* ) \( ( [^()]* ) \) /xg ) {
        my ($defn, $abbr) = ($1, $2);
        print "$defn\n";
        print "-- $abbr\n\n";
    }
}
__DATA__
Non-rapid eye movement sleep (NREMS).
Cytokines such as interleukin-1 (IL-1), tumor necrosis factor, acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).
Output
Non-rapid eye movement sleep
-- NREMS
Cytokines such as interleukin-1
-- IL-1
, tumor necrosis factor, acidic fibroblast growth factor
-- FGF
, and interferon-alpha
-- IFN-alpha
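If the leading commas and stray spaces in that output are unwanted, one possible variation is to collect the pairs in a list and trim each definition before using it. The helper name `extract_pairs` and the trimming rules below are assumptions made for illustration, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [definition, abbreviation] pairs
# found in one line of text, using the same pattern as the program above.
sub extract_pairs {
    my ($text) = @_;
    my @pairs;
    while ( $text =~ / ( [^()]* ) \( ( [^()]* ) \) /xg ) {
        my ($defn, $abbr) = ($1, $2);
        $defn =~ s/^[\s,]+//;   # drop leading commas and whitespace
        $defn =~ s/\s+$//;      # drop trailing whitespace
        push @pairs, [ $defn, $abbr ];
    }
    return @pairs;
}

my $line = 'Cytokines such as interleukin-1 (IL-1), tumor necrosis factor, '
         . 'acidic fibroblast growth factor (FGF), and interferon-alpha (IFN-alpha).';

printf "%s => %s\n", $_->[1], $_->[0] for extract_pairs($line);
```

Like the program above, this sketch does not handle nested parentheses, which is one of the ways free-form text can defeat a simple pattern.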