Splitting a string using a regular expression in Perl
I need help splitting the following line (Date, ID, msecs)
May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec
I only need the first part of the ID before the first underscore.
This is what I want the result to look like:
May 26 09:33:33, 0191070818, 180
I am having a hard time figuring out what to put in the regex:
use strict;
use warnings;
my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
my @values = split('/[]/', $data);
foreach my $val (@values) {
print "$val\n";
}
exit 0;
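OK. This split just won't work. Because you put the pattern in single quotes, the slashes are not regex delimiters but part of the pattern itself, so split looks for a literal "/" followed by a bracketed character class. No "/" occurs in your example text, and "[]/" is not even a complete character class, so nothing useful can come of it.
Split "cuts up" the string based on a field separator, which is probably not what you want. For example: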
split ( ' ', $data );
You'll get:
$VAR1 = [
'May',
'26',
'09:33:33',
'localhost',
'archiver:',
'saving',
'ID',
'0091070818_1432647213_489715',
'took',
'180',
'msec'
];
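For reference, a dump like the one above can be produced with the core Data::Dumper module. This is only a minimal sketch: the variable name @fields is an illustrative assumption, and the exact indentation of the dump depends on Data::Dumper's settings.

```perl
use strict;
use warnings;
use Data::Dumper;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

# A single-space string is a special case for split: it splits on runs of
# whitespace and ignores any leading whitespace.
my @fields = split ' ', $data;

# Dump the list as an array reference, which prints the $VAR1 = [ ... ] form.
print Dumper( \@fields );
```

The ID sits at index 7 and the duration at index 9, which shows why a plain split still leaves the ID in its full three-part form.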
Given that your line doesn't really "delimit" that cleanly, I would suggest a different approach:
You need to pick out the pieces you want from it. Assuming you don't get multiple odd records mixed in:
my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
my ($time_str) = ( $data =~ m/^(\w+ \d+ \d{2}:\d{2}:\d{2})/ );
my ($id) = ( $data =~ m/(\d+)_/ );
my ($msec) = ( $data =~ m/(\d+) msec/ );
print "$time_str, $id, $msec\n";
Note: you can combine the regex patterns (as some of the other examples show). I did it this way in the hope of simplifying and clarifying what's going on. The regex match is applied to $data (because of =~). The parenthesized capture groups () are then extracted and "returned" for assignment to the variable on the left-hand side.
(Note: you need the parentheses in "my ($msec)" so that the assignment is in list context and the captured value is used, not the result of the match test (true/false).)
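The difference the parentheses make can be seen in a small sketch. The variable name $matched is an illustrative assumption and is not part of the answer's code.

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

# List context: the parentheses make this a list assignment, so the match
# returns its captured groups and $msec receives the first one.
my ($msec) = $data =~ m/(\d+) msec/;

# Scalar context: without the parentheses the match returns only a
# true/false result, not the captured text.
my $matched = $data =~ m/(\d+) msec/;

print "list context:   $msec\n";
print "scalar context: $matched\n";
```

In list context $msec holds the captured digits, while in scalar context $matched holds only the success flag of the match.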
The easiest way is to just split the data on whitespace (and then reconstruct the date by joining the first three fields). It's not very sophisticated, but it gets the job done.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
my @values = split(/\s+/, $data);
my $date = join ' ', @values[0,1,2];
my $id = $values[7];
my $time = $values[9];
say "Date: $date";
say "ID: $id";
say "Time: $time";
Which gives:
Date: May 26 09:33:33
ID: 0091070818_1432647213_489715
Time: 180
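The ID printed above is still the full three-part value, whereas the question asks only for the part before the first underscore. One possible way to trim it, sketched here under the assumption that the ID is always the eighth whitespace-separated field, is a second split on the underscore. The name $short_id is illustrative only.

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split /\s+/, $data;
my $date   = join ' ', @values[0, 1, 2];

# Keep only the first piece of the ID field. The limit of 2 stops
# splitting after the first underscore.
my ($short_id) = split /_/, $values[7], 2;

print join( ', ', $date, $short_id, $values[9] ), "\n";
```

This prints the three values in the comma-separated form the question asks for.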
I don't think split() is the best approach. This code matches your target ID and retrieves it:
($id) = $data =~ m/(?<=ID )[^_]+/g;
The regex uses the look-behind (?<=ID ) to anchor the start of the match just to the right of "ID ", and then grabs everything up to, but not including, the next underscore.
Here is some test code:
my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
my ($id) = $data =~ m/(?<=ID )[^_]+/g;
print "$id\n";
Output:
0091070818
It is probably best to do this with three separate patterns, as the code below demonstrates.
I used the /x modifier so that I can put spaces in the regex patterns to improve readability.
If you are not sure that your data will be well formed (i.e., that it is the output of a program), you should add tests to ensure that all three values are defined after the pattern matches. Alternatively, you can test the pattern matches themselves directly.
use strict;
use warnings;
use v5.10;
my $s = 'May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec';
for ( $s ) {
my ($date) = / ^ ( [a-z]+ \s+ \d+ \s+ [\d:]+ ) /ix;
my ($id) = / ID \s+ (\d+) _ /x;
my ($msecs) = / (\d+) \s+ msec /x;
say join ',', $date, $id, $msecs;
}
Output
May 26 09:33:33,0191070818,180
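The defined-ness tests mentioned above could look something like the following sketch. The helper name parse_line and the second, non-matching sample line are assumptions made up for illustration; the three patterns are the ones from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, id, msecs) for a matching line,
# or an empty list if any of the three patterns fails.
sub parse_line {
    my ($line) = @_;

    my ($date)  = $line =~ / ^ ( [a-z]+ \s+ \d+ \s+ [\d:]+ ) /ix;
    my ($id)    = $line =~ / ID \s+ (\d+) _ /x;
    my ($msecs) = $line =~ / (\d+) \s+ msec /x;

    # A failed match leaves its variable undefined.
    return unless defined($date) && defined($id) && defined($msecs);
    return ( $date, $id, $msecs );
}

for my $line (
    'May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec',
    'May 26 09:33:34 localhost archiver: started',    # made-up line with no ID
) {
    if ( my @fields = parse_line($line) ) {
        print join( ',', @fields ), "\n";
    }
    else {
        print "skipped: $line\n";
    }
}
```

Returning an empty list on failure lets the caller test the result directly in an if, so malformed lines are reported rather than producing "uninitialized value" warnings.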
split is not the tool to use here. Here is a regex that works, at least for the specific case you provided.
my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
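# Assign the captures directly and stop if the line does not match,
# rather than reading $1, $2, $3 after a possibly failed match.
my ($date, $id, $msec) = $data =~ m/^(\w+ \d+ \d\d:\d\d:\d\d).+saving ID (\d+).+took (\d+) msec$/
    or die "Line did not match: $data\n";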
print "$date, $id, $msec\n";