Splitting a string using regular expression in Perl

Question

I need help splitting the following line (Date, ID, msecs)

May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec

I only need the first part of the ID before the first underscore.

So this is what I want the result to look like

May 26 09:33:33, 0191070818, 180

I am having a hard time figuring out what to add to the regex

use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split('/[]/', $data);

foreach my $val (@values) {
  print "$val\n";
}

exit 0;

+3

regex perl

user2007843 Jul 20 '15 at 15:10


6 answers

The easiest way is to just split the data on whitespace (and then reconstruct the date by joining the first three fields). It's not very sophisticated, but it gets the job done.

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split(/\s+/, $data);

my $date = join ' ', @values[0,1,2];
my $id   = $values[7];
my $time = $values[9];

say "Date: $date";
say "ID:   $id";
say "Time: $time";

Which gives:

Date: May 26 09:33:33
ID:   0091070818_1432647213_489715
Time: 180
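
The output above still carries the full ID, while the question asks only for the part before the first underscore. A minimal sketch of one way to finish the job, assuming the same field positions as in the code above (the variable names $short_id and $line are made up for illustration):

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split /\s+/, $data;

my $date = join ' ', @values[0 .. 2];

# A second split on the underscore, limited to 2 fields;
# the list assignment keeps only the leading part.
my ($short_id) = split /_/, $values[7], 2;

my $time = $values[9];

my $line = join ', ', $date, $short_id, $time;
print "$line\n";
```

This still depends on the ID always being the eighth whitespace-separated field, so it is only as robust as the log format is stable.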

+4

Dave Cross Jul 20 '15 at 15:28


split doesn't look like the right tool for the job. I would use a regex:

my @values = $data =~ /^([[:alpha:]]{3}\s[0-9][0-9]\s[0-9][0-9]:[0-9][0-9]:[0-9][0-9]) # date & time
                       \s.*?\sID\s
                       ([0-9]+)            # ID
                       .*\stook\s
                       ([0-9]+)            # duration
                       \smsec/x;
print join(',', @values), "\n";

+3

choroba Jul 20 '15 at 15:17


I'm not sure that split() is the best approach. This code matches your target ID and retrieves it:

($id) = $data =~ m/(?<=ID )[^_]+/g;

The regex uses a look-behind, (?<=ID ), to anchor the start of the match just to the right of "ID ", and then grabs everything that is not an underscore.

Here is some test code:

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
($id) = $data =~ m/(?<=ID )[^_]+/g;
print $id

Output:

0091070818


+2

Bohemian Jul 20 '15 at 15:13


It is probably best to do this with three separate patterns, as the code below demonstrates.

I have used the /x modifier so that I can put spaces in the regex patterns to improve readability.

If you are not certain that your data will be well formed (that is, produced by a program), you should add tests to ensure that all three values are defined after the patterns have been matched. Alternatively, you can test the success of each pattern match directly.

use strict;
use warnings;
use v5.10;

my $s = 'May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec';

for ( $s ) {

    my ($date)  = / ^ ( [a-z]+ \s+ \d+ \s+ [\d:]+ ) /ix;
    my ($id)    = / ID \s+ (\d+) _ /x;
    my ($msecs) = / (\d+) \s+ msec /x;

    say join ',', $date, $id, $msecs;
}

Output

May 26 09:33:33,0191070818,180
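
The defined-ness tests mentioned above could look something like the following. This is only a sketch: the helper name parse_line and the second sample line are made up for illustration, and the three patterns are copied from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer): returns the three fields,
# or an empty list if any of the patterns fails to match.
sub parse_line {
    my ($line) = @_;

    my ($date)  = $line =~ / ^ ( [a-z]+ \s+ \d+ \s+ [\d:]+ ) /ix;
    my ($id)    = $line =~ / ID \s+ (\d+) _ /x;
    my ($msecs) = $line =~ / (\d+) \s+ msec /x;

    # A failed match in list context yields an empty list,
    # which leaves the corresponding variable undefined.
    return unless defined $date and defined $id and defined $msecs;

    return ( $date, $id, $msecs );
}

my @good = parse_line('May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec');
my @bad  = parse_line('May 26 09:33:40 localhost archiver: started');   # made-up line with no ID

print join( ',', @good ), "\n" if @good;
print "unparseable line\n" unless @bad;
```

Returning an empty list on failure lets the caller test the result with a plain "if @fields" before using any of the values.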

+2

Borodin Jul 20 '15 at 15:20


split is not the tool to use here. Here is a regex that works, at least for the specific case you provided.

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

$data =~ m/^(\w+ \d+ \d\d:\d\d:\d\d).+saving ID (\d+).+took (\d+) msec$/;

my ($date, $id, $msec) = ($1,$2,$3);

print "$date, $id, $msec\n";
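
One caveat: $1, $2 and $3 are only meaningful if the match succeeded. A possible variation, sketched here with the same regex, assigns the captures in list context and uses that assignment as the success test (the $result variable and the 'no match' text are illustrative only):

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my $result;

# In list context a successful match returns the captured groups,
# so the assignment itself doubles as the success test.
if ( my ( $date, $id, $msec ) = $data =~ m/^(\w+ \d+ \d\d:\d\d:\d\d).+saving ID (\d+).+took (\d+) msec$/ ) {
    $result = "$date, $id, $msec";
}
else {
    $result = 'no match';
}

print "$result\n";
```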

+1

Andy Lester Jul 20 '15 at 15:20


Sobrique · Accepted Answer · 2015-07-20T15:18:28+0000

OK. That split just won't work: because you used single quotes, the whole string, slashes included, is used literally as the pattern. Since that text doesn't occur in your example line, the split doesn't do anything.

split cuts a string up at a field delimiter, which is probably not what you want. For example:

 split ( ' ', $data );

You'll get:

$VAR1 = [
          'May',
          '26',
          '09:33:33',
          'localhost',
          'archiver:',
          'saving',
          'ID',
          '0091070818_1432647213_489715',
          'took',
          '180',
          'msec'
        ];

Given that your line doesn't really "delimit" that neatly, I would suggest a different approach:

You need to pick out what you want from it. Assuming you don't get a bunch of odd records mixed in:

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my ($time_str) = ( $data =~ m/^(\w+ \d+ \d{2}:\d{2}:\d{2})/ );
my ($id)       = ( $data =~ m/(\d+)_/ );
my ($msec)     = ( $data =~ m/(\d+) msec/ );
print "$time_str, $id, $msec\n";

Note: you can combine the regex patterns (as some of the other examples show). I did it this way in the hope of simplifying and clarifying what's going on. The regex match is applied to $data (because of the =~ operator). The parenthesized "capture" elements, (), are then extracted and "returned" to be assigned to the variable on the left-hand side.

(Note: you need the parentheses in "my ($msec)" because they put the assignment in list context, so that the captured value is used rather than the result of the test (true/false).)
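
A small sketch of that difference between scalar and list context, using the same data (the variable name $matched is made up for illustration):

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

# Scalar context: the match returns true (1) or false, not the capture.
my $matched = $data =~ m/(\d+) msec/;

# List context (note the parentheses): the match returns the captured text.
my ($msec) = $data =~ m/(\d+) msec/;

print "scalar: $matched\n";    # the success flag
print "list:   $msec\n";       # the captured digits
```

Without the parentheses, $msec would end up holding 1 instead of 180.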
