Need help understanding Perl code - Multi process / fork
I was looking for an example of how to limit the number of forked processes running at the same time, and I came across this old code:
#!/usr/bin/perl
# total forks, max children, what to run
# function takes 2 scalars and a reference to code to run
sub mfork ($$&) {
    my ($count, $max, $code) = @_;
    # total number of processes to spawn
    foreach my $c (1 .. $count) {
        # what is happening here? why wait vs waitpid?
        wait unless $c <= $max;
        die "Fork failed: $!\n" unless defined (my $pid = fork);
        # I don't understand the arrow notation here and how it calls a function,
        # also, is "unless $pid" saying "run the function unless you're the parent"?
        exit $code -> ($c) unless $pid;
    }
    # no idea what is happening here, why are we waiting twice? for the last process?
    # why 1 until (-1 == wait)? what is the 1 doing here?
    1 until -1 == wait;
}
# code to run
mfork 10, 3, sub {
    print "$$: " . localtime() . ": Starting\n";
    select undef, undef, undef, rand 2;
    print "$$: " . localtime() . ": Exiting\n";
};
1 answer
Let's look at the code. The code is yours, with most of your comments removed. All other comments are mine.
#!/usr/bin/perl
# total forks, max children, what to run
# function takes 2 scalars and a reference to code to run
sub mfork ($$&) {
    my ($count, $max, $code) = @_;
    # total number of processes to spawn
    foreach my $c (1 .. $count) {
        # wait waits for any child to exit, waitpid for a specific one.
        # The first $max iterations skip the wait; after that, each new
        # fork first waits for one running child to finish.
        wait unless $c <= $max;
        die "Fork failed: $!\n" unless defined (my $pid = fork);
        # The arrow is used to call the coderef in $code, with $c as
        # the argument. It's confusing because of the spaces around it.
        # It's a dereferencing arrow, but it looks like an OOP method call.
        # You're right about the 'unless $pid' part.
        # If $pid is non-zero we are in the parent, so it does
        # nothing. If we are in the child ($pid is 0), it will run
        # the code and exit.
        exit $code -> ($c) unless $pid;
    }
    # This is reached after the parent is done with the foreach.
    # It waits in the first line of the foreach while there are
    # still tasks left to spawn. Once it has spawned all $count of
    # them (some finish and exit and make room for new ones inside
    # the loop) it gets here, where it waits for the remaining ones.
    # wait returns -1 when there are no more children.
    # The '1 until' is just short for an until loop without a block;
    # the 1 is a do-nothing statement and costs next to nothing.
    # When wait returns -1 the loop ends and the sub returns.
    1 until -1 == wait;
}
# mfork is already defined above, so no () are needed here;
# the prototype is what lets us pass the anonymous sub as an argument
mfork 10, 3, sub {
    print "$$: " . localtime() . ": Starting\n";
    select undef, undef, undef, rand 2;
    print "$$: " . localtime() . ": Exiting\n";
};
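To make the difference between wait and waitpid, and the final '1 until -1 == wait;' line, more concrete, here is a minimal sketch. It is not part of the original code: the three children and their exit codes 1 to 3 are made up for the demonstration, and the until loop is written out as an explicit while loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fork three children that exit with distinct status codes
# (hypothetical demo values, not from the original code).
my %expected;
for my $code (1 .. 3) {
    defined(my $pid = fork) or die "Fork failed: $!\n";
    exit $code unless $pid;      # child: leave immediately with its code
    $expected{$pid} = $code;     # parent: remember which pid got which code
}

# waitpid targets one specific child and blocks until that one has exited.
my ($first) = sort { $a <=> $b } keys %expected;
my $got = waitpid $first, 0;
my %reaped = ($got => $? >> 8);  # $? >> 8 is the child's exit code

# Explicit form of '1 until -1 == wait;': reap whichever child
# finishes next, in any order, until wait returns -1 (no children left).
while ((my $pid = wait) != -1) {
    $reaped{$pid} = $? >> 8;
}

print "reaped ", scalar(keys %reaped), " children\n";
```

In mfork the parent never needs to know which child finished, only that a slot became free, which is why plain wait is enough there.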
Let's take a closer look at the material.
- wait and waitpid: wait will wait until any of the children exits. This is useful because the program doesn't care which slot is freed. As soon as one child ends, you can spawn a new one. waitpid takes a specific $pid as its argument and waits for that one child. It's not helpful here.
- The syntax $code->($c) runs a coderef. Just like ${ $foo }{bar} will dereference a hashref, &{ $baz }() will dereference (and run, that's the ()) a coderef. An easier way to write the hashref access is $foo->{bar}. The same is true for $baz->(). The arrow dereferences it. See perlref and perlreftut.
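As a small illustration of the two dereferencing styles, here is a sketch with made-up data: the names $foo and $baz follow the text above, and their contents are placeholders.

```perl
use strict;
use warnings;

my $foo = { bar => 42 };                     # hash reference
my $baz = sub { return "called with @_" };   # code reference

# Two ways to fetch one hash element through a reference:
my $block_form = ${ $foo }{bar};
my $arrow_form = $foo->{bar};

# Two ways to call a coderef with an argument:
my $amp_form  = &{ $baz }(7);
my $call_form = $baz->(7);

# Whitespace around the arrow is allowed, as in the question's code:
my $spaced = $baz -> (7);

print "$block_form $arrow_form\n";
print "$amp_form | $call_form | $spaced\n";
```

All three calls run the same anonymous sub with the same argument; only the spelling differs.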
While this is nice and useful, it might be wiser to use Parallel::ForkManager, which lets you do this in significantly fewer lines of code, and you don't have to worry about how it works.
use strict;
use warnings;
use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new(3); # max 3 at the same time
DATA_LOOP:
foreach my $data (1 .. 10) {
    # Forks. In the parent, start returns the child's pid (true),
    # so the parent moves on to the next item. In the child it returns 0.
    my $pid = $pm->start and next DATA_LOOP;
    # ... do some work with $data in the child process ...
    print "$$: " . localtime() . ": Starting\n";
    select undef, undef, undef, rand 2;
    print "$$: " . localtime() . ": Exiting\n";
    $pm->finish; # Terminates the child process
}
# Block in the parent until every child has finished
$pm->wait_all_children;
That's it. It's much clearer to read. :)