Parallel map with Parallel::Iterator


Do you have a multi-core processor? Would you like an easy way for Perl to use all your cores to run an iterative task in parallel? Then you should check out Parallel::Iterator.

Parallel::Iterator provides a very simple API to execute a “map” function in parallel across multiple processes. It handles all the forking and inter-process communication for you, so all you have to do is focus on the task at hand. This is useful for a CPU-intensive task that you’d like to spread across multiple cores or for tasks with relatively long I/O delays.
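To see the shape of the API, here is a minimal sketch (the @numbers data and the doubling worker are just for illustration): the parallel call is nearly a drop-in replacement for a serial map, except that the worker receives an ( index, value ) pair.

use strict;
use warnings;
use Parallel::Iterator qw/iterate_as_array/;

my @numbers = ( 1 .. 10 );

# serial map
my @serial = map { $_ * 2 } @numbers;

# parallel equivalent: each worker call receives ( index, value ),
# and the results come back in input order
my @parallel = iterate_as_array(
  sub { my ( $index, $n ) = @_; return $n * 2 },
  \@numbers
);

(For work this trivial the forking overhead would swamp any gain; the payoff comes when each job does real CPU work or waits on I/O.)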

Here is a very simple example, adapted from the documentation. Suppose you need to fetch a lot of web pages from URLs provided on STDIN. You just write the “worker” subroutine to fetch a page, and Parallel::Iterator does the rest.

use strict;
use warnings;
use LWP::UserAgent;
use Parallel::Iterator qw/iterate_as_array/;

# this worker fetches a page or returns undef
my $ua = LWP::UserAgent->new( env_proxy => 1 );
my $worker = sub {
  my ($index, $url) = @_;
  my $resp = $ua->get($url);
  return undef unless $resp->is_success;
  return $resp->decoded_content;
};

# read the list of URLs from STDIN, then fetch them in parallel
my @urls = split "\n", do { local $/; <STDIN> };
my @pages = iterate_as_array( $worker, \@urls );

# now do stuff with the results
...

With just that little code, the URLs to fetch are split across 10 subprocesses (the default), the pages are fetched in parallel, and the results are collected into the @pages array.

Parallel::Iterator has a number of additional features: you can customize the number of subprocesses, iterate over results as they arrive (instead of getting back an array), and batch the input sent to each subprocess.
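For example, here is a sketch reusing the $worker and @urls from above, via the iterate interface with explicit workers and batch options (the values 4 and 5 are arbitrary):

use Parallel::Iterator qw/iterate/;

# 4 worker processes, with jobs handed out in batches of 5
my $iter = iterate(
  { workers => 4, batch => 5 },
  $worker, \@urls
);

# results stream back as ( index, value ) pairs as jobs complete
while ( my ( $index, $page ) = $iter->() ) {
  next unless defined $page;
  print "fetched $urls[$index]: ", length($page), " bytes\n";
}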

It may not be the right tool every time, but if you need to parallelize a task quickly, Parallel::Iterator should be in your toolbox.

•      •      •

If you enjoyed this or have feedback, please let me know.