Parallel map with Parallel::Iterator

Do you have a multi-core processor? Would you like an easy way for Perl to use all your cores to run an iterative task in parallel? Then you should check out Parallel::Iterator.

Parallel::Iterator provides a very simple API to execute a "map" function in parallel across multiple processes. It handles all the forking and inter-process communication for you, so all you have to do is focus on the task at hand. This is useful for a CPU-intensive task that you'd like to spread across multiple cores or for tasks with relatively long I/O delays.

Here is a very simple example, adapted from the documentation. Suppose you need to fetch a lot of web pages from URLs provided on STDIN. You just need to write the "worker" subroutine to fetch a page, and Parallel::Iterator does the rest.

use strict;
use warnings;
use LWP::UserAgent;
use Parallel::Iterator qw/iterate_as_array/;

# this worker fetches a page or returns undef
my $ua = LWP::UserAgent->new( env_proxy => 1 );
my $worker = sub {
  my ($index, $url) = @_;
  my $resp = $ua->get($url);
  return undef unless $resp->is_success;
  return $resp->decoded_content;
};

# this reads a list of URLs from STDIN and fetches the pages in parallel
my @urls = split "\n", do { local $/; <STDIN> };
my @pages = iterate_as_array( $worker, \@urls );

# now do stuff with the results
...

With just that little code, the URLs to fetch are split across 10 subprocesses (the module's default), the page fetching happens in parallel, and the results are collected into the @pages array.

Parallel::Iterator has a number of additional features: customizing the number of subprocesses, iterating over the results (instead of getting back an array), and batching the input to subprocesses.
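
As a rough sketch of the first two of those features, the snippet below passes an options hash ref with a workers count and uses iterate to pull results one at a time instead of collecting an array. The squaring worker, the @numbers list and the choice of 4 workers are made up for illustration; check the module's documentation for the exact options your installed version supports.

```perl
use strict;
use warnings;
use Parallel::Iterator qw/iterate/;

# Illustrative stand-in worker: squares its input. Like the page
# fetcher, it receives the item's index followed by the item itself.
my $worker = sub {
    my ( $index, $n ) = @_;
    return $n * $n;
};

# Hypothetical input data for the sketch
my @numbers = ( 1 .. 20 );

# An optional leading hash ref carries options; here we ask for
# 4 worker processes rather than the default.
my $iter = iterate( { workers => 4 }, $worker, \@numbers );

# Each call to the iterator yields an (index, result) pair. Results
# are not guaranteed to arrive in input order, so use the index to
# put each one in its place.
my @squares;
while ( my ( $index, $value ) = $iter->() ) {
    $squares[$index] = $value;
}

print "$_\n" for @squares;
```

Pulling results through the iterator lets you start processing as soon as the first worker finishes, rather than waiting for the whole list. Batching is configured through the same options hash; see the module's documentation for the details.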

It may not be the right tool all of the time, but if you need to parallelize a task quickly, Parallel::Iterator should be in your tool-box.

4 Comments

  1. Paul Evans
    Posted August 5, 2010 at 11:05 am

    I've been musing on ideas on how to make a variation on CPS::Functional which supports some concept of parallelism, probably by using a particular Governor object to control it. Something of the order of:

    my $gov = CPS::Governor::Parallel->new( concurrent => 10 );
    $gov->kmap( \@urls,
        sub {
            my ( $url, $k ) = @_;
            GET_async( url => $url, on_response => sub {
                my ( $response ) = @_;
                $k->( $response->is_success ? $response->decoded_content : undef );
            } );
        },
        sub {
            my @pages = @_;
            ...
        }
    );

    • Paul Evans
      Posted August 5, 2010 at 11:07 am

      Gah. HTML pre tags disallowed; see also pastie.org

  2. Alexander Hartmaier
    Posted July 13, 2011 at 8:39 am

    Did you ever test that code?
    The second parameter to iterate_as_array needs to be an arrayref, and the worker signature should be my ($index, $url) = @_;

    • Posted July 13, 2011 at 9:12 am

      It was adapted from an example, not tested. Thanks for catching that. I've updated the post.

© 2009-2014 David Golden All Rights Reserved