Pairwise list reduction the hard way

Have you ever needed to process a list in pairs using Perl? The typical way to do this involves a loop with each() (for a hash) or splice() (for an array). This is my account of trying to find a more functional way to do it.

I'll use a simple example of what I mean. Given a list of field and value pairs, the goal is to construct a query string suitable for an HTTP GET or POST request. Leaving aside that there are CPAN modules that do this for us and ignoring URL encoding, here is one way this might look:

# Example 1
use 5.010; use strict; use warnings;

my @data = (
  action => 'submit',
  type => 'comment',
  message => 'Hello'
);

my @output;
while ( my ($k,$v) = splice(@data,0,2) ) {
    push @output, "$k=$v"
}
say join( "&", @output );

That produces the output: action=submit&type=comment&message=Hello

Looking at the code, note the extra lexical @output to store the intermediate results. Also, note that the splice() is destructive and empties @data. If I want to preserve @data, I have to copy it first:

# Example 2
use 5.010; use strict; use warnings;

my @data = (
  action => 'submit',
  type => 'comment',
  message => 'Hello'
);

my @output;
my @copy = @data;
while ( my ($k,$v) = splice(@copy,0,2) ) {
    push @output, "$k=$v"
}
say join( "&", @output );

If I want to do the same thing with a hash of data, it would look like this:

# Example 3
use 5.010; use strict; use warnings;

my %data = (
  action => 'submit',
  type => 'comment',
  message => 'Hello'
);

my @output;
while ( my ($k,$v) = each %data ) {
    push @output, "$k=$v"
}
say join( "&", @output );

Is there a more functional way to do this that is non-destructive, has no intermediate variables, takes either a hash or array as just a list of input and can be used as part of a map chain? Sure! Here's a way to do it with reduce from List::Util:

# Example 4
use 5.010; use strict; use warnings;
use List::Util 'reduce';

my @data = (
  action => 'submit',
  type => 'comment',
  message => 'Hello'
);

say join "&", map { "$_->[0]=$_->[1]" } map { @$_ } reduce { 
    ! @$a || ref $a->[-1] ? push @$a, $b : push @$a, [pop @$a, $b]; $a
} [], @data;

That doesn't win any clarity awards, but it works. The reduce block is first called with the first two values of the list as $a and $b. Subsequent calls have the output of the last execution of the block in $a and the next value of the list in $b.

By passing in an empty array ref as the first element, I give it a place to accumulate pairs of data. On each call, I either stash the field name at the end of $a or pop off the field name and replace it with a new arrayref with the field name and value. I end the reduce call with $a so that becomes $a for the next iteration.

It builds up like this:

N  $a                                 $b
1  []                                 'action'      
2  ['action']                         'submit'
3  [['action', 'submit']]             'type'    
4  [['action', 'submit'], 'type']     'comment'
...

The output of reduce is a single arrayref of arrayrefs of pairs of data. The map { @$_ } call dereferences that back into a list of pairs (now wrapped in arrayrefs) that are turned into "field=value" strings and passed to the join.

Unfortunately, this approach creates an array ref for every pair and then throws them away. That's not a great idea, so I thought I'd see if I could create the "field=value" pairs directly as strings during the reduce. By adding another layer to the container that holds the output, I can do this:

# Example 5
use 5.010; use strict; use warnings;
use List::Util 'reduce';

my @data = (
  action => 'submit',
  type => 'comment',
  message => 'Hello'
);

say join "&", map { @{$_->[0]} } reduce {
    ref $a->[-1] ? push @$a, $b : push @{$a->[0]}, pop(@$a) . "=$b"; $a
} [[]], @data;

That is almost exactly the same as the last one, except that the container starts with an extra arrayref to store the list of strings being generated and the dereferencing map is a little uglier.

How do these different examples benchmark? Ignoring example 1, which is destructive, here are the results of a quick benchmarking:

             Rate example5 example4 example2 example3
example5 404781/s       --     -30%     -57%     -59%
example4 580886/s      44%       --     -38%     -41%
example2 936228/s     131%      61%       --      -5%
example3 985939/s     144%      70%       5%       --

The two while loop approaches are much faster! This shouldn't be surprising given all the assignments and array manipulations happening in the reduce approaches. And the while loops are much easier to read. It turns out the typical approach is still the best.

What I'd really like to see someday is a kvmap ("key-value map") function that would iterate through pairs from a list, pass them as $a and $b to a block and return a list of the results. Then I could write this:

# Example 6
use 5.010; use strict; use warnings;
use Doesnt::Exist::Yet 'kvmap';

my @data = (
  action => 'submit',
  type => 'comment',
  message => 'Hello'
);

say join "&", kvmap { "$a=$b" } @data;

That's much shorter and simpler than a while loop and, if implemented in XS, might be comparably fast (or faster).

Does any enterprising soul want to try adding that to List::MoreUtils? I think it would find a lot of use.

(c.f. List::Pairwise, but it's not implemented in XS)

This entry was posted in perl programming and tagged , . Bookmark the permalink. Both comments and trackbacks are currently closed.

18 Comments

  1. dextius
    Posted January 25, 2011 at 2:35 pm | Permalink

    I run into this kind of thing all the time. My issue dealt with a list, that I wanted to turn into pairs. This is what I came up with.

    use strict;
    use Data::Dumper;

    my @y = 1..10;
    my $c = 0;
    my @res;

    while ( $c < scalar(@y) ) {
    my ( $f, $g ) = @y[$c++,$c++];
    push(@res, [ $f, $g ]);
    }
    print Dumper(\@res);

  2. Roy Fulbright
    Posted January 25, 2011 at 4:18 pm | Permalink

    The "natatime" function will do this:

    use strict;
    use warnings;
    use List::MoreUtils qw( natatime );

    my @data = (
    action => 'submit',
    type => 'comment',
    message => 'Hello'
    );

    my @output;
    my $it = natatime 2, @data;
    while (my ($k,$v) = $it->()) {
    push @output, "$k=$v"
    }
    print join( "&", @output ), "\n";

    • Posted January 25, 2011 at 4:59 pm | Permalink

      It's non-destructive and avoids a copy, sure. I don't really see it as much of an improvement overall.

  3. Jade NB
    Posted January 25, 2011 at 10:33 pm | Permalink

    It seems to me that what you want is Haskell's zipWith:

    ub zipWith (&@@) {
    my ( $f, $aref, $bref ) = @_;
    my @a = @$aref; my @b = @$bref;
    say "[@a][@b]";
    return map {
    say "($_)";
    local ( $a, $b ) = ( $a[$_], $b[$_] );
    $f->($a[$_], $b[$_])
    } 0 .. $#a;
    }

    sub kvmap (&\@) {
    say "[@_]";
    my @kv = @{ $_[1] };
    say "{@kv}{$#kv}";
    zipWith \, [ @kv[ map { 2*$_ } 0 .. $#kv/2 ] ], [ @kv[ map { 2*$_ + 1 } 0 .. $#kv/2 ] ];
    }

    say "Hello";
    my @data = ( action => 'submit', type => 'comment', message => 'Hello' );
    say join "&", kvmap { "$a=$b" } @data;

    (I write out the code not because I think you can't do it, but to demonstrate that kvmap really is a simple special case.)

    I wonder if ikegami's even/odd mini-tutorial has anything useful?

  4. Jade NB
    Posted January 25, 2011 at 10:36 pm | Permalink

    Sorry, mangled that. Aside from the obvious ubsub, the prototype for zipWith should be &\@\@, and the kvmap invocation should use @{ [ @[] ] } (to satisfy Perl's persnickety ‘type-checking’) in place of just [ @[] ].

  5. Posted January 25, 2011 at 10:37 pm | Permalink

    my $str = join '&', map { "$data[$_*2]=$data[$_*2+1]" } 0..@data/2-1;

    Comparable to example2 and example3 if you use an intermediate @output, faster than both if you don't, at least on my laptop.

    • Posted January 25, 2011 at 10:53 pm | Permalink

      I like that, though it only works for an array -- you can't pipe the output of another map into it, nor can you use it with a hash.

      • Posted January 25, 2011 at 11:39 pm | Permalink

        You can by changing to work on an arrayref, though that takes another map and slows it down to about halfway between example2 and example3 for me. Plus, it's uglier, and the input no longer matches the output (list -> list).

        join '&', map {
        my $x = $_;
        map { "$x->[$_*2]=$x->[$_*2+1]" } 0..@$x/2-1;
        } \@data;

        • JadeNB
          Posted January 26, 2011 at 12:16 am | Permalink

          Sorry for being foolish, but which of those worries does it address? It still requires knowing the array length at 'write time', so it can't work on an on-the-fly list such as is returned from map. It seems that your goal here is instead to work with an AoA.

          Of course, the modification for hashes is trivial:

          say join '&', map { "$_=$data{$_}" } keys %data;

          • Posted January 26, 2011 at 7:32 am | Permalink

            \@data could instead be [ map .... ] or [ %hash ].

  6. Jade NB
    Posted January 25, 2011 at 10:44 pm | Permalink

    Ha, one more try:

    The dereferenced coderef in the invocation of zipWith got eaten;

    the whole local ( $a, $b ) = ( $a[$i], $b[$i] ); $f->($a[$i], $b[$i]) thing could have been compressed to $f->( local ( $a, $b ) = ( $a[$i], $b[$i] ) ) (I like to avoid having to remember whether the codeblock takes its values “from the air” or as arguments by allowing it to do either);

    and I should have stripped my debugging says before cut-and-pasting. Sorry!

  7. JadeNB
    Posted January 26, 2011 at 2:13 am | Permalink

    This one has the minor advantage over hdp's that it works on lists, not just arrays; and it's not too bad speed-wise:

    sub kvmap (&@) {
    my ( $f, @kv ) = @_;
    my $key = 0;
    local ( $a, $b );
    return map {
    ( $key ^= 1 ) ? do { $a = $_; () } : do { $b = $_; $f->($a, $b) }
    } @kv;
    }

    I'm none too good with benchmarking (and it's 1 AM), so I could have made a mistake, but this is what I get:

    • JadeNB
      Posted January 26, 2011 at 2:59 am | Permalink


      Rate ex5 ex4 ex2 ex1 ex3 my
      ex5 108696/s -- -21% -68% -81% -83% -87%
      ex4 137174/s 26% -- -59% -76% -79% -83%
      ex2 335570/s 209% 145% -- -42% -48% -59%
      ex1 581395/s 435% 324% 73% -- -10% -29%
      ex3 645161/s 494% 370% 92% 11% -- -21%
      my 819672/s 654% 498% 144% 41% 27% --

      where the exact code tested for
      my is

      my $key = 0;
      local ( $a, $b );
      join '&', map {
      ( $key ^= 1 ) ? do { $a = $_; () } : do { "$a=$_" }
      } @data;

      and the others are cut-and-pasted from your post.

      • Posted January 26, 2011 at 8:23 am | Permalink

        That's strange, I get a much slower result for your code. It's "kvmap2", here.

        Rate kvmap2 example2 map_over_index2 example3
        kvmap2 101523/s -- -35% -40% -42%
        example2 155039/s 53% -- -9% -12%
        map_over_index2 169492/s 67% 9% -- -3%
        example3 175439/s 73% 13% 4% --

        • Jade NB
          Posted January 26, 2011 at 9:49 am | Permalink

          Did you run the 'inline' version of my second post, rather than the 'function-call' version of the first (called as kvmap { "$a=$b" } @data)? The function-call version also comes out much slower for me.

          I tried it again on a different computer, and got substantially similar results, except that ex3 slowed way down:

          Rate ex3 ex5 ex4 ex2 ex1 my
          ex3 211178/s -- -56% -66% -81% -85% -89%
          ex5 480964/s 128% -- -23% -57% -65% -76%
          ex4 624381/s 196% 30% -- -44% -54% -69%
          ex2 1116140/s 429% 132% 79% -- -18% -44%
          ex1 1363161/s 546% 183% 118% 22% -- -32%
          my 1999905/s 847% 316% 220% 79% 47% --

          If you're interested in this, maybe you could take a moment and see if my benchmarking code is just ridiculous? I've posted it on my Perlmonks scratchpad.

          • Posted January 28, 2011 at 11:21 am | Permalink

            Yeah, it was inlining vs. function calling. From your original post I thought that's what you meant.

            The time savings of inlining almost makes the specific algorithmic differences irrelevant.

    • Posted January 26, 2011 at 12:26 pm | Permalink

      I like the $key ^= 1 toggle. That's a trick I hadn't seen before. I'll file that away for the future.

      • Jade NB
        Posted January 26, 2011 at 12:38 pm | Permalink

        I like the $key ^= 1 toggle. That's a trick I hadn't seen before. I'll file that away for the future.

        I stole that one from ikegami's tutorial that I linked earlier.

One Trackback

© 2009-2014 David Golden All Rights Reserved