Hacking the Perl core for faster keys, values and each

Last week, I wrote about hacking push and pop to take array references as well as literal arrays. This week, I've added similar functionality for keys, values and each.

Given a hash-of-hashref structure called $data, instead of this (Perl as you know it):

for my $k ( keys %{ $data->{key1} } ) { ... };

You can now do this (new, enhanced Perl):

for my $k ( keys $data->{key1} ) { ... };

Maybe you're thinking "big deal". But how often do you write keys %{ ... }? How would you like that 70% faster when written as keys ... by itself?

That's right. Not only does the new version have less line noise, it's faster than doing explicit dereferencing. From a simple benchmarking program:

use strict; use warnings; use autodie;
use Benchmark qw( cmpthese :hireswallclock );

my $hash = { 1 .. 1000 };
my %hash = ( 1 .. 1000 );

my $count = -10;
cmpthese( $count, {
    'keys %$hash' => sub { my @array = keys %$hash },
    'keys $hash'  => sub { my @array = keys $hash  },
    'keys %hash'  => sub { my @array = keys %hash  },
  });

I get these results with my new patches [1]:

# bleadperl + smart keys, values, each patches
              Rate keys %$hash  keys %hash  keys $hash
keys %$hash 4835/s          --        -41%        -42%
keys %hash  8219/s         70%          --         -1%
keys $hash  8317/s         72%          1%          --

Of course, it's not just keys, but values and each as well, and for both hashes and arrays.

while ( my($k,$v) = each $hashref ) { ... }
while ( my($i,$v) = each $arrayref ) { ... }

Have you ever wanted a flatten function for arrays? (Without resorting to autobox::Core, that is.) Here it is:

for my $v ( values $obj->stuff_as_arrayref ) { ... }

It even works on objects that overload dereferencing:

for my $k ( keys $overloaded_object ) { ... }

Everything is heavily tested, but I'm working on writing up the new documentation. I'd also like to explore some possible edge cases, and then I'll be ready to package it up and submit it to the perl5 porters mailing list for discussion.

If you think this is cool stuff, please let me know. (If you hate it, I'm happy to hear constructive criticism, too.)


[1] For comparison, I ran the same benchmark (minus the "keys $hash" case) on bleadperl without the patches. The timing of the base cases were similar. (Repeated runs of bleadperl and my patched perl gave similar results plus or minus a few percent.)

# bleadperl
              Rate keys %$hash  keys %hash
keys %$hash 4850/s          --        -41%
keys %hash  8240/s         70%          --
This entry was posted in perl programming. Bookmark the permalink. Both comments and trackbacks are currently closed.

16 Comments

  1. Posted September 13, 2010 at 7:03 am | Permalink

    Awesome...

    I really hope this gets into the next Perl release, 5.14.

  2. Paul Evans
    Posted September 13, 2010 at 7:23 am | Permalink

    You know, there is a lot of code out there in the world (i.e. CPAN) already using keys %{ EXPR }. Perhaps is there any way to get the parser to compile that into your newer forms instead, and thus making all this existing code benefit from the speed enhancement?

    If you could present it to p5p as "make keys %{ EXPR } 70% faster", with a side-note that "oh by the way it also allows keys EXPR directly", then that might win you a lot more of the nay-sayers.

    • Posted September 13, 2010 at 7:45 am | Permalink

      That's a great point! I think it should be possible to do that. I am also wondering whether a similar optimization is possible for push/pop with array references. I'll have to give it a try.

      • Brad Gilbert
        Posted September 13, 2010 at 11:41 pm | Permalink

        while(readline $fh){...} gets translated to while( defined( $_ = readline $fh ) ){...} so I don't see why keys(%$hash) couldn't be translated to keys($hash)

        The former is found in op.c. Actually I used that same bit of code to have while(readdir $dh){...} translated to while( defined( $_ = readdir $dh ) ){...}

        I think making old code run significantly faster, would quell the dissent of even the most stubborn of naysayers.

        • Posted September 14, 2010 at 7:19 am | Permalink

          That's almost exactly the inverse of what I did for push/pop, which takes push $array, @stuff and re-writes the ops as push @{$array}, @stuff, so I have a pretty good idea of how to make it work they way you describe. One conundrum in all of this is that the function signatures can't be expressed using today's prototypes, so I'm exploring extensions to the prototypes to support this kind of dynamic behavior.

  3. Roderich Schupp
    Posted September 13, 2010 at 7:43 am | Permalink

    Not to spoil you work, I really appreciate it, but...

    I just ran your test (sans 'keys $hash' of course) on Perl 5.8.8 and 5.10.1 and there's only 1% difference there - that's what I expected. So the difference for bleadperl looks like it has an optimization for the 'keys %hash' case that for some reason doesn't apply for 'keys %$hash', too. Then this ought to be investigated.

    • Posted September 13, 2010 at 7:47 am | Permalink

      I'll look into it and compare 5.8, 5.10 and 5.12. It's possible that keys %$hash lost some optimization when support was added for keys @array.

    • Max
      Posted September 14, 2010 at 3:33 am | Permalink

      Yes, it seems there is a performance regression in perl 5.12.1 :

      On 5.10.1:

      Rate keys %hash keys %$hash
      keys %hash 7111/s -- -0%
      keys %$hash 7116/s 0% --

      On 5.12.1:

      Rate keys %$hash keys %hash
      keys %$hash 5742/s -- -27%
      keys %hash 7890/s 37% --

      The two versions run on FreeBSD 64 bits hosts which have not exactly the same hardware, but it reveals that with perl 5.12.1 there is a performance problem when dereferencing $hash...

      • Posted September 14, 2010 at 7:30 am | Permalink

        I did the analysis on the same machine on four generations of Perl and made a graph of the results. I can confirm what you saw. So my "optimization" is just getting us back to where we were before, but at least that's something.

        • Roderich Schupp
          Posted September 16, 2010 at 10:10 am | Permalink

          I'm confused by your graph: I would have expected two values
          (keys %hash vs keys %$hash) per perl version. What does
          (1)-(4) stand for?

          • Posted September 16, 2010 at 1:37 pm | Permalink

            (1) to (4) were just four different runs of the same thing to see the variability between benchmark runs.

  4. Posted September 13, 2010 at 11:02 am | Permalink

    Excellent work! I really hope this makes it into 5.14 and if not that 5.16

  5. Mark
    Posted September 13, 2010 at 11:16 am | Permalink

    This is right on! Looking forward to this in a future Perl release.

  6. arun prasaad
    Posted September 13, 2010 at 11:35 am | Permalink

    Very cool indeed. Less is more. Keep the good stuff coming!

  7. Nilson Santos F. Jr.
    Posted September 13, 2010 at 5:43 pm | Permalink

    Very nice. I hope it's available in 5.14. Perl 5 needs to move forward. :)

  8. Christopher Bottoms
    Posted September 14, 2010 at 5:30 pm | Permalink

    Great work. This inches us toward Perl 6, where %hash and $hashref are essentially equivalent.

© 2009-2014 David Golden All Rights Reserved