Last week, I wrote about hacking push and pop to take array references as well as literal arrays. This week, I've added similar functionality for keys, values and each.
Given a hash-of-hashref structure called $data, instead of this (Perl as you know it):
for my $k ( keys %{ $data->{key1} } ) { ... };
You can now do this (new, enhanced Perl):
for my $k ( keys $data->{key1} ) { ... };
Maybe you're thinking "big deal". But how often do you write keys %{ ... }? How would you like that 70% faster when written as keys ... by itself?
That's right. Not only does the new version have less line noise, it's faster than doing explicit dereferencing. From a simple benchmarking program:
use strict; use warnings; use autodie;
use Benchmark qw( cmpthese :hireswallclock );
my $hash = { 1 .. 1000 };
my %hash = ( 1 .. 1000 );
my $count = -10;
cmpthese( $count, {
'keys %$hash' => sub { my @array = keys %$hash },
'keys $hash' => sub { my @array = keys $hash },
'keys %hash' => sub { my @array = keys %hash },
});
I get these results with my new patches [1]:
# bleadperl + smart keys, values, each patches
Rate keys %$hash keys %hash keys $hash
keys %$hash 4835/s -- -41% -42%
keys %hash 8219/s 70% -- -1%
keys $hash 8317/s 72% 1% --
Of course, it's not just keys, but values and each as well, and for both hashes and arrays.
while ( my($k,$v) = each $hashref ) { ... }
while ( my($i,$v) = each $arrayref ) { ... }
Have you ever wanted a flatten function for arrays? (Without resorting to autobox::Core, that is.) Here it is:
for my $v ( values $obj->stuff_as_arrayref ) { ... }
It even works on objects that overload dereferencing:
for my $k ( keys $overloaded_object ) { ... }
Everything is heavily tested, but I'm working on writing up the new documentation. I'd also like to explore some possible edge cases, and then I'll be ready to package it up and submit it to the perl5 porters mailing list for discussion.
If you think this is cool stuff, please let me know. (If you hate it, I'm happy to hear constructive criticism, too.)
[1] For comparison, I ran the same benchmark (minus the "keys $hash" case) on bleadperl without the patches. The timing of the base cases were similar. (Repeated runs of bleadperl and my patched perl gave similar results plus or minus a few percent.)
# bleadperl
Rate keys %$hash keys %hash
keys %$hash 4850/s -- -41%
keys %hash 8240/s 70% --
16 Comments
Awesome...
I really hope this gets into the next Perl release, 5.14.
You know, there is a lot of code out there in the world (i.e. CPAN) already using keys %{ EXPR }. Perhaps is there any way to get the parser to compile that into your newer forms instead, and thus making all this existing code benefit from the speed enhancement?
If you could present it to p5p as "make keys %{ EXPR } 70% faster", with a side-note that "oh by the way it also allows keys EXPR directly", then that might win you a lot more of the nay-sayers.
That's a great point! I think it should be possible to do that. I am also wondering whether a similar optimization is possible for push/pop with array references. I'll have to give it a try.
while(readline $fh){...}gets translated towhile( defined( $_ = readline $fh ) ){...}so I don't see whykeys(%$hash)couldn't be translated tokeys($hash)The former is found in
op.c. Actually I used that same bit of code to havewhile(readdir $dh){...}translated towhile( defined( $_ = readdir $dh ) ){...}I think making old code run significantly faster, would quell the dissent of even the most stubborn of naysayers.
That's almost exactly the inverse of what I did for push/pop, which takes
push $array, @stuffand re-writes the ops aspush @{$array}, @stuff, so I have a pretty good idea of how to make it work they way you describe. One conundrum in all of this is that the function signatures can't be expressed using today's prototypes, so I'm exploring extensions to the prototypes to support this kind of dynamic behavior.Not to spoil you work, I really appreciate it, but...
I just ran your test (sans 'keys $hash' of course) on Perl 5.8.8 and 5.10.1 and there's only 1% difference there - that's what I expected. So the difference for bleadperl looks like it has an optimization for the 'keys %hash' case that for some reason doesn't apply for 'keys %$hash', too. Then this ought to be investigated.
I'll look into it and compare 5.8, 5.10 and 5.12. It's possible that keys %$hash lost some optimization when support was added for keys @array.
Yes, it seems there is a performance regression in perl 5.12.1 :
On 5.10.1:
Rate keys %hash keys %$hash
keys %hash 7111/s -- -0%
keys %$hash 7116/s 0% --
On 5.12.1:
Rate keys %$hash keys %hash
keys %$hash 5742/s -- -27%
keys %hash 7890/s 37% --
The two versions run on FreeBSD 64 bits hosts which have not exactly the same hardware, but it reveals that with perl 5.12.1 there is a performance problem when dereferencing $hash...
I did the analysis on the same machine on four generations of Perl and made a graph of the results. I can confirm what you saw. So my "optimization" is just getting us back to where we were before, but at least that's something.
I'm confused by your graph: I would have expected two values
(keys %hash vs keys %$hash) per perl version. What does
(1)-(4) stand for?
(1) to (4) were just four different runs of the same thing to see the variability between benchmark runs.
Excellent work! I really hope this makes it into 5.14 and if not that 5.16
This is right on! Looking forward to this in a future Perl release.
Very cool indeed. Less is more. Keep the good stuff coming!
Very nice. I hope it's available in 5.14. Perl 5 needs to move forward. :)
Great work. This inches us toward Perl 6, where %hash and $hashref are essentially equivalent.