Hacking the Perl core for smarter push and pop

In a recent post, I said I wished that Perl's built-in functions for array containers would work directly on references.

Rather than this (today):

# Assuming $data->{$key1}{$key2} = [ qw/foo bar/ ]
push @{ $data->{$key1}{$key2} }, @stuff;

I wanted this (in the future):

# Assuming $data->{$key1}{$key2} = [ qw/foo bar/ ]
push $data->{$key1}{$key2}, @stuff;

I've finished a draft implementation that works for push, pop, shift, unshift and splice. All existing Perl tests pass, as do new tests I've written that explore this new functionality. It will even auto-vivify as needed:

my $foo;
push $foo, @stuff; # $foo is now an arrayref

I'm still working on keys, values and each and those look much harder, but I hope to have something to share by next week.

Updated: I added example data to the "Assuming $data..." comments to clarify that it's an example, not a recommendation about how to initialize a data structure.


17 Comments

  1. Posted September 8, 2010 at 7:25 pm | Permalink

    Hi

    Are you sure you want this? I don't.
    That is - I'd like to know I can turn this 'feature' on and off, and hence off.
    And why is that?
    Because I rely on my code failing with an error msg if I push onto something which I have not previously initialized explicitly as an array or array ref. In other words my gut reaction is that this looks suspiciously like a bad idea 'which seemed like a good idea at the time'.
    Also, if it were optional, how would it tie in with common::sense?
    Things to think about :-).
    Cheers
    PS The vertical text in the captcha is almost unreadable.

    • Posted September 8, 2010 at 10:29 pm | Permalink

      Perl already does autovivification if you explicitly dereference an undefined value: my $foo; push @$foo, @stuff;

      If you really don't like that behavior (and I respect that point of view), you might be interested in Vincent Pit's autovivification module, which lets you say no autovivification to disable it in a lexical scope.
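A minimal sketch of the existing behavior mentioned in the reply above, using only the explicit-dereference syntax that current Perl already supports. The contents of @stuff are made up for illustration, and the autovivification module is not used here.

```perl
use strict;
use warnings;

# $foo starts out undefined; @stuff holds example data invented for this sketch.
my $foo;
my @stuff = qw/a b c/;

# An explicit dereference in a modifying context autovivifies an array
# reference, even under "use strict".
push @$foo, @stuff;

print ref($foo), " with ", scalar(@$foo), " elements\n";   # ARRAY with 3 elements
```

The proposed push $foo, @stuff form is meant to behave the same way, so anyone who wants an error here would need to disable autovivification for both spellings.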

  2. Michael Peters
    Posted September 8, 2010 at 8:20 pm | Permalink

    Ron, it's no different than the way hashes auto-vivify now. In fact this is now more consistent since it's not just hashes that get the special treatment. Also, if you don't like the feature, don't use it. If you use the same syntax we have now (explicitly dereferencing arrayrefs as arrays) then it will behave as you intend and you'll get an error about something not being an array.

    But I'm all in favor of this. Dereferencing references has always been an ugly spot in Perl's syntax and this goes a long way to get rid of code that's unnecessary.

  3. Posted September 8, 2010 at 10:41 pm | Permalink

    Doesn't autobox give you something like

    $data->{$key1}{$key2}->push(@stuff);

    ?

    • Posted September 8, 2010 at 10:55 pm | Permalink

      Yes, with the overhead of loading autobox and doing method resolution and dispatch. My work modifies the op-tree at compile time so that push $foo, @stuff winds up with the exact same ops as the more explicit push @$foo, @stuff. There is no additional overhead.

      autobox is a great tool, but it's also a general-purpose one. My work is very focused. The two can (and should) co-exist for different needs.

  4. Tom Davis
    Posted September 8, 2010 at 11:21 pm | Permalink

    I think Ron's reaction might have been somewhat similar to my own, as a result of the example:
    push $foo, @stuff

    The problem is that some of us sometimes type exactly that code when we really mean to push $foo onto @stuff. My first reaction was exactly Ron's, until I realized that in those cases $foo will never be uninitialized. So everything will work as needed for everyone, so long as pushing onto an initialized non-reference scalar dies, pushing onto an array ref succeeds, and pushing onto an uninitialized scalar autovivifies an array ref.
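The cases listed above can be sketched with today's explicit-dereference syntax, which is what the proposed form is said to compile down to. try_push is a hypothetical helper written for this example, and it assumes "use strict" is in effect.

```perl
use strict;
use warnings;

# try_push is a hypothetical helper for this sketch: it attempts an explicit
# dereference-and-push on its argument and reports whether that lived or died.
sub try_push {
    my ($target) = @_;
    my $ok = eval { push @{$target}, 'x'; 1 };
    return $ok ? 'ok' : 'died';
}

my $arrayref = [ qw/foo bar/ ];
my $hashref  = {};

my %result = (
    arrayref => try_push($arrayref),   # ok: a real array reference
    undef    => try_push(undef),       # ok: autovivified inside the helper
    string   => try_push('foo'),       # died: strict refs rejects a plain string
    hashref  => try_push($hashref),    # died: not an ARRAY reference
);

print "$_: $result{$_}\n" for sort keys %result;
```

So the accidental push $foo, @stuff with a defined, non-reference $foo would still fail loudly, as long as strict refs is enabled.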

  5. aero
    Posted September 8, 2010 at 11:45 pm | Permalink

    Rather confusing.

    Assuming $data->{$key1}{$key2} = []
    push @{ $data->{$key1}{$key2} }, @stuff;

    What about this ?
    $data->{$key1}{$key2} = [ @stuff ];

    • Posted September 9, 2010 at 6:28 am | Permalink

      My apologies for the bad example. I just meant to say that given some deep data structure with an arrayref at the end, you can push onto it without an explicit dereference. It could have been $data->{$key1}{$key2} = [ qw/a b c/ ]; push $data->{$key1}{$key2}, qw/d e f/;

      In fact, I'll go change it now.

  6. Darko
    Posted September 9, 2010 at 6:18 am | Permalink

    This would be a great benefit to perl source code readability imho.
    I hope it will make its way to the core some day.

    @aero:

    for initializations of arrays, push isn't a good option anyway, just for on-the-fly init in loops or the like.

  7. aero
    Posted September 9, 2010 at 6:56 am | Permalink

    Assuming $data->{$key1}{$key2} is a hashref:
    my ($k,$v) = each %{ $data->{$key1}{$key2} }
    Should each also be modified, so that it can be written like this?
    my ($k,$v) = each $data->{$key1}{$key2}

    • Posted September 9, 2010 at 7:48 am | Permalink

      I'm exploring whether that is possible. The internals of each() are much more complicated, in part due to the fact that each/keys/values now support arrays as well as hashes.
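For context, a short sketch of why each has two container types to deal with: as of Perl 5.12, each accepts an array as well as a hash, so a bare reference argument could mean either. The data below is invented for illustration, and explicit containers are used throughout.

```perl
use strict;
use warnings;

# Example data invented for this sketch.
my @letters = qw/a b c/;
my %count   = ( apples => 3 );

my @pairs;

# On an array (Perl 5.12 or later), each returns (index, value) pairs.
while ( my ( $index, $value ) = each @letters ) {
    push @pairs, "$index=$value";
}

# On a hash, each returns (key, value) pairs.
while ( my ( $key, $value ) = each %count ) {
    push @pairs, "$key=$value";
}

print join( ', ', @pairs ), "\n";   # 0=a, 1=b, 2=c, apples=3
```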

  8. brian d foy
    Posted September 9, 2010 at 7:02 am | Permalink

    Don't forget tests for the cases where the hash value is a non-reference, or a reference of a different sort. How are you going to report the errors in those cases, etc? I've found that in new features the perl tests tend to forget many edge cases. :(

    • Posted September 9, 2010 at 7:55 am | Permalink

      As I replied to theory, internally, the push/pop work rewrites the op tree to make push $foo, @stuff equivalent to push @{ $foo }, @stuff. So at that point, all the same protections for an invalid $foo are in place.

      Doing the same smart dereferencing for each, keys and values looks to be hard because it can't be done just with the op-tree (unless this feature were limited only to hashrefs, which would be confusing and suboptimal). Getting all the edge and corner cases right is tricky. (E.g. a blessed scalar reference that overloads %{})
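The corner case mentioned above can be illustrated with a small, hypothetical class. HashLike is invented for this sketch: it is a blessed scalar reference that overloads %{}, so it is not a hash underneath but still works wherever a hash dereference is requested.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

package HashLike;   # hypothetical class, made up for this example

# Hash dereferences on the object are answered with a freshly built hashref.
use overload
    '%{}'    => sub { my ($self) = @_; return { value => $$self } },
    fallback => 1;

sub new {
    my ( $class, $value ) = @_;
    return bless \$value, $class;   # the object is a scalar reference
}

package main;

my $obj  = HashLike->new(42);
my $kind = reftype($obj);   # underlying type: SCALAR
my @keys = keys %$obj;      # explicit dereference goes through the overload

print "$kind object with keys: @keys\n";   # SCALAR object with keys: value
```

An implementation that picks hash or array behavior by looking only at the underlying reference type would get an object like this wrong, which is one reason the decision is harder to make at compile time.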

  9. Jay
    Posted September 9, 2010 at 7:54 am | Permalink

    Hopefully this functionality will require 'use 5.14;' (or whatever version this might appear in), so people using 5.12 and earlier don't get bitten by unconstrained use of this (not really necessary) feature in modules and code that doesn't otherwise express its perl version requirements. Unfortunately, that doesn't seem to be a priority, e.g. see each / keys / values supporting arrays.

    • Posted September 9, 2010 at 8:01 am | Permalink

      It would be a syntax error on previous versions of perl, so it does not require protection by feature.pm (whether or not it would be enabled by 'use 5.14'). I agree with the practice of listing the minimum version of perl supported in modules. I recommend using perlver to detect minimum versions based on syntax. I use Dist::Zilla::Plugin::MinimumPerl in my release tools to automate such checks.

  10. Ilya Skorik
    Posted September 10, 2010 at 2:43 am | Permalink

    The happiness will come when it will be possible to get rid of brackets in the code.

    For example instead of

    if (an eq b) {
    c=d;
    }

    It will be possible to write

    if an eq b then c=d;

  11. abraxxa
    Posted September 14, 2010 at 5:44 pm | Permalink

    I've often wondered why perl isn't smart enough to do that.
    Hopefully it will be accepted for 5.14!


© 2009-2014 David Golden All Rights Reserved