Perl QA Hackathon 2014 Report

tl;dr → At the 2014 Perl QA hackathon in Lyon, I worked on PAUSE, Module::Metadata, version number semantics, Test::Harness, CPAN.pm, CPAN::Reporter, Dist::Zilla and more.

Why do I love the QA hackathon?

As I mentioned in my TPF grant application, the QA hackathon allows me to work in a concentrated way for several days on parts of the Perl/CPAN toolchain and testing infrastructure that are "too big" for ad hoc development during the year. It also gives me an opportunity for face-to-face collaboration with other toolchain/quality hackers, which means getting answers and insights, and making decisions, much faster than is possible over email, IRC, or ticket trackers.

The QA hackathon is like the best conference "hallway track" mashed up with a coding marathon with some of the most incredibly talented hackers in the Perl community.

Hacking at the QAH

What was different? What was the same? What worked? What didn't?

The big difference this year is that the organizers wisely shrank the event back to around 30 attendees, the size that had been typical for most of the early years of the hackathon. At that size, everyone can pretty much know what everyone else is working on, and basic logistics take up much less time.

There were several new faces, including Karen Etheridge, Graham ("one-p") Knop, and Neil Bowers. And there were a lot of familiar faces, including many that I only see once a year at the hackathon.

In addition to the smaller group, I really appreciated how much the organizers optimized for productive time. Breakfast and lunch were provided at the venue, and there were only two organized dinners: one out and one at the hotel. That meant less time in transit for food and more time to get stuff done.

Sadly, as in some previous years, the network was flaky, both at the hotel and at the conference venue, which is always a distraction and an occasional barrier to getting work done.

Giving thanks

Before I give the day-by-day recap, I want to remind readers that each year, the QA hackathon happens because of the dedicated volunteer work of the organizers and the financial support of sponsors.

I offer many thanks to Philippe Bruhat and Laurent Boivin for putting together an excellent hackathon, to Booking.com for providing our venue, and to Wendy van Dijk for helping each day with critical logistics: ensuring we did not lack for food or drink! I also particularly want to thank The Perl Foundation for the travel grant that allowed me to attend.

All our sponsors deserve great thanks! These companies are putting their money where their mouth is to make the Perl ecosystem better for everyone: Booking.com, SPLIO, Grant Street Group, DYN, Campus Explorer, EVOZON, elasticsearch, Eligo, Mongueurs de Perl, WenZPerl for the Perl6 Community, PROCURA and Made In Love.

If you think the QA hackathon work is valuable for the Perl community, please consider making a late contribution to the hackathon fund to support the 2015 QA Hackathon.

Thank you, also, to my fellow participants. You're the reason I keep going back.

Day-by-day Recap

I'm going to give a pretty detailed, stream-of-consciousness replay, because I think it will give readers some insight into the frenetic way the QA hackathon tends to work. It's rare (at least for me) to be working on just one project for very long. Frequently I jumped back and forth between discussions with people and actually coding.

Day 0

Ricardo Signes and I flew together and arrived in Lyon in the afternoon with the usual red-eye flight exhaustion. We met Karen and Barbie at the airport for the ride to the hotel, then met up with most of the rest of the hackers that night for drinks and dinner and started to swap ideas about what we might be working on.

Barbie, Ricardo and Tux

Day 1

On Thursday, after introductions and the "stand-up" where we each talked about our plans, I pulled together a bunch of people to talk about PAUSE issues and tasks. One of the big topics was how to implement some of the decisions taken in the Lancaster Consensus the previous year. We also talked about how to get stricter about case-sensitivity, to avoid the "ElasticSearch renaming" problem.
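One way to see the case-sensitivity issue is that package names differing only by case, such as ElasticSearch and Elasticsearch, map to .pm files that collide on case-insensitive filesystems. The Perl sketch below flags such groups. `find_case_collisions` is a hypothetical helper for illustration, not code from PAUSE. Simple `lc` folding is an assumption of the sketch; Unicode-aware folding would use `fc`.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PAUSE): group package names that differ
# only by case, since their .pm files would collide on a case-insensitive
# filesystem. Assumes simple lc() folding is good enough for illustration.
sub find_case_collisions {
    my %by_folded;
    $by_folded{ lc $_ }{$_} = 1 for @_;    # dedupe exact repeats

    # Keep only folded names with more than one distinct spelling.
    return map  { [ sort keys %{ $by_folded{$_} } ] }
           grep { keys %{ $by_folded{$_} } > 1 }
           sort keys %by_folded;
}

my @groups = find_case_collisions(qw(ElasticSearch Elasticsearch Moose));
print join( ', ', @$_ ), "\n" for @groups;    # prints "ElasticSearch, Elasticsearch"
```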

Another related topic was separating the assignment of permissions from the indexing of modules, so that someone could get permissions on the namespace of a module while still releasing non-indexed developer versions of it.

Later that day, I summarized all the discussion into a PAUSE distribution permissions and indexing rules document. The big change is that — per Lancaster Consensus — your distribution "name" (the first part of the tarball filename) will need to match a Perl package that you have upload permissions for.
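Here is a hypothetical sketch of the rule, not PAUSE's implementation. It derives the package implied by a tarball name like Foo-Bar-1.23.tar.gz and checks that package against a hash of the uploader's permissions. The regex is a deliberate simplification that ignores cases such as -TRIAL suffixes. CPAN::DistnameInfo parses tarball names far more thoroughly.

```perl
use strict;
use warnings;

# Hypothetical sketch (not PAUSE's code): "Foo-Bar-1.23.tar.gz" -> "Foo::Bar".
# Simplified parse: name, hyphen, a version starting with a digit
# (optionally "v"), then a common archive extension.
sub dist_package {
    my ($tarball) = @_;
    my ($dist) = $tarball =~ /\A(.+?)-v?[0-9][^-]*\.(?:tar\.gz|tgz|zip)\z/
        or return;
    return join '::', split /-/, $dist;
}

# $perms is a hashref of package name => 1 for packages the uploader
# has permissions on (an assumed data shape for this sketch).
sub may_index_dist {
    my ( $tarball, $perms ) = @_;
    my $package = dist_package($tarball);
    return defined $package && $perms->{$package} ? 1 : 0;
}

my %perms = ( 'Foo::Bar' => 1 );
for my $tarball ( 'Foo-Bar-1.23.tar.gz', 'Foo-Baz-0.01.tar.gz' ) {
    printf "%s: %s\n", $tarball,
        may_index_dist( $tarball, \%perms ) ? 'ok' : 'no matching permission';
}
```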

Ricardo and others then went off to implement various parts of this and to solve the problem of existing distributions that don't match a package; I'll let them cover that in their own blog posts. (Update: see Ricardo's blog post.)

Next, I talked to Karen and Graham about improving the security of Module::Metadata (which has to evaluate code to determine $VERSION) so that perhaps it could eventually be used by PAUSE. They took some prototype work I already had and started running with it, checking with Christian Walde about how to handle sub-process issues portably to Windows.
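For context, this is roughly how Module::Metadata is used to find a module's $VERSION. To report the version, it locates the $VERSION assignment in the file and evaluates it. That evaluation step is why running it over untrusted uploads raises a security question. The module file written here is a made-up example.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use Module::Metadata;

# Write a tiny (made-up) module to a temporary directory.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'Example.pm' );
open my $fh, '>', $file or die "Can't write $file: $!";
print {$fh} "package Example;\nour \$VERSION = '1.23';\n1;\n";
close $fh or die "Can't close $file: $!";

# Module::Metadata finds the package and evaluates the $VERSION line.
my $meta = Module::Metadata->new_from_file($file);
printf "%s %s\n", $meta->name, $meta->version;    # Example 1.23
```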

At other points during the day, I sent Tim Bunce some ideas for how role-based testing might help DBI testing and helped Jens Rehsack with a warnocked takeover request for some of Adam Kennedy's modules.

I also took a moment to get everyone's attention so that I could hand out a special award (only partly in jest) to Peter Rabbitson for his efforts keeping backwards compatibility for Perl 5.6. I called it the Wandering Albatross Award and gave him a stuffed albatross. It was also Peter's birthday, so I got us to sing to him. He took it all well and the albatross kept him company for the rest of the hackathon.

Peter and the albatross

With all these discussions going on, it wasn't until after dinner that I got to any of my own coding, but I managed to do a couple of cool things before bedtime:

  • Revived a patch for Test::Harness to let authors define rules for parallel testing via a file in their distributions (more in Day 2)
  • Sent a CPAN.pm pull request with a configuration option to automatically switch on PERL_MM_USE_DEFAULT for prompt-free installation.
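PERL_MM_USE_DEFAULT is the environment variable that ExtUtils::MakeMaker's prompt() checks. When it is set, prompt() takes the default answer instead of waiting for input. The sketch below only demonstrates that behavior. It is not the CPAN.pm configuration option from the pull request, and the question text is made up.

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker ();

# With PERL_MM_USE_DEFAULT set, prompt() prints the question and returns
# the default ('n' here) without reading from STDIN.
$ENV{PERL_MM_USE_DEFAULT} = 1;
my $answer = ExtUtils::MakeMaker::prompt( 'Enable the optional feature?', 'n' );
print "Got answer: $answer\n";    # Got answer: n
```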

Day 2

On Friday, I finished my work on a test rules file for Test::Harness and fired off a pull request for it. The problem is that some people enable parallel testing by default to speed up module installation. This usually works, but some distributions have tests that can't run in parallel. Fixing such a test suite can be a lot of work, so instead, authors will be able to add a testrules.yml file specifying that their tests must run in series. The rules code for this was already in TAP::Harness, but authors had no way to control it short of customizing ExtUtils::MakeMaker or Module::Build. Now they will.
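Here is a minimal sketch of the kind of rules TAP::Harness accepts, based on my reading of its documented `rules` option. Tests matching a `seq` clause run one after another. Tests matching a `par` clause may run concurrently when jobs is greater than 1. The t/db/ paths are hypothetical. A testrules.yml file expresses the same structure in YAML; check the TAP::Harness documentation for the file name and location it actually reads.

```perl
use strict;
use warnings;
use TAP::Harness;

# Up to 4 test jobs at once, but with scheduling rules:
# first run the (hypothetical) t/db/*.t tests one at a time, in sequence,
# then run everything else ('**' matches any remaining test) in parallel.
my $harness = TAP::Harness->new(
    {
        jobs  => 4,
        rules => {
            seq => [
                { seq => 't/db/*.t' },
                { par => '**' },
            ],
        },
    }
);

# A distribution whose tests cannot run in parallel at all could use:
#   rules => { seq => '**' }
# Running the suite would then look like:
#   $harness->runtests( glob 't/*.t' );
```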

Jérôme Quelin stopped by to discuss some bugs in how cpan -O finds outdated modules. There were two cases in which apparently identical versions of a locally-installed module and its CPAN counterpart were reported as out of date. One turned out to be a bug in the decimal precision of the report, and I sent Jérôme to file a ticket on App::Cpan. The other turned out to be a bug in how CPAN.pm compared "undef" and "0", and I sent Andreas a pull request to fix it.
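Both bugs come down to comparing versions carefully. This is a hypothetical sketch, not CPAN.pm's actual fix, using the core version module. Treating a missing version as 0 is an assumption of this sketch. Comparing version objects rather than strings means "1.10" and "1.1" count as the same decimal version.

```perl
use strict;
use warnings;
use version 0.77;

# Hypothetical helper (not CPAN.pm's code): is the CPAN version newer
# than the installed one?
sub is_outdated {
    my ( $installed, $latest ) = @_;

    # Assumption for this sketch: a missing or "undef" version counts as 0.
    for ( $installed, $latest ) {
        $_ = 0 if !defined $_ || $_ eq 'undef';
    }

    # Version objects compare numerically by component, not as text.
    return version->parse($latest) > version->parse($installed) ? 1 : 0;
}
```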

I then worked on my own pull request backlog for a bit to get things shipped:

  • Test::API patches for fewer dependencies and a class-api test
  • Term::Title fixes for non-interactive testing
  • CPAN::Reporter::Smoker patch to skip dev distributions

All that work got me to dinner time, and I went with a small group (Ricardo, Peter, Karen, Graham, and Leon) to discuss various issues and challenges relating to version objects and the toolchain. We decided to temporarily set aside the current state of affairs with respect to version objects and just talk about the different ways that versions are represented in Perl (decimal and tuple forms). We sketched out our ideal semantics and transformations, and I wrote them up after we got back to the hotel. (Leon would later do some prototyping on Saturday and Sunday.)

We'll continue the discussion virtually over the next several months and see if it leads to a concrete proposal for how to rationalize version number semantics in perl 5.21.
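As background for that discussion, here is how the core version module currently relates the two forms. This is not a statement of the semantics we sketched. In a decimal version, each group of three digits after the decimal point corresponds to one component of the dotted (tuple) form.

```perl
use strict;
use warnings;
use version 0.77;

my $decimal = version->parse('1.002003');    # decimal form
my $dotted  = version->parse('v1.2.3');      # dotted-decimal (tuple) form

print $decimal->normal, "\n";    # v1.2.3
print $dotted->numify,  "\n";    # 1.002003
print $decimal == $dotted ? "equal\n" : "different\n";    # equal

# The classic surprise: as decimals, 1.10 is *less* than 1.9,
# because 1.10 means 1.100 and 1.9 means 1.900.
print version->parse('1.10') < version->parse('1.9')
    ? "1.10 < 1.9\n"
    : "1.10 >= 1.9\n";    # 1.10 < 1.9
```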

Day 3

My plan for Saturday was to dive deep into CPAN.pm. Earlier this year, Andreas discovered a major regression in the "force" pragma for the CPAN shell, stemming from my work in Lancaster last year. As an emergency fix, he reverted a dozen or so commits. My goal was to restore the reverted work while squashing all the bugs.

Unfortunately, shit happens. The CPAN Testers Metabase (which collects test report submissions) chose that morning to stop working. I discovered that its EC2 instance had gotten wedged so badly that EC2's usual "stop" command wouldn't even work. So I spent the rest of the morning on EC2 hackery to get Metabase back up and the CPAN Testers reports flowing again.

It was ironic that Metabase died when I was at the hackathon, and yet the hackathon gave me all the round tuits I needed to get it fixed.

With the Metabase repaired, I got back to CPAN.pm. I covered my bases: first I sent Andreas a pull request to revert something else that needed reverting if the reversions were going to stand, then I branched off before the reversions and figured out how to fix the 'force' pragma bug directly.

Along the way, I stopped to fix CPAN::Reporter's prereqs reporting when used with a CPAN.pm that supports recommends/suggests prerequisites.

And throughout the day, I continued the version numbering semantics discussions from the previous evening to test, clarify and refine our understanding.

In the evening, I spent some hours trying and failing to replicate another bug that Andreas had described.

Day 4

Sunday morning started out with a snag — we were unable to get into the Booking.com office and had to work from the hotel. I took advantage of the time to get Andreas to walk me in great detail through the CPAN.pm bug I couldn't replicate. Thank goodness he keeps copious notes! By the time we got to the hackathon venue, I had a good hypothesis for what I needed to do to replicate it.

With only one afternoon to go, I tried to avoid further discussions and focus on code:

  • I implemented metadata fragment conversion in CPAN::Meta::Converter, the lack of which was blocking Leon's CPAN::Meta::Merge pull request. Then I fixed up CPAN::Meta::Merge to use the new feature the way I implemented it. Along the way, I roped Karen into writing some additional tests for fragment conversion and generally sanity-checking my code. CPAN::Meta::Merge will make it much easier for distribution packagers to safely and sanely create META files from a mix of detected and provided metadata.
  • I cleaned up CPAN::Reporter's repository, made its tests more efficient and shipped it
  • When working on CPAN::Reporter, I wished there was an easy way to run "dzil test" with parallel testing, so I implemented that and sent Ricardo a pull request. (He made it even more general and shipped a new Dist::Zilla within the hour.)
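Here is a minimal sketch of merging metadata fragments with CPAN::Meta::Merge. It assumes the `new(default_version => ...)` and `merge(...)` interface documented in later CPAN::Meta releases; check the docs for your version. The distribution name and prerequisites are made up. As I understand it, each fragment is upgraded to the requested spec version before merging, so prerequisites from both sources end up in one structure.

```perl
use strict;
use warnings;
use CPAN::Meta::Merge;

# Fragments are converted to spec version 2 before being merged.
my $merger = CPAN::Meta::Merge->new( default_version => '2' );

# Made-up "detected" and "provided" metadata fragments.
my $detected = {
    name    => 'Example-Dist',
    prereqs => { runtime => { requires => { 'File::Spec' => '0' } } },
};
my $provided = {
    prereqs => { runtime => { requires => { 'JSON::PP' => '2.0' } } },
};

# merge() returns a hashref combining both fragments.
my $merged = $merger->merge( $detected, $provided );
print join( ', ', sort keys %{ $merged->{prereqs}{runtime}{requires} } ), "\n";
```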

All this got me wondering why dzil build ran so slowly on CPAN::Reporter and someone suggested I try running PERL5OPT=-d:NYTProf dzil build and looking at the flame graph of the result:

CPAN::Reporter flame graph

(note: this image is a reconstruction *after* optimization, but it gives you the idea)

Ricardo and a bunch of people looked over my shoulder as we analyzed it, and we realized how terribly slow the MinimumPerl plugin was. I also realized my own InsertCopyright plugin wasn't using the new PPI caching mechanism. So I swapped out MinimumPerl for MinimumPerlFast and patched InsertCopyright. That cut my build time by about 40%.

We also realized that the most expensive subroutines were PPI "find" calls, and that inspired Ricardo to think about ways of indexing the PPI DOM for more efficient queries.

I then reviewed the Module::Metadata work that Karen and Graham had been doing and gave it a thumbs up apart from some minor comments, and looked at some other pull requests that had been flying around.

We stopped for cleanup and group pictures, then headed back to the hotel for dinner.

After dinner, I finally got a chance to finish my CPAN.pm work and had a clean branch that avoided reverting recommends/suggests support, fixed the force pragma bug, avoided the other bug that Andreas showed me, and cherry-picked half a dozen commits that had come in after the reversion (including three of my own from the hackathon).

Day 5

Monday was a travel day. While on the plane, I worked on cleaning up all my CPAN.pm work to be less confusing when I sent it to Andreas.

I also worked on a way of reporting deep dependencies during automated testing, so that Andreas' analysis service can more easily detect when test failures are due to a deep dependency. It's not done, but I hope to make it a standard part of CPAN testing sometime "soon" (i.e. before next year's hackathon).

And that was it. I got home, took a shower, talked to my wife, and fell into bed exhausted.

The 2014 Perl QA hackathon was over.

Posted in perl programming

Why I finally joined Gittip and why you should, too

I was a Gittip skeptic.

Heck, I still am. But I signed up anyway and I'll tell you why.

But first, I want to talk about money and altruism. Most people contribute to open source for free. They don't do it for money. They do it for fun or self-satisfaction.

The danger of offering money to a volunteer is that they might revalue their contributions in light of the money. Put differently, it's possible that getting a little bit of money might be more demotivating than getting none.

Consider shareware as a related example. You bust your ass writing some software and then typically find (a) few people download it and (b) even fewer bother to pay.

Now think about that from the open source "tip jar" perspective. For one, lots of work doesn't even have a download count. And given that it is typically "free" (in both the "speech" and "beer" sense), I would expect New York City subway "Showtime" panhandlers to get more in tips than the typical open source developer.

The fallacy in that argument is that shareware – and subway panhandling – is transactional. It's cash (or not) for a product or performance.

Open source is a community.

Or perhaps it's more like an iterated prisoner's dilemma game.

In a community, like in an iterated game, you participate over time and your self-reward can be reinforced or diminished a little bit in every interaction.

If you're totally self-motivated and only the challenge of the code matters to you, then community isn't a big deal. But if you're like most of us, positive feedback from the community, whether karma points, "+1" clicks, thanks, or tips, all contributes to the feeling of self-reward.

For me, someone recognizing my efforts is a huge boost. Even a bug report tells me that someone used my code and it helped them enough that they'd try to make it better. That motivates me to fix more bugs and write more code.

Recently, Ribasushi argued that Gittip makes consistent reward easy and that even chump change adds up over time. Ovid argued that just raising Perl's visibility on Gittip benefits the community.

To some extent I agree and to some extent I think both are missing the larger point. It's not about the money and it's not about the marketing.

In a community, everyone should be looking for ways to reinforce behaviors that improve the community. Thus, every extra way to say thanks is worth pursuing.

Thank with email? Patches? Gittip? Flattr? Awards? Karma points?

Yes!

Whatever methods you find motivating to thank others are the ones you should use.

  • If you have more free time than mad money, find ways to produce things the community needs. Write code, write articles, give talks, answer questions on Q&A sites and so on
  • If you have more mad money than free time, find ways to support those producing. Donate to TPF or EPO. Or, if you want to make your support personal, donate via Gittip or something similar

Either way, try to make sure your efforts also reinforce those around you. Sending bug reports and patches, or even just a thank-you email, benefits the recipient much more than you might think and might even do more good for the community than your own next bit of code or authorship. Sending a gittip or saying "+1" to a grant proposal are ways to give thanks with currency.

Thanks comes in many forms. The more we have in any form, the better off we all are.

Maybe Gittip is a flash in the pan. But maybe not. If that kind of personal, consistent thank you appeals to you – whether as a donor or as a recipient – don't think about it, just do it. More is better.

Join Gittip here. Join the Perl community here. Find your favorite CPAN authors on CPAN Tip. And if you want to gittip me, here I am.

Or just shoot me an email sometime. ☺

Posted in perl programming

Why you should use getcwd and not cwd

The Cwd module provides several functions for finding the current directory. The most similar-seeming are cwd and getcwd. Have you ever wondered why you should pick one or the other?

I always use getcwd and I'll show you why.

My ~/.dzil directory is a symlink to a git repository elsewhere. If I'm in that directory in my terminal, here is a look at three ways to get the current path:

$ perl -MCwd=cwd,getcwd -MFile::Spec -wE 'say for cwd(), getcwd(), File::Spec->rel2abs(".")'
/Users/david/.dzil
/Users/david/git/dotfiles/dzil
/Users/david/git/dotfiles/dzil

The cwd call returns the symlink path — the way the shell sees it. The getcwd call returns the real path with the symlink resolved.

Now look at the third. That's from File::Spec. A LOT of code uses File::Spec to manipulate paths. If you ever want to compare the current directory against a path made absolute by File::Spec, you need to use getcwd.
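As far as I can tell from its source, File::Spec on Unix makes relative paths absolute using Cwd::getcwd, which is why the third line matches getcwd. The path you compare against may itself contain symlinks. In that case, one option is to resolve it with Cwd's realpath so both sides are in the same physical form. `is_current_dir` is a hypothetical helper for illustration.

```perl
use strict;
use warnings;
use Cwd qw(getcwd realpath);

# Hypothetical helper: true if $path names the current working directory.
# getcwd reports the physical (symlink-resolved) directory, so resolve
# $path with realpath before comparing the strings.
sub is_current_dir {
    my ($path) = @_;
    my $resolved = realpath($path);
    return defined $resolved && $resolved eq getcwd() ? 1 : 0;
}
```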

I've found that getcwd is more consistent across platforms, whereas cwd can be implemented differently depending on your platform or if you have XS or pure-perl implementations.

I like consistency.

Sure, there are cases where the "shell view" of the current directory is more important and you might want to use cwd, but I find that is the exception, not the rule.

Consistency matters. Use getcwd.

Posted in perl programming

Why installing Dist::Zilla is slow and what you can do about it

Despite my previous rant about Dist::Zilla haters and why you don't need Dist::Zilla to contribute, I recognize that there is one thing that does require Dist::Zilla: installing from a patched repo without waiting for a CPAN release.

Leaving aside whether that's really wise or not, I think it's the real frustration people are having with distributions that use Dist::Zilla.

That inspired me to explore why Dist::Zilla is slow to install and what could be done to improve it.

First and foremost, Dist::Zilla just has a lot of dependencies — over 170 of them. Downloading, untarring, building, testing and installing those takes time. Starting from a fresh Perl, if every distribution took only a second to install, it would still take nearly 3 minutes. Unfortunately, distributions aren't that quick to install. Some are damn slow.

My first experiment was finding out how long it took to install Dist::Zilla in the worst-case situation: a brand new perl installation.

I started with two cases:

  1. Installing with cpanminus, but using TAP::Harness::Restricted to avoid pod-related tests (which might otherwise cause non-functional test failures and prevent installation)
  2. Installing with cpanminus, but using the "-n" flag to skip all tests

In each case, starting from a clean perlbrew, I set up a local library to install modules into. Then I bootstrapped cpanminus and (for #1) TAP::Harness::Restricted:

$ perlbrew lib create 18.2@case1
$ perlbrew use 18.2@case1
$ cpan App::cpanminus
$ cpanm TAP::Harness::Restricted

I created a similar, empty local library for case #2.

Installing TAP::Harness::Restricted in case #1 also installs some distributions that Dist::Zilla's dependencies need, but I didn't include that time in my analysis. Most of it is spent installing Capture::Tiny, which I timed separately at ~40 seconds due to the heavy testing it does.

Testing was done like this:

# case #1
$ HARNESS_CLASS=TAP::Harness::Restricted time cpanm Dist::Zilla

# case #2
$ time cpanm -n Dist::Zilla

One thing I realized later (but will describe here) is that cpanminus installs META file information into the archlib path. I was curious how much overhead that added, so I added a third case (also with a clean local library): installing using CPAN.pm with TAP::Harness::Restricted.

To keep that from hanging in the middle of the run, I had to run it with default answers to prompts enabled:

# case #3
$ PERL_MM_USE_DEFAULT=1 HARNESS_CLASS=TAP::Harness::Restricted time cpan Dist::Zilla

The results:

  • Case 1: ~16 minutes (cpanminus + TAP::Harness::Restricted)
  • Case 2: ~11 minutes (cpanminus without running tests)
  • Case 3: ~12 minutes (CPAN.pm + TAP::Harness::Restricted)

That was surprising! Case #3 finished about 4 minutes faster than case #1, and case #2 (no tests) about 5 minutes faster, so cpanminus writing META files looks like it has about the same overhead as running the tests in the first place. If cpanminus didn't do that, then case #2 might drop to maybe 7 or 8 minutes. That would average a bit under 3 seconds for each of the 170 dependencies, which seems plausible.
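Here is the arithmetic behind that estimate. t_1, t_2 and t_3 are the rough times for cases #1, #2 and #3. The calculation assumes, as the update below cautions, that the whole gap between cases #1 and #3 is due to cpanminus writing META files.

```latex
\begin{aligned}
\Delta_{\text{META}}  &\approx t_1 - t_3 \approx 16 - 12 = 4\ \text{min} \\
\Delta_{\text{tests}} &\approx t_1 - t_2 \approx 16 - 11 = 5\ \text{min} \\
t_2' = t_2 - \Delta_{\text{META}} &\approx 11 - 4 = 7\ \text{min} \\
\bar{t} &\approx \frac{7 \times 60\ \text{s}}{170} \approx 2.5\ \text{s}
  \quad\text{to}\quad \frac{8 \times 60\ \text{s}}{170} \approx 2.8\ \text{s per distribution}
\end{aligned}
```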

[Update: Miyagawa pointed out that I'm assuming that writing META is the cause of the slowdown and he's right. I suspect that it is a large part of it (it hits disk and executes a separate process), but there might be other reasons as well.]

That was the macro picture. Next I wanted to see how long individual distributions took to install so that I could see which ones were causing the biggest delay.

To profile installation timings, I hacked some timing output into cpanminus and then re-ran case #1. Not surprisingly, a handful of distributions were a huge chunk of the installation time.

The number after the distribution in the list below is the number of exclusive seconds required to download, unpack, configure, build, test and install (cpanminus' writing of META is excluded):

Moose-2.1202: 123
Module-Build-0.4204: 63
Dist-Zilla-5.012: 51
IO-Socket-SSL-1.966: 39
Capture-Tiny-0.23: 39
PPI-1.215: 26
DateTime-TimeZone-1.63: 24
File-Temp-0.2304: 21
DateTime-1.06: 21
Test-Harness-3.30: 16
DateTime-Locale-0.45: 16
MooseX-Role-Parameterized-1.02: 9
Net-SSLeay-1.58: 9
Test-Warn-0.24: 9
libwww-perl-6.05: 9
Test-Simple-1.001002: 7
Config-MVP-2.200006: 7
JSON-2.90: 7
Moose-Autobox-0.15: 6
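If you collect timings like these yourself, a few lines of Perl will total them and rank each distribution's share. This is a hypothetical helper that assumes the "Name-Version: seconds" format shown above. By that count, the 19 distributions listed add up to 502 seconds, or about 8.4 minutes.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: parse "Distribution-Version: seconds" lines, skip
# anything else, and return the total plus rows of
# [ name, seconds, percent of total ] sorted by seconds, largest first.
sub summarize_timings {
    my @lines = @_;
    my %seconds;
    for my $line (@lines) {
        next unless $line =~ /^\s*(\S+):\s*(\d+)\s*$/;
        $seconds{$1} = $2;
    }
    my $total  = sum( 0, values %seconds );
    my @ranked = sort { $seconds{$b} <=> $seconds{$a} || $a cmp $b } keys %seconds;
    return ( $total,
        map { [ $_, $seconds{$_}, $total ? 100 * $seconds{$_} / $total : 0 ] } @ranked );
}

my @lines = ( 'Moose-2.1202: 123', 'Module-Build-0.4204: 63', 'Dist-Zilla-5.012: 51' );
my ( $total, @rows ) = summarize_timings(@lines);
printf "%-24s %4d s %5.1f%%\n", @$_ for @rows;
print "total: $total s\n";    # total: 237 s
```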

In some cases, it looks like newer versions of dual-life core distributions are being pulled in when they might not need to be.

For example, Test::File::ShareDir requires a newer Module::Build than ships with Perl v5.18.2 for configuration, but doesn't seem (at first glance) to use any of its features. Switching to ExtUtils::MakeMaker would shave 8% or so off Dist::Zilla's worst-case installation time (assuming tests are run).

Likewise, Tree::DAG_Node requires a very new File::Temp for testing. Is that really necessary? Maybe not.

Of course, these are worst case results. In many real-world cases, you might already have Moose, LWP, DateTime and other modules installed and the installation burden will be less.

So what should you do if you need to install Dist::Zilla?

If you like tests, install TAP::Harness::Restricted and use CPAN.pm like this:

$ cpan TAP::Harness::Restricted
$ PERL_MM_USE_DEFAULT=1 HARNESS_CLASS=TAP::Harness::Restricted cpan Dist::Zilla

If you don't mind installing things without tests, use cpanminus like this:

$ cpanm -n Dist::Zilla

In either case, it's probably going to take about 10 minutes.

Go for a walk, go get a cup of your favorite beverage, take a bathroom break, or whatever. When you get back, Dist::Zilla should be ready for you.

Notes:
  • On installing from a patched repo: if you really can't wait because $job depends on the fix, you can always just patch a tarball from CPAN instead of the repo.
  • On "half of CPAN": despite the complaint that Dist::Zilla requires "half of CPAN", its 170-odd dependencies are actually only about 0.6% of the nearly 30k distributions on CPAN.
  • On Capture::Tiny's heavy testing: it tests so thoroughly because capturing output portably can break in so many ways.

Posted in dzil, perl programming, toolchain

Dist::Zilla haters, stop your whining

Some people just love to hate. And some of them love to blog their hate.

Dist::Zilla seems to rub some people the wrong way. Here are some of the typical complaints I've seen or heard:

  • It's good for authors but not contributors
  • I have to install half of CPAN to contribute
  • There's no Makefile.PL or Build.PL in the code repository
  • I can't install it from github

Well, sure. It is good for authors.

It was written by Ricardo Signes (RJBS), who is possibly the most prolific CPAN author to date. According to the CPAN Report, Ricardo released 230 distributions in 2013. Oh, and did I mention that he is the Perl Pumpking, too?

If you look at heavy Dist::Zilla users, you'll find a who's who of very active and involved CPAN contributors. These are people who spend a lot of time publishing code for the benefit of the broader Perl community.

So here's my problem with whining about how their use of Dist::Zilla makes it hard to contribute:

You're telling some extremely prolific CPAN contributors to be less productive for your convenience.

That's asinine!

You ought to be thanking them for finding a tool that lets them give so much of their time to the Perl community. You ought to be bending over backwards to do it their way, even if that means a few extra minutes of your time.

You sure as hell shouldn't be wasting any of their time or morale complaining about how they manage their code.

That said, there are ways to mitigate Dist::Zilla contributor-shock and I've been encouraging Dist::Zilla users to make such changes. One huge help is providing better documentation for how to contribute.

Here's all it takes for most of my own distributions (note, no Dist::Zilla required):

    $ git clone git://github.com/dagolden/...whatever...
    $ cd whatever
    $ cpanm --installdeps .
    # hack, hack, hack
    $ prove -l

If that's too hard for you, I'm not sure I want your contributions anyway.

Maybe bitching about Dist::Zilla will make some potential new adopters think twice. Or maybe not.

Do you think people would rather listen to the guy releasing 230 distributions a year to CPAN or to the guy complaining about how he did it?

Posted in dzil, perl programming

© 2009-2014 David Golden All Rights Reserved