Why installing Dist::Zilla is slow and what you can do about it

Despite my previous rant about Dist::Zilla haters and why you don't need Dist::Zilla to contribute, I recognize that there is one thing that does require Dist::Zilla: installing from a patched repo without waiting for a CPAN release.

Leaving aside whether that's really wise or not, I think it's the real frustration people are having with distributions that use Dist::Zilla.

That inspired me to explore why Dist::Zilla is slow to install and what could be done to improve it.

First and foremost, Dist::Zilla just has a lot of dependencies — over 170 of them. Downloading, untarring, building, testing and installing those takes time. Starting from a fresh Perl, if every distribution took only a second to install, it would still take nearly 3 minutes. Unfortunately, distributions aren't that quick to install. Some are damn slow.

My first experiment was finding out how long it took to install Dist::Zilla in the worst-case situation: a brand new perl installation.

I started with two cases:

  1. Installing with cpanminus, but using TAP::Harness::Restricted to avoid pod-related tests (which might otherwise cause non-functional test failures and prevent installation)
  2. Installing with cpanminus, but using the "-n" flag to skip all tests

In each case, starting from a clean perlbrew, I set up a local library to install modules. Then I bootstrapped cpanminus and, for case #1, TAP::Harness::Restricted:

$ perlbrew lib create 18.2@case1
$ perlbrew use 18.2@case1
$ cpan App::cpanminus
$ cpanm TAP::Harness::Restricted

I created a similar, empty local library for case #2.

Installing TAP::Harness::Restricted in case #1 installs some distributions that Dist::Zilla's dependencies also need, but I didn't include that time in my analysis. The majority of it is installing Capture::Tiny, which I timed separately at about 40 seconds due to the heavy testing it does.

Testing was done like this:

# case #1
$ HARNESS_SUBCLASS=TAP::Harness::Restricted time cpanm Dist::Zilla

# case #2
$ time cpanm -n Dist::Zilla

One thing I realized later (but will describe here) is that cpanminus installs META file information into the archlib path. I was curious how much overhead that added, so I added a third case (also with a clean local library): installing using CPAN.pm with TAP::Harness::Restricted.

To keep that from hanging in the middle of the run, I had to run it enabling default answers to prompts:

# case #3
$ PERL_MM_USE_DEFAULT=1 HARNESS_SUBCLASS=TAP::Harness::Restricted time cpan Dist::Zilla

The results:

  • Case 1: ~16 minutes (cpanminus + TAP::Harness::Restricted)
  • Case 2: ~11 minutes (cpanminus without running tests)
  • Case 3: ~12 minutes (CPAN.pm + TAP::Harness::Restricted)

That was surprising! Comparing #1 and #3, cpanminus writing META files looks like it has about the same overhead as running tests in the first place. If cpanminus didn't do that, then case #2 might drop down to maybe 7 or 8 minutes. That would average around 3 seconds over the 170 dependencies, which seems plausible.

[Update: Miyagawa pointed out that I'm assuming that writing META is the cause of the slowdown and he's right. I suspect that it is a large part of it (it hits disk and executes a separate process), but there might be other reasons as well.]
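The estimate above is plain subtraction on the three rounded timings. A minimal sketch of that arithmetic, assuming the rounded figures from the results list and assuming the whole gap between case #1 and case #3 is cpanminus-specific overhead (which, per the update, may not all be META writing). The variable names and the `per_dist` helper are illustrative only:

```shell
# Rounded wall-clock minutes from the three cases above.
case1=16   # cpanminus + TAP::Harness::Restricted
case2=11   # cpanminus, tests skipped
case3=12   # CPAN.pm + TAP::Harness::Restricted

# Assumption: the case #1 vs. case #3 gap is all cpanminus-specific overhead.
overhead=$((case1 - case3))
# Cost of running the tests under cpanminus.
tests=$((case1 - case2))
# Case #2 with that overhead removed.
projected=$((case2 - overhead))

# Average seconds per distribution for a run of N minutes over 170 dependencies.
per_dist() {
    awk -v min="$1" 'BEGIN { printf "%.1f\n", min * 60 / 170 }'
}

echo "overhead: ${overhead} min, tests: ${tests} min, projected: ${projected} min"
per_dist 7   # low end of the 7-8 minute guess
per_dist 8   # high end
```

With these inputs the overhead and the test cost come out within a minute of each other, and the per-distribution average lands between 2.5 and 2.8 seconds.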

That was the macro picture. Next I wanted to see how long individual distributions took to install so that I could see which ones were causing the biggest delay.

To profile installation timings, I hacked some timing output into cpanminus and then re-ran case #1. Not surprisingly, a handful of distributions were a huge chunk of the installation time.
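The cpanminus patch itself isn't shown here. A rougher alternative is to time individual installs from the outside. This is only a sketch and not the method used for the numbers below: `time_install` is a hypothetical helper, it measures whole seconds of wall-clock time, and it includes any dependencies the command pulls in, so its figures are not "exclusive" seconds.

```shell
# Hypothetical helper: run a command, then report elapsed wall-clock
# seconds on stderr in a "name: seconds" shape.
time_install() {
    ti_start=$(date +%s)
    "$@"
    ti_status=$?
    ti_end=$(date +%s)
    echo "$*: $((ti_end - ti_start))" >&2
    return "$ti_status"
}

# Example (not run here): time_install cpanm -n Capture::Tiny
```

The helper passes through the exit status of the command it wraps, so it can be dropped into a script without hiding failures.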

The number after the distribution in the list below is the number of exclusive seconds required to download, unpack, configure, build, test and install (cpanminus' writing of META is excluded):

Moose-2.1202: 123
Module-Build-0.4204: 63
Dist-Zilla-5.012: 51
IO-Socket-SSL-1.966: 39
Capture-Tiny-0.23: 39
PPI-1.215: 26
DateTime-TimeZone-1.63: 24
File-Temp-0.2304: 21
DateTime-1.06: 21
Test-Harness-3.30: 16
DateTime-Locale-0.45: 16
MooseX-Role-Parameterized-1.02: 9
Net-SSLeay-1.58: 9
Test-Warn-0.24: 9
libwww-perl-6.05: 9
Test-Simple-1.001002: 7
Config-MVP-2.200006: 7
JSON-2.90: 7
Moose-Autobox-0.15: 6
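To see how much of the run these distributions account for, the list can be totalled directly. A small sketch, assuming the list is kept in the same "Name-Version: seconds" form; `sum_timings` is a name made up for this example:

```shell
# Sum "Name-Version: seconds" lines read from stdin and print the total.
sum_timings() {
    awk -F': ' 'NF == 2 { total += $2 } END { print total + 0 }'
}

timings='Moose-2.1202: 123
Module-Build-0.4204: 63
Dist-Zilla-5.012: 51
IO-Socket-SSL-1.966: 39
Capture-Tiny-0.23: 39
PPI-1.215: 26
DateTime-TimeZone-1.63: 24
File-Temp-0.2304: 21
DateTime-1.06: 21
Test-Harness-3.30: 16
DateTime-Locale-0.45: 16
MooseX-Role-Parameterized-1.02: 9
Net-SSLeay-1.58: 9
Test-Warn-0.24: 9
libwww-perl-6.05: 9
Test-Simple-1.001002: 7
Config-MVP-2.200006: 7
JSON-2.90: 7
Moose-Autobox-0.15: 6'

printf '%s\n' "$timings" | sum_timings               # all 19 listed distributions
printf '%s\n' "$timings" | head -n 5 | sum_timings   # the five slowest
```

The 19 listed distributions add up to 502 seconds, a bit over 8 minutes, and the five slowest alone account for 315 of those seconds.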

In some cases, it looks like newer versions of dual-life core distributions are being pulled in when they might not need to be.

For example, Test::File::ShareDir requires a newer Module::Build than ships with Perl v5.18.2 for configuration, but doesn't seem (at first glance) to use any of its features. Switching to ExtUtils::MakeMaker would shave 8% or so off Dist::Zilla's worst-case installation time (assuming tests are run).
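The post doesn't say which total the 8% is measured against. As a rough check, assuming the 63 seconds listed for Module::Build and the rounded run times from the results, the share can be computed against both runs that included tests (`pct` is a helper made up for this sketch):

```shell
# Integer percentage of part within total (rounds down).
pct() {
    echo $(( $1 * 100 / $2 ))
}

pct 63 $((12 * 60))   # against the ~12 minute CPAN.pm run
pct 63 $((16 * 60))   # against the ~16 minute cpanminus run
```

That gives 8 and 6 respectively, so "8% or so" is in the right range whichever of the two totals is used.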

Likewise, Tree::DAG_Node requires a very new File::Temp for testing. Is that really necessary? Maybe not.

Of course, these are worst case results. In many real-world cases, you might already have Moose, LWP, DateTime and other modules installed and the installation burden will be less.

So what should you do if you need to install Dist::Zilla?

If you like tests, install TAP::Harness::Restricted and use CPAN.pm like this:

$ cpan TAP::Harness::Restricted
$ PERL_MM_USE_DEFAULT=1 HARNESS_SUBCLASS=TAP::Harness::Restricted cpan Dist::Zilla

If you don't mind installing things without tests, use cpanminus like this:

$ cpanm -n Dist::Zilla

In either case, it's probably going to take about 10 minutes.

Go for a walk, go get a cup of your favorite beverage, take a bathroom break, or whatever. When you get back, Dist::Zilla should be ready for you.

  • If you really can't wait because $job depends on the fix, you can always just patch a tarball from CPAN instead of the repo.
  • Despite the complaint that Dist::Zilla requires "half of CPAN", that's actually only about 0.6% of the nearly 30k distributions on CPAN.
  • Because capturing output portably can break in so many ways.
Posted in dzil, perl programming, toolchain

Dist::Zilla haters, stop your whining

Some people just love to hate. And some of them love to blog their hate.

Dist::Zilla seems to rub some people the wrong way. Here are some of the typical complaints I've seen or heard:

  • It's good for authors but not contributors
  • I have to install half of CPAN to contribute
  • There's no Makefile.PL or Build.PL in the code repository
  • I can't install it from github

Well, sure. It is good for authors.

It was written by Ricardo Signes (RJBS), who is possibly the most prolific CPAN author to date. According to the CPAN Report, Ricardo released 230 distributions in 2013. Oh, and did I mention that he is the Perl Pumpking, too?

If you look at heavy Dist::Zilla users, you'll find a who's who of very active and involved CPAN contributors. These are people who spend a lot of time publishing code for the benefit of the broader Perl community.

So here's my problem with whining about how their use of Dist::Zilla makes it hard to contribute:

You're telling some extremely prolific CPAN contributors to be less productive for your convenience.

That's asinine!

You ought to be thanking them for finding a tool that lets them give so much of their time to the Perl community. You ought to be bending over backwards to do it their way, even if that means a few extra minutes of your time.

You sure as hell shouldn't be wasting any of their time or morale complaining about how they manage their code.

That said, there are ways to mitigate Dist::Zilla contributor-shock and I've been encouraging Dist::Zilla users to make such changes. One huge help is providing better documentation for how to contribute.

Here's all it takes for most of my own distributions (note, no Dist::Zilla required):

    $ git clone git://github.com/dagolden/...whatever...
    $ cd whatever
    $ cpanm --installdeps .
    # hack, hack, hack
    $ prove -l

If that's too hard for you, I'm not sure I want your contributions anyway.

Maybe bitching about Dist::Zilla will make some potential new adopters think twice. Or maybe not.

Do you think people would rather listen to the guy releasing 230 distributions a year to CPAN or to the guy complaining about how he did it?

Posted in dzil, perl programming

Help test IO::Socket::IP for Perl v5.20

Do you want good IPv6 support in the Perl core?

The Perl 5 Porters intend to add IO::Socket::IP to the Perl 5 core for Version 20, coming later this year. IO::Socket::IP makes IPv4/IPv6 transparent networking easy.

It aims to be a drop-in replacement for IO::Socket::INET (with some caveats), so that most existing code merely needs to do s/IO::Socket::INET/IO::Socket::IP/ to gain IPv6 support.

Preliminary tests have been favorable, but P5P would like more testing to see how well it works as a drop-in replacement in real-world situations. You can help in one of two ways:

The hard, but good way

Take some networking code you've written and replace IO::Socket::INET with IO::Socket::IP.
If you find any problems, report them to the IO::Socket::IP bug queue.

This is the best test, but requires the most work from you.
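The s/// shown earlier is Perl syntax. One way to apply the same substitution to source files from the shell is with sed. This is a sketch under a few assumptions: `swap_inet` is a hypothetical helper, the example file name is made up, and a blind textual swap ignores the caveats mentioned above, so review the result before trusting it.

```shell
# Hypothetical helper: rewrite IO::Socket::INET to IO::Socket::IP in the
# given files, keeping a .bak copy of each original. Names that merely start
# with IO::Socket::INET (such as IO::Socket::INET6) are left untouched.
swap_inet() {
    for si_file in "$@"; do
        cp "$si_file" "$si_file.bak" || return 1
        sed -e 's/IO::Socket::INET\([^A-Za-z0-9_]\)/IO::Socket::IP\1/g' \
            -e 's/IO::Socket::INET$/IO::Socket::IP/' \
            "$si_file.bak" > "$si_file" || return 1
    done
}

# Example (hypothetical path): swap_inet lib/My/Client.pm
```

Keeping the .bak copy makes it easy to diff the change or to back it out if the replacement misbehaves.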

The easy, but risky way

Install Acme::Override::INET from CPAN. This replaces your IO::Socket::INET with a thin wrapper around IO::Socket::IP.

THIS IS RISKY, because it affects every Perl program you run, so be sure you're willing to take the risk.

I've been running it for a while on my day-to-day Perl and haven't had any problems so far. Other Porters, including Ricardo Signes and Nicholas Clark, are also using it.

If you find any problems, report them to the IO::Socket::IP bug queue.

This is super easy and fairly comprehensive since it affects everything you do. But you have to accept the risk of breakage.

[If you want to remove the override, you should be able to delete the modified IO::Socket::INET from your sitelib path and Perl will resume using IO::Socket::INET in your core library path.]

Mention that you're helping

If no one reports any bugs, does that mean that lots of people tried it and no one had problems? Or does it mean that no one bothered to try?

If you test IO::Socket::IP (either way above), then please add yourself to this ticket.

Thank you!

Posted in p5p, perl programming

The xdg channel — Thanksgiving missive

While I haven't been blogging much, I have been busy coding. To riff from Damian's "Conway Channel" talks, this blog post summarizes the various (mostly new) CPAN modules I've been working on.

::Tiny and not so ::Tiny

I appear to be one of the leading proponents of "::Tiny" modules. I love the Unix-like small-tools philosophy. Sometimes, though, they can be too tiny, and need extension for situations that need extra features and/or can handle more dependencies.

  • Class::Tiny is my response to the excessive minimalism of Object::Tiny. When you just need read-write accessors with lazy defaults and maybe BUILD/DEMOLISH, Class::Tiny gives it to you in about 120 lines of code.
  • HTTP::Tiny::UA extends HTTP::Tiny. HTTP::Tiny is in the Perl core and Christian and I consider it nearly feature-complete. I hope HTTP::Tiny::UA can become common ground for user-agent extensions that are consistent with the HTTP::Tiny philosophy and use HTTP::Tiny as the underlying transport.
  • Path::Tiny is not new, but it gets steady improvements. Lately, I've been sorting out Windows and volumes. One of these days, I hope to get around to tackling some big changes to file moving, copying and renaming (maybe by the QA hackathon next year).

Embellishing the Moose

Roles are one of the best features of Moose and Moo. I wrote two roles I thought worth sharing.

  • MooseX::Role::Logger provides a Log::Any-based logger. I think Log::Any is a great idea and underappreciated. I've taken over maintenance and hope to someday soon ship a new release that is even more flexible than it is today.
  • MooseX::Role::MongoDB provides an API for using MongoDB::MongoClient and associated databases/collections. It provides lazy-instantiation, caching and fork-safety.

A MongoDB Framework

You either love MongoDB or you hate it. Or both at the same time. MongoDB's document-centric data model is different than you're used to and everything I found on CPAN was too complex or was doing it wrong.

  • Meerkat is a framework that uses Moose objects as projections of the document state maintained in the database. I think it makes it easy to use the right conceptual model in a Perl-ish way. Of course, it uses MooseX::Role::MongoDB under the hood.

Living with failure

Perl's poor excuse for an exception system is painful, so it falls to CPAN to provide improvements. Here are my latest two attempts to provide better tools.

  • failures makes creating and using exception classes extremely easy. Other than relying on Class::Tiny, it's implemented in about 70 lines of code.
  • Try::Tiny::Retry extends Try::Tiny to make it easy to retry a code block on error. It defaults to exponential-backoff, but is easily customizable.

CPAN minus archive equals index

Without an index, CPAN is just a distributed file store.

  • CPAN::Common::Index is a common library for accessing several types of CPAN indexes. I hope someday it will be something that CPAN clients will use.

Roooaarrrrrhhhh

If I didn't use Dist::Zilla, I couldn't possibly be as prolific as I am. So some fraction of my time is spent adding to the Dist::Zilla ecosystem. In addition to helping make Dist::Zilla safe for encodings, I churned out a few new plugins.

Pod::Spell gets used by my Dist::Zilla spell checking plugins. I merged in the word list from Pod::Wordlist::Hanekomu, improved wordlist matching with Lingua::EN::Inflect and made some other algorithm improvements.

More for the core

I kept pushing some core modules forward in various ways, mostly just applying patches or fixing bugs.

  • HTTP::Tiny got some minor bug fixes
  • File::Temp got some dependency management and Travis CI smoking
  • CPAN::Meta got some fixes to validation and a couple new features

YAML::Tiny isn't really core, but it is the basis for CPAN::Meta::YAML, so I count it in the same category. Working with Ingy, Karen Etheridge and Jim Keenan, we fixed encoding, overhauled the test suite and added test coverage.

Code review

Inspired by rjbs's code-review practices, I've started gradually cleaning up and re-releasing old distributions of mine.

I for Incomplete

There are a number of other projects that I've started or just conceived that I haven't finished. They may yet see the light of day in the future.

  • A "tiny" URI module
  • A better benchmarking library, with statistical rigor for non-parametric timing distributions with unequal variance
  • Some extensions for Data::Faker
  • A module providing a standard way to safely evaluate $VERSION lines parsed from modules

What you can do

First, if any of these are interesting to you, please try them out and let me know what you think.

Second, if you're not in the habit of releasing code to CPAN, consider starting. When you write some library, take an extra second or two to think about how it could be generalized for others and ship it.

Give thanks for CPAN by giving back.

Posted in perl programming

Dist::Zilla ♥ encoding

Last weekend I went to a Dist::Zilla "micro-hackathon" at Ricardo Signes' house. This is something we've done before, setting aside some focused time to tackle something tough. This time, we set out to fix the leaky encoding handling in Dist::Zilla and Ricardo explains the result well in his blog.

But what about Plugins?

I'm really happy that we finally fixed Dist::Zilla's handling of encoding, but unfortunately, Dist::Zilla is only as good as the ecosystem of plugins for it on CPAN. OK, really, it's good on its own but it's even better because of its ecosystem.

As Rik said, lots of things will just work better than before, particularly if you stick to UTF-8 for your files. The plugins most directly affected are FileGatherers and FileInjectors — anything that calls add_file — in particular anything that reads a file off disk without a particular encoding and then uses that as the content of a Dist::Zilla::File::* object.

For example, if you wrote a plugin that reads a template file, runs it through some other libraries, and stuffs the result into the content of a File object, then you really ought to sit down and think about whether you ought to be reading :raw or with some decoding layer. (You probably shouldn't read with the default layers, which might do CRLF translation on Windows.) Whatever you decide, I recommend Path::Tiny and methods it offers like slurp_raw and slurp_utf8.

FileMungers are probably fine. Typically, they read content — which is now decoded text — then do something with it, and then stuff it back into content. Munging is text in and text out, and Dist::Zilla will take care of encoding it before writing it to disk.

Come out, come out, wherever you are…

Using grep.cpan.me, I tried to review all the FileInjectors and FileGatherers I could. Most of them look like they'll be unaffected by the changes in Dist::Zilla. But some looked suspicious and I'll give a list of them below.

If you wrote one of these, please review it for any changes you need to make for Dist::Zilla version 5 and release a -TRIAL version to CPAN. Follow the instructions at the bottom of Rik's post to get the -TRIAL version of Dist::Zilla and its dependencies.

If you use any of these, try them out with Dist::Zilla version 5 and see if it breaks anything for you. If so, let the author know right away.

To be clear, I'm not sure these need work, but they're doing things that concern me.

  • Dist::Zilla::Plugin::AssertOS (InMemory, raw slurp)
  • Dist::Zilla::Plugin::Author::Plicease::Init2 (mix of methods, including raw slurp)
  • Dist::Zilla::Plugin::CSS::Compressor (FromCode, raw slurp through CSS compressor)
  • Dist::Zilla::Plugin::Doppelgaenger (InMemory, raw slurp)
  • Dist::Zilla::Plugin::JSAN (InMemory, raw slurp, munged through other libraries)
  • Dist::Zilla::Plugin::LocaleTextDomain (FromCode, encoded content)
  • Dist::Zilla::Plugin::ManifestInRoot (FromCode, filename list)
  • Dist::Zilla::Plugin::Moz (InMemory, raw JAR content)
  • Dist::Zilla::Plugin::ShareDir::Tarball (InMemory, compressed tarball content)
  • Dist::Zilla::Plugin::TravisYML (InMemory, raw slurp)
  • Dist::Zilla::Plugin::TwitterBootstrap (InMemory, zip file member contents)
  • Dist::Zilla::Plugin::jQuery (InMemory, raw slurp)
  • Dist::Zilla::Role::ModuleIncluder (InMemory, raw slurp)

If you wrote or use a FileGatherer or FileInjector that is not on the list, that doesn't necessarily mean you're safe. It just means that a quick skim of your code didn't throw up any red flags.

Micro-hackathon for the win

If you've got some project that you're stuck on, I encourage you to grab a friend, set aside a day or two, and see if a micro-hackathon like ours can get you unstuck.

In addition to getting Dist::Zilla fixed, I had a lot of fun. Whenever I get to sit down and work with Rik, I learn something new. This time, I feel like a lot of work I've been doing around encoding in the last year all came together in my head and made sense.

Beyond that, I picked up a few editor, Moose and Mac tricks; got to visit most of Rik's favorite Bethlehem dives; tried saffron-almond ice cream; and learned several new board and card games. Woo hoo!

Thank you to Rik (and his family) for a great weekend!

Posted in dzil, perl programming

© 2009-2015 David Golden All Rights Reserved