Why installing Dist::Zilla is slow and what you can do about it

Despite my previous rant about Dist::Zilla haters and why you don't need Dist::Zilla to contribute, I recognize that there is one thing that does require Dist::Zilla: installing from a patched repo without waiting for a CPAN release.

Leaving aside whether that's really wise or not, I think it's the real frustration people are having with distributions that use Dist::Zilla.

That inspired me to explore why Dist::Zilla is slow to install and what could be done to improve it.

First and foremost, Dist::Zilla just has a lot of dependencies — over 170 of them. Downloading, untarring, building, testing and installing those takes time. Starting from a fresh Perl, if every distribution took only a second to install, it would still take nearly 3 minutes. Unfortunately, distributions aren't that quick to install. Some are damn slow.

My first experiment was finding out how long it took to install Dist::Zilla from the worst case sitution — a brand new perl installation.

I started with two cases:

  1. Installing with cpanminus, but using TAP::Harness::Restricted to avoid pod-related tests (which might otherwise cause non-functional test failures and prevent installation)
  2. Installing with cpanminus, but using the "-n" flag to skip all tests

In each case, starting from a clean perlbrew, I set up a local library to install modules. Then I bootstrapped cpanminus and (for #1), TAP::Harness::Restricted:

$ perlbrew lib create 18.2@case1
$ perlbrew use 18.2@case1
$ cpan App::cpanminus
$ cpanm TAP::Harness::Restricted

I created a similar, empty local library for case #2.

Installing TAP::Harness::Restricted in case #1 installs some distributions that Dist::Zilla deps also need, but I didn't include the time of that in my analysis. The majority of it is installing Capture::Tiny, which I timed separately as requiring ~ 40 seconds to install due to the heavy testing it does.

Testing was done like this:

# case #1
$ HARNESS_CLASS=TAP::Harness::Restricted time cpanm Dist::Zilla

# case #2
$ time cpanm -n Dist::Zilla

One thing I realized later (but will describe here) is that cpanminus installs META file information into the archlib path. I was curious how much overhead that added, so I added a third case (also with a clean local library): installing using CPAN.pm with TAP::Harness::Restricted.

To keep that from hanging in the middle of the run, I had to run it enabling default answers to prompts:

# case 3
$ PERL_MM_USE_DEFAULT=1 HARNESS_CLASS=TAP::Harness::Restricted time cpan Dist::Zilla

The results:

  • Case 1: ~16 minutes (cpanminus + TAP::Harness::Restricted)
  • Case 2: ~11 minutes (cpanminus without running tests)
  • Case 3: ~12 minutes (CPAN.pm + TAP::Harness::Restricted)

That was surprising! Comparing #1 and #3, cpanminus writing META files looks like it has about the same overhead as running tests in the first place. If cpanminus didn't do that, then case #2 might drop down to maybe 7 or 8 minutes. That would average around 3 seconds over the 170 dependencies, which seems plausible.

[Update: Miyagawa pointed out that I'm assuming that writing META is the cause of the slowdown and he's right. I suspect that it is a large part of it (it hits disk and executes a separate process), but there might be other reasons as well.]

That was the macro picture. Next I wanted to see how long individual distributions took to install so that I could see which ones were causing the biggest delay.

To profile installation timings, I hacked some timing output into cpanminus and then re-ran case #1. Not surprisingly, a handful of distributions were a huge chunk of the installation time.

The number after the distribution in the list below is the number of exclusive seconds required to download, unpack, configure, build, test and install (cpanminus' writing of META is excluded):

Moose-2.1202: 123
Module-Build-0.4204: 63
Dist-Zilla-5.012: 51
IO-Socket-SSL-1.966: 39
Capture-Tiny-0.23: 39
PPI-1.215: 26
DateTime-TimeZone-1.63: 24
File-Temp-0.2304: 21
DateTime-1.06: 21
Test-Harness-3.30: 16
DateTime-Locale-0.45: 16
MooseX-Role-Parameterized-1.02: 9
Net-SSLeay-1.58: 9
Test-Warn-0.24: 9
libwww-perl-6.05: 9
Test-Simple-1.001002: 7
Config-MVP-2.200006: 7
JSON-2.90: 7
Moose-Autobox-0.15: 6

In some cases, it looks like newer versions of dual-life core distributions are being pulled in when they might not need to be.

For example, Test::File::ShareDir requires a newer Module::Build than ships with Perl v5.18.2 for configuration, but doesn't seem (at first glance) to use any of its features. Switching to ExtUtils::MakeMaker would shave 8% or so off Dist::Zilla's worst-case installation time (assuming tests are run).

Likewise, Tree::DAG_Node requires a very new File::Temp for testing. Is that really necessary? Maybe not.

Of course, these are worst case results. In many real-world cases, you might already have Moose, LWP, DateTime and other modules installed and the installation burden will be less.

So what should you do if you need to install Dist::Zilla?

If you like tests, install TAP::Harness::Restricted and use CPAN.pm like this:

$ cpan TAP::Harness::Restricted
$ PERL_MM_USE_DEFAULT=1 HARNESS_CLASS=TAP::Harness::Restricted cpan Dist::Zilla

If you don't mind installing things without tests, use cpanminus like this:

$ cpanm -n Dist::Zilla

In either case, it's probably going to take about 10 minutes.

Go for a walk, go get a cup of your favorite beverage, take a bathroom break, or whatever. When you get back, Dist::Zilla should be ready for you.


1. If you really can't wait because $job depends on the fix, you can always just patch a tarball from CPAN instead of the repo
2. Despite the complaint that Dist::Zilla requires "half of CPAN", that's actually only about 0.6% of the nearly 30k distributions on CPAN
3. Because capturing output portably can break in so many ways
If you really can't wait because $job depends on the fix, you can always just patch a tarball from CPAN instead of the repo
Despite the complaint that Dist::Zilla requires "half of CPAN", that's actually only about 0.6% of the nearly 30k distributions on CPAN
Because capturing output portably can break in so many ways
This entry was posted in dzil, perl programming, toolchain and tagged , , , . Bookmark the permalink. Both comments and trackbacks are currently closed.

© 2009-2014 David Golden All Rights Reserved