Why you shouldn't waste your time on Perl 5.6

This is a sort-of response to mst's On Perl 5.6 post. Mostly, I agree with Matt about not gratuitously breaking 5.6 compatibility (i.e. to "force people to upgrade").

However, in recent years, I've never received a single bug report or patch from a person actually using 5.6 for anything except smoke testing things on 5.6.

The argument seems to be: "because I think it's important to test things on 5.6, you should make it possible for me to do so". This seems circular.

Testing on 5.6 is pointless if no one is using 5.6 for anything except testing 5.6.

There is an argument sometimes made that if there are users on 5.6, then they should just upgrade. That seems rather obviously wrong, since if there really are such people, there are probably good reasons why they can't. But at the same time, it seems entirely inconsistent for someone to insist that they can't possibly upgrade their perl to something released in the last decade, yet they want the latest and greatest from CPAN.

If there really are people using 5.6 for real, do they really expect CPAN modules to just work? Are they really put out if they don't? Or have they learned to work around it, just like they've worked around the myriad of feature gaps and bug fixes in the last dozen years?

As far as accepting patches for 5.6 support goes, I have very mixed feelings about it. Reviewing patches takes time. Patches accepted might then implicitly indicate support for 5.6 going forward. And if there are bugs, the original patch author might not be around.

So a simple patch that restores 5.6 compatibility might be fine. E.g. "hey, I removed your 'use 5.008' and tested it on 5.6.2 and it passed tests just fine!". But a complex, possibly fragile workaround burdens the receiver if accepted.

If a real person actually using 5.6 asked me to do that, I might accept, depending on how ugly the patch is. But I'm far, far less inclined to do so when sent a gnarly patch by some self-appointed compatibility police telling me 5.6 support is still important "just because".

While it's not for me to tell people what to do with their volunteer time, I suspect that whatever time people are spending writing 5.6 patches and doing 5.6 smoke testing could be far more valuably spent on other things that affect orders of magnitude more real users.


The Annotated Berlin Consensus

The official Berlin Consensus document is on GitHub. This is an annotated review of it.

The Berlin Consensus

At the first Perl QA Hackathon (QAH) in 2008 in Oslo, a number of QA and toolchain authors, maintainers and experts came together to agree on some common standards and practices. This became known as "The Oslo Consensus".

At the 2013 QAH in Lancaster, a similar brain trust came together to address new issues requiring consensus. This became known as "The Lancaster Consensus".

At the 2015 QAH in Berlin, another group assembled to address new issues, with a particular focus on toolchain governance and recommended standards of care for CPAN authors.

As with other consensus discussions, the speed of implementation of any ideas discussed will depend on the interests and availability of volunteers to do the work.

CPAN standards of care

An ongoing challenge for Perl (and many other languages) is how to balance the benefits of a large repository of open source libraries (i.e. CPAN) against the fragility of large dependency trees. The consensus discussion group agreed on some recommended practices for CPAN authors that will, if widely adopted, improve the general standard of care of CPAN distributions and reduce the fragility of large CPAN dependency trees.

The river analogy

As an analogy to guide recommendations, the group considered CPAN like a "river". Distributions with nothing depending on them are all the way "downriver". Once a distribution gains a dependent, it moves slightly "upriver".

Credit for the river analogy goes to Neil Bowers and Peter Rabbitson. Neil's blog post, "The River of CPAN", expands on the analogy.

Distributions can have direct dependents — things that list it among prerequisites — but direct dependents may have dependents, too, all the way down to distributions with no dependents at all.

By examining this deep dependent tree to find the total number of downriver dependents, we can assess how far "up the river" anything is, which is tantamount to describing how many things it has the potential to break.

As distributions have more and more total dependents in their deep dependent trees, the further upriver they are. All the way upriver lies the toolchain — modules like ExtUtils::MakeMaker, which, if buggy or broken, can break all of CPAN.
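To make "total downriver dependents" concrete, here is a minimal sketch of the counting idea. It assumes you already have a map from each distribution to its direct dependents. The toy graph and its distribution names are made up for illustration, and real CPAN data would have to come from an index this sketch doesn't touch.

```python
from collections import deque


def total_dependents(direct_dependents, dist):
    """Count every distribution downriver of `dist`, direct or indirect."""
    seen = set()
    queue = deque([dist])
    while queue:
        current = queue.popleft()
        for dependent in direct_dependents.get(current, ()):
            # Skip anything already counted, and never count `dist` itself.
            if dependent not in seen and dependent != dist:
                seen.add(dependent)
                queue.append(dependent)
    return len(seen)


# Hypothetical toy river: each key maps to the distributions that list it
# among their prerequisites.
toy_river = {
    "Toolchain-Thing": ["Mid-Lib", "Other-Lib"],
    "Mid-Lib": ["App-One", "App-Two"],
    "Other-Lib": ["App-Two"],
}

for name in ("Toolchain-Thing", "Mid-Lib", "Other-Lib", "App-One"):
    print(name, total_dependents(toy_river, name))
```

A breadth-first walk with a "seen" set counts each downriver distribution once, even when it is reachable by several paths (App-Two above).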

Visually, if a module upriver breaks — fails to build, fails tests, breaks compatibility, etc. — it pollutes the river and everything downriver of it suffers.

The actual expression during discussion was "shitting in the river", which makes the point more dramatically (and more effectively).

Therefore, a distribution's position in the river is a guide to the recommended standards that an author should apply to its care.

As a distribution grows in popularity and gains direct dependents, those dependents grow the deep dependent tree; the distribution moves upriver and the standards that responsible authors should apply change accordingly.

Authors, who could have distributions at multiple places along the river, should consider distributions individually, and avoid applying downriver standards to distributions that have moved upriver.

Some authors treat CPAN like a playground sandbox. Other authors work mostly on distributions with lots of dependencies. And some have both sorts plus stuff in the middle. It's important to keep in mind that the recommendations that follow don't apply to authors. They apply to distributions and authors can and should apply different standards to different distributions.

The group also discussed how we never know how much of DarkPAN code depends on a given distribution, making any CPAN-focused analysis of the river only an approximation of the true number of downriver dependents.

Neil was on a roll. Another blog post of his describes an idea for letting DarkPAN register their dependencies.

Recommended practices for CPAN authors

For the sake of discussion, the group arbitrarily divided the river into "way downriver", "way upriver" and "in the middle", and considered recommended practices for each.

While it wasn't discussed at the time, subsequent analysis found:

  • ~50 distributions with 10,000 or more total dependents
  • ~200 distributions with 1,000 to 9,999 total dependents
  • ~2,000 distributions with 10 to 999 total dependents
  • ~8,000 distributions with 1 to 9 total dependents
  • ~16,000 distributions with no known dependents

Neil, again, offers a blog post with CPAN River statistics. I also made a log-log plot showing what percent of CPAN has how many downstream dependents.

Again, while it wasn't discussed, if one needed guidance about whether a distribution is upriver or downriver or in the middle, one might consider the three groups roughly like this:

  • "way downriver" — zero to low tens of total dependents
  • "in the middle" — high tens to low thousands of total dependents
  • "way upriver" — high thousands to tens of thousands of total dependents

Defined like this, "way downriver" is about 95% of CPAN. "Way upriver" is less than half a percent. "In the middle" is what's left.
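As a rough sketch, the three bands could be turned into a lookup like the one below. The cutoffs (49 and 5000) are my own assumptions for "low tens" and "high thousands". The group never agreed on exact numbers, so treat them as knobs rather than as part of the consensus.

```python
def river_position(total_dependents, downriver_max=49, upriver_min=5000):
    """Map a total-dependents count to one of the three rough bands.

    Both thresholds are illustrative assumptions, not agreed values.
    """
    if total_dependents < 0:
        raise ValueError("total_dependents cannot be negative")
    if total_dependents <= downriver_max:
        return "way downriver"
    if total_dependents >= upriver_min:
        return "way upriver"
    return "in the middle"


for count in (0, 500, 12000):
    print(count, river_position(count))
```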

The recommendations that follow for these groups are aspirational. Not every distribution can or will do everything here, but the more it can, the better off users of that distribution will be.

Practices for distributions "way downriver"

Any author's first upload is, by definition, way downriver. Brand new CPAN authors should read a good "how to" for CPAN authorship, such as the "About PAUSE" pages and "perldoc perlnewmod".

Downriver distributions should be "well-formed", following many of the basic "Kwalitee" rules described on CPANTS. They should have documentation (ideally spell-checked), a "t/" directory with tests (that are run before shipping, e.g. with "make disttest"), and a clearly stated license.

Distributions should respect CPAN namespace conventions.

Distributions should include a "Changes" file that highlights key differences between releases. They should have a META.json file that follows the CPAN::Meta::Spec and a corresponding META.yml file for older perls.

The author should decide on an "issue tracker", whether the default rt.cpan.org or an issue tracker combined with a source code repository, and include the issue tracker URL in the META.json file. Distribution documentation should include contact information if it differs from the author's cpan.org email address.

If a distribution duplicates features of existing modules, the documentation should describe why it was created and how it differs from other, similar modules (e.g. a "SEE ALSO" section).

There are also some practices that distributions should avoid.

When configuring, building, testing or installing: don't attempt to "phone home" over the Internet; don't modify the filesystem outside the distribution directory or the system temporary directory; don't send emails.

Don't hijack other modules by installing a .pm file that overwrites or otherwise shadows a module that ships from another distribution.

Don't run "author tests" (e.g. pod formatting, coverage, spelling, etc.) on end-user systems.

Don't be malicious.

Don't be rude.

One example that came up in discussions was a Makefile.PL that made snide comments only on Windows machines. Not cool.

Practices for distributions "in the middle"

Distributions "in the middle" should follow all the recommendations of those "way downriver", plus additional recommendations.

Distributions should plan for API stability. Breaking changes should be made as rarely as possible and should occur after a period of deprecation. Including a statement about a stability policy in documentation is recommended as well, to help end-users know what to expect.

It's perfectly fine for a distribution to plan for instability. Some distributions have high velocity and make backward breaking changes regularly. That's absolutely OK if clearly documented so users know what to expect.

Distributions should aim to be portable across "mainstream" operating systems, whenever possible. They should attempt to support older Perls (e.g. 5.8 or 5.10) and should, regardless, have an explicit minimum perl version in their prerequisite metadata. Portability commitments should also be included in documentation.

We didn't get into what "mainstream" means, so it's sort of in the eye of the beholder. Personally, I think that's at least Linux, Windows and OS X. Maybe at least some BSD flavors, too. For seeing what Perl versions you support, check out Perl::MinimumVersion and Perl::MinimumVersion::Fast.

Distributions should have a public source code repository (listed in the META files), including contribution instructions. Repositories should be connected to some sort of continuous integration service for early identification of commits that cause tests to fail.

Distributions hosted on GitHub should look at Travis for this.

Distributions should be licensed under the terms of Perl itself or else a compatible OSI-approved license. E.g. a GPL-only (or other "viral") license may limit how far upriver a distribution can go. Note, also, that a "public domain" dedication is not always legally valid and is not an OSI-approved license.

Distribution authors should pay attention to the issue tracker and at least acknowledge bug reports in a timely fashion.

Distributions should have a co-maintainer registered on PAUSE or other documented succession plan in case something happens to the original author.

More on this below in the section about PAUSE adoption policies.

Distributions should aim for high quality releases: they should have good test coverage; authors should use author-only tests of distribution quality before shipping (e.g. in xt); authors should monitor CPAN Testers results to identify and fix broken releases.

Before releasing a stable, indexed version to CPAN, authors should release a non-indexed, developer release to CPAN and monitor the CPAN Testers Matrix results to ensure that nothing is broken across major operating systems or versions of Perl. (N.B. typically 36 to 48 hours is sufficient for good coverage.)

Distribution authors should be mindful of a distribution's place in the "CPAN river". They should pay attention to the stability and quality of upriver dependencies and should consider the stability and quality of a distribution before adding it as a new dependency. They should monitor the total number of downriver dependents to reassess the standards of care to apply to the distribution.

In particular, be very thoughtful about adding a dependency that is downriver from your distribution, because any new dependency moves upriver (by definition). If the author of that dependency has a worse standard of care, your distribution — and everything that depends on it — becomes more fragile. The farther upriver your distribution is, the more you should consider discussing standards of care with the author of a potential dependency before relying on it.

For distributions in the upper-middle parts of the CPAN river (e.g. 1000+ total downriver dependents), authors should consider regular testing of some of these downriver dependents against new versions of the distribution before shipping a stable release.

Practices for distributions "way upriver"

Distributions "way upriver" should follow all the recommendations of those "in the middle" and "way downriver", plus additional recommendations.

This section is the most aspirational. Many toolchain modules don't fully live up to these practices. Many are technically hard. But everyone present agreed that toolchain authors should aim to do these things, even if it takes a while before it's regular practice.

Distributions should undergo some sort of code review before a stable release, so that there is more than one set of eyes responsible for the code.

Distributions should have some documented public forum for discussing the distribution and its evolution. Proposed major changes should be discussed before implementation.

Distributions should design for forward-compatibility, when possible, so that old code can handle inputs designed for later versions of the distribution and either adapt or throw an informative error, rather than fail strangely.

Distributions that use C or XS should aim for C89 compatibility, the same as Perl itself.

Distributions should do performance testing before releasing major changes as stable.

Distributions should be tested against bleadperl, on as many platforms as they can.

Authors should release a non-indexed development version of distributions before an indexed, stable version for any non-trivial, non-emergency change.

Authors should regularly check the full CPAN testers matrix for distributions to find failure patterns that affect particular platforms or versions of Perl.

Authors should test at least a representative portion of direct (or even total) downriver dependents against new versions of a distribution before shipping a stable release.

Stable releases of distributions should not be shipped before bedtime or the weekend or any other time when the author is unavailable to fix or revert an unexpectedly broken release. Authors should watch closely for indications of breakage after release.

Responsible forking

While we hope that authors follow standards of care consistent with a distribution's place in the CPAN river, there will always be times when some distribution dependency isn't meeting one's expectations. In such a case, a distribution always has the option to fork the dependency or to replace it with an alternative.

Because forks can be controversial, we discussed practices for responsible forking.

We talked quite a bit about how there needs to be some recourse when the author of some dependency doesn't exercise the standard of care you want, doesn't change in response to feedback and isn't willing to hand over maintenance to a more responsible maintainer. The group wanted to lessen the stigma of forking — not for fundamental differences or outright neglect, but just to achieve a different stability goal in the dependency tree.

When this section talks about "forks", we intend that term to cover literal forks (same code base to start, and same API), rewrites (new code base, but same API) and alternative replacements (new code, with new API).

Before forking

Before considering a fork, authors should talk to the current maintainer(s) about their concerns, ideally in a public forum so that concerns and responses are on the record.

Authors should also consult interested parties, such as other authors with similar concerns (e.g. authors of other distributions who also depend on the distribution to be forked), as well as direct downriver dependents who might be affected by the change in their own upriver dependency tree.

Changing a distribution's dependencies is a deep-dependency change for every dependent. This can be a big impact for people who manage full dependency trees via Carton or Pinto or who have to do compliance audits of all new dependencies. Sometimes it's necessary, but it should never be done lightly.

When deciding to fork

If the original maintainer isn't able to satisfy concerns and the decision to fork is taken, authors should notify the original maintainer(s), again ideally in a public forum so that the history is available for future users to review.

The author should also notify interested parties that were consulted, so that they can participate in the process, if desired.

To help future potential users understand the differences between the modules, the author should document the rationale for forking (focusing on factual rather than emotional issues or personal attacks).

Authors should try not to "burn bridges". On seeing a fork under development, the original maintainer may reconsider his or her original stance, or offer to hand over maintenance or otherwise find some mutually satisfactory resolution that doesn't require the fork to be published to CPAN.

After a fork is published

When the fork is shipped, the author should make a general announcement to the same forums used for earlier discussions.

The author should carefully consider impacts on the CPAN river when migrating. E.g. adding a newly published fork as the dependency of a mid-stream distribution catapults the fork upriver, which adds it as a dependency to all downriver distributions and puts them all at risk if the fork introduces new bugs or other surprising behavior.

New code has no track record. Writing a fork and putting it upriver immediately is risky. It might make sense to release a fork independently for a while and let it season and get feedback before using it to replace a fragile dependency.

When considering urging other distributions to adopt a fork in place of the original, authors should use a "smart bomb" approach rather than a "carpet bomb" approach. Rather than making an appeal to all direct dependents of the original, authors should target the appeal. Examples of respectful targeting include (but are not limited to):

  • upriver dependencies that also depend on the original distribution (e.g. as part of an attempt to get the original out of the full upriver dependency tree)
  • other, unrelated distributions that are reasonably believed to be affected by the same issues that prompted the fork

In other words, we don't want a mass call saying "hey, module X is bad, use module Y instead". We'd rather have a specific call, "hey, I'm trying to get X out of my dependency tree because Z, would you consider switching to Y also?" or "hey, you're using module X to do Z, which has this particular problem PRQ; I've switched to module Y and you might consider that too."

Toolchain governance

These discussions actually preceded the CPAN River discussions, but it made more sense to write them up the other way. In spots, these points have been harmonized with the points above.

The toolchain, in general terms, refers to those modules that are required to download, build, test and install the vast majority of distributions on CPAN.

Modules like this generally ship with the Perl core (e.g. ExtUtils::MakeMaker) but frequently have a "dual life" independently on CPAN. By definition, these toolchain distributions are "way upriver" on the CPAN river.

Other modules may not ship in the Perl core, but may be popular solutions for specific toolchain-like tasks. Such modules are in the "toolchain" for distributions that rely on them.

The consensus attendees represented a sizeable cross-section of core and popular non-core toolchain maintainers. They agreed that toolchain development and release practices could be improved.

They agreed to "sign on" to a "Toolchain Charter" to govern the ongoing development of toolchain distributions. While this discussion actually pre-dated the CPAN river discussion, the points are largely consistent.

A frequent side topic was whether this could be enforced in any way, and the group — wisely in my view — stuck with the idea that it's voluntary and that if enough people lead by example, then others may follow suit.

Toolchain charter practices

Toolchain authors present agreed to the following principles and practices to ensure good governance and responsible administration.

  1. The toolchain has a wider scope of backwards compatibility goals than the Perl core, as the toolchain aims to support every Perl from 5.8.1 onwards.
  2. Toolchain distributions should have more than one "primary" maintainer (regardless of actual PAUSE permissions) and a list should be published showing distributions and maintainers.
  3. Functional changes to toolchain distributions need more than one set of eyes approving changes before shipping a stable release to CPAN.
  4. Major or breaking changes to a toolchain distribution should be discussed in a public, archived venue. The cpan-workers list was chosen as the initial venue.
  5. If discussions about the evolution of toolchain distributions fail to achieve consensus, toolchain authors agree to defer to a designated "tie-breaker" authority. The Perl pumpking (regardless of who that may be at any point in time) was the initial choice for tie-breaker.
  6. No stable distribution should ship until some degree of stability is verified (e.g. through a combination of smoke reports, dependency smokes, etc.); the choice of specific mechanisms was left to the discretion of maintainers. The group called out one exception: emergency fixes to a broken stable release should proceed without delay.
  7. Toolchain authors agreed that when a primary maintainer steps down or becomes permanently unavailable, the toolchain authors as a group will jointly agree on a successor. PAUSE administrators should defer to the consensus (or decision of the tie-breaker) for handing over PAUSE permissions as needed. Any successor should agree to the practices described herein.

Point #5 was sort of a default position. There was no consensus that it had to be the pumpking forever, but given that so many of the toolchain modules are dual-life, the pumpking was an unobjectionable choice to start with.

We didn't get into exact mechanics of #7. My thought is that it works in two parts: (1) an existing primary maintainer will consult the broader group for "advice and consent" in choosing a successor; (2) if the primary maintainer is unavailable for a long time, PAUSE admins will consult with the broader group before transferring permissions.

Toolchain authors not present are strongly encouraged to agree to these practices or to hand over their toolchain maintenance responsibilities to others who are willing to do so.

Model behavior we want to see in other "way upriver" distributions

The group briefly discussed whether or how to urge other important, non-toolchain, but widely depended-on distributions to sign on to the Toolchain Charter or an equivalent.

The consensus was that the Toolchain Charter should serve as a model for the kind of behavior we'd like to see broadly in other widely used distributions, but that public role-modeling and public promotion was preferred over targeted appeals or peer pressure on non-toolchain authors.

Rather than go out and try to get various upriver distributions to "sign on" to the Toolchain charter, the Toolchain charter should just be an example that other authors/groups with similar goals might adapt for their own purposes.

META spec

The group discussed various potential changes to the CPAN META Spec.

No need for META 3 yet

There was consensus that developing a "v3" META spec was not yet needed. There was no burning platform for it, and some forward compatibility risks to address first. The consensus was that continuing to develop x_ fields was sufficient to address emerging needs.

prereqs keys to ignore

The 'prereqs' data structure can contain fields that can't be resolved as actual prerequisites. Examples include "perl", "Config" and "Errno". The group agreed that a list of known keys should be included in the implementation notes section of the META spec.

Specifically with regards to 'perl', we agreed that CPAN clients must never try to install 'perl'. If it is specified as 'requires', installation must abort. If specified for 'recommends' or 'suggests', CPAN clients may warn about it, but must otherwise ignore it.

Specifying a "recommended" version of Perl seems weird to me, but apparently some people do it meaning "I support older Perls, but this really works best on version 5.x.y". CPAN client support for communicating that isn't great and needs to be improved.
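One way to read the 'perl' rule is as a small decision function, sketched below. It assumes the "must abort" case applies when the running perl does not satisfy the requirement, which is my reading rather than something the text spells out. Versions are passed as integer tuples to sidestep Perl version-string parsing, and the return values ("ok", "abort", "warn") are names I made up.

```python
def handle_perl_prereq(relationship, required, running):
    """Decide what a client does about a prerequisite on 'perl' itself.

    `required` and `running` are integer tuples such as (5, 10, 0).
    The client never tries to install perl; it only reports a decision.
    """
    if relationship not in ("requires", "recommends", "suggests"):
        raise ValueError(f"unknown relationship: {relationship!r}")
    if running >= required:
        return "ok"
    if relationship == "requires":
        # Unsatisfied hard requirement: stop rather than install perl.
        return "abort"
    # Unsatisfied 'recommends' or 'suggests': optionally warn, then ignore.
    return "warn"


print(handle_perl_prereq("requires", (5, 10, 0), (5, 8, 9)))
print(handle_perl_prereq("recommends", (5, 20, 0), (5, 16, 3)))
```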

Further clarifying 'recommends' and 'suggests'

The group agreed that these prerequisite relationships were still being misunderstood and that the spec should be clear that "recommends" is optional and installed by default, whereas "suggests" is optional and not installed by default.
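The default-install distinction can be sketched as a selection over the prereqs maps. The option names `skip_recommends` and `include_suggests` are hypothetical and don't correspond to flags of any particular CPAN client. The input is assumed to be the prereqs of a single phase, keyed by relationship.

```python
def prereqs_to_install(prereqs, skip_recommends=False, include_suggests=False):
    """Pick which prerequisites a client would install for one phase."""
    selected = dict(prereqs.get("requires", {}))
    if not skip_recommends:
        # "recommends" is optional, but installed unless the user opts out.
        for module, version in prereqs.get("recommends", {}).items():
            selected.setdefault(module, version)
    if include_suggests:
        # "suggests" is optional and only installed when the user opts in.
        for module, version in prereqs.get("suggests", {}).items():
            selected.setdefault(module, version)
    return selected


example = {
    "requires": {"Foo": "1.0"},
    "recommends": {"Bar": "0"},
    "suggests": {"Baz": "0"},
}
print(prereqs_to_install(example))
```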

CPAN client interpretation of 'x_breaks'

The group agreed that x_breaks needs to be implemented in CPAN clients.

This is the replacement for conflicts, which sort of means "don't install me if this other module is present", whereas what most people seem to want is "I break this other module, so after upgrading me, you need to upgrade it, too."

x_breaks is a top-level key in META. It is a hash with module names as keys and, as values, version ranges that indicate the broken versions, usually expressed as a <= relation. E.g. in YAML format:

x_breaks:
    Foo::Bar: <= 1.23

When x_breaks is found in META, after successful installation, CPAN clients should check if any of the modules listed in x_breaks are installed and if their $VERSION matches the version range. If so, and if the latest version on CPAN does not match the x_breaks version range, then the CPAN client should install the latest version of that module from CPAN.

CPAN clients may warn about unsatisfiable x_breaks before or after installing a module, may prompt before installing x_breaks, etc., depending on their individual approaches to configuration, prompting and warning.
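The post-install check described above might look roughly like this. It is a sketch under heavy simplifications: it handles only a single "operator version" clause (real META version ranges allow more), it compares versions as plain decimal numbers and so ignores dotted-decimal versions like v1.2.3, and the `installed` and `latest_on_cpan` lookups are hypothetical stand-ins for whatever a client actually uses.

```python
import operator
from decimal import Decimal

_OPS = {
    "<=": operator.le,
    "<": operator.lt,
    ">=": operator.ge,
    ">": operator.gt,
    "==": operator.eq,
}


def in_range(version, version_range):
    """Check a decimal version string against one clause like '<= 1.23'."""
    op, bound = version_range.split()
    return _OPS[op](Decimal(version), Decimal(bound))


def modules_to_upgrade(x_breaks, installed, latest_on_cpan):
    """List modules to upgrade after installing a distribution with x_breaks."""
    upgrades = []
    for module, broken_range in x_breaks.items():
        have = installed.get(module)
        if have is None or not in_range(have, broken_range):
            continue  # not installed, or the installed version is not broken
        newest = latest_on_cpan.get(module)
        # Only upgrade if CPAN has a version outside the broken range.
        if newest is not None and not in_range(newest, broken_range):
            upgrades.append(module)
    return upgrades


x_breaks = {"Foo::Bar": "<= 1.23"}
print(modules_to_upgrade(x_breaks, {"Foo::Bar": "1.20"}, {"Foo::Bar": "1.30"}))
```

When the latest CPAN release is still inside the broken range, this returns nothing for that module. That is the "unsatisfiable" case a client may choose to warn about.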

Need for a post-install recommendations key

There are circumstances where two distributions can each operate without the other but both benefit from the other's presence. In this case, each specifying the other as 'recommended' causes a circular dependency, albeit an optional one, which can confuse CPAN clients and dependency analysis.

To avoid this, the group agreed there needs to be a new META key that requests that CPAN clients install a distribution after installation is complete — not a prerequisite relationship, but an "also install" relationship.

Karen Etheridge volunteered to prototype such a key and discuss it with CPAN client authors for feedback and potential implementation.

Signaling pure-Perl

In the Lancaster Consensus, the group agreed on command-line arguments for Makefile.PL and Build.PL to signal that a pure-Perl build is desired. In practice, the problem with this approach is that distribution authors must each individually check for such flags and take them into account.

The group agreed that since there are relatively few compiler-detectors (e.g. ExtUtils::CBuilder), it would be better to have a common environment variable that signals to such compiler detectors to report that no compiler is available. This would then have an effect on all distributions using that compiler detector.

Peter Rabbitson volunteered to pick a name for the variable and Leon Timmermans agreed to implement it in ExtUtils::CBuilder.
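To illustrate the idea (not ExtUtils::CBuilder's actual behavior), here is a minimal sketch of a compiler detector that honors such a variable. The name `HYPOTHETICAL_NO_COMPILER` is a placeholder of mine, since no name had been picked. The candidate compiler list and the injectable `which` lookup are also assumptions made for the example.

```python
import os
import shutil

# Placeholder name only; the real variable name had not been chosen.
NO_COMPILER_ENV = "HYPOTHETICAL_NO_COMPILER"


def have_compiler(candidates=("cc", "gcc", "clang"), environ=None,
                  which=shutil.which):
    """Report whether a C compiler appears to be available.

    If the opt-out variable is set to a non-empty value, always report
    "no compiler", so every distribution that relies on this detector
    falls back to its pure-Perl path.
    """
    environ = os.environ if environ is None else environ
    if environ.get(NO_COMPILER_ENV):
        return False
    return any(which(name) is not None for name in candidates)


print(have_compiler(environ={NO_COMPILER_ENV: "1"}))
```

The point of the design is that the check lives in one place: distributions that ask the detector get the behavior for free, instead of each Makefile.PL parsing its own flags.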

CPAN Testers grading

The historical policy of CPAN Testers was to not send failing test reports when prerequisites could not be satisfied. The group agreed that getting such reports, under some other category than "FAIL", would make it easier to detect when upriver dependencies were broken.

The idea is that if your tests fail because some upriver dependency can't be installed, then your module can't be automatically installed. It might not be your fault, but everything downriver of you is screwed until you work around the problem or get the upriver dependency fixed.

Barbie and David Golden volunteered to come up with a proposal for review on the cpan-workers list.

PAUSE

The group considered potential changes to PAUSE administration and policies.

Changing adoption policies

Historically, the process for adopting an abandoned distribution allowed the first person to petition PAUSE admins for a takeover to receive permissions. In light of concerns about the stability of distributions with lots of downriver dependents, the group thought a policy of "first warm body" adoption was no longer a wise idea.

The group considered the opposite approach — that PAUSE admins no longer transfer permissions at all; that if the primary maintainer were gone, that a distribution was effectively end-of-life. The group agreed that this would do more harm than good.

The group agreed that for distributions with no known dependents, the "first warm body" rule was acceptable. For distributions with dependents, PAUSE administrators will consider the number of downriver dependents, exercise their judgment, and consider the track record and involvement of a petitioner in the distribution in question. E.g. they will consider existing co-maintainer status, commit involvement, etc. Without obvious involvement of this sort, a petitioner will need to present some evidence of support from interested parties (e.g. direct downriver dependents).

PAUSE administrators may elect to deliberate on their decision in private, but will announce a decision and rationale publicly.

Distributions that want to avoid succession problems are encouraged to add co-maintainers as designated successors.

Rewriting PAUSE not needed

The group considered briefly whether PAUSE needed to be rewritten from scratch. The consensus was to proceed with efforts to Plack-ify it and use that as a basis for additional development.

Kenichi Ishigaki (charsbar) pretty much made this discussion moot. Read his blog post about PAUSE on Plack for details.

Encouraging good licensing

The group considered whether PAUSE should require a detectable license for indexing; instead the group decided that authors should be encouraged, but not required, to have one. E.g. the indexer email could notify authors that no license was found and encourage them to add one.

Responding to take-down requests

We discussed whether PAUSE should have a formal policy about take-down requests. The consensus was that it was unnecessary and that instead, PAUSE admins will continue to exercise judgment.

However, we agreed that PAUSE (and all other Perl ecosystem sites) should have clearly indicated contact information for take-down notices.

Deleting tarballs faster

Currently, deleting a tarball on PAUSE schedules it for deletion in three days. We agreed that faster – nearly immediate – deletion is desirable to get broken distributions off CPAN quickly.

However, we agreed there should be a slight delay to give BackPAN mirrors a chance to catch up and store the distribution for the historical record.

It turns out that by the time anyone could actually schedule an immediate deletion, replication to BackPAN and other mirrors will have already happened. PAUSE/CPAN is that fast these days.

Any such feature requires some sort of confirmation to help users avoid deleting things unintentionally. Issue #163 has been opened for this feature request.

Requiring a meta file to be indexed

The group discussed whether PAUSE should require a valid META file for a distribution to be indexed. When META generators start universally providing the 'provides' field, this will get PAUSE out of the business of guessing which packages are provided by a distribution, so encouraging including META now will lay the groundwork for that future change.

If implemented, distributions would be required to have either META.json or META.yml to be indexed.
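
As a rough illustration of why the 'provides' field matters, here is a minimal sketch that reads package names straight from a META.json document instead of guessing them from .pm files. It uses only JSON::PP from core Perl. The sample metadata and the declared_packages helper are made up for illustration; this is not PAUSE's indexer code.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: return the sorted package names a distribution
# declares in the 'provides' field of its META.json, or an empty list.
sub declared_packages {
    my ($meta_json) = @_;
    my $meta     = JSON::PP->new->decode($meta_json);
    my $provides = $meta->{provides};
    return () unless ref($provides) eq 'HASH';
    return sort keys %$provides;
}

# Made-up metadata for an imaginary distribution
my $meta_json = <<'JSON';
{
  "name": "Acme-Example",
  "version": "0.001",
  "provides": {
    "Acme::Example": { "file": "lib/Acme/Example.pm", "version": "0.001" },
    "Acme::Example::Util": { "file": "lib/Acme/Example/Util.pm", "version": "0.001" }
  }
}
JSON

my @packages = declared_packages($meta_json);
print "$_\n" for @packages;
```

With a 'provides' field present, the list of packages is a lookup rather than a guess; without one, the helper returns nothing and an indexer would have to fall back to inspecting files.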

The group thought this seemed reasonable, but there was some concern about how many distributions in the recent past would have had problems if this policy had been in force. Ricardo Signes volunteered to research how many distributions uploaded recently would have had problems and review with Andreas Koenig before implementing.

Test-Simple roadmap

As part of the consensus discussions, the group considered the roadmap for Test-Simple which provides the Test::Builder library upon which Test::More and most test libraries are written.

This discussion actually led off the QAH in order to give Chad Granum, the current Test-Simple maintainer, some direction for the hackathon. I wonder if it would have gone differently if it had happened after the CPAN River discussion.

Problems in Test::Builder

The group identified several fundamental limitations of Test::Builder:

  • Async/multi-process support is either non-existent or requires complex and fragile work-arounds.

  • The lack of extension points has led many test libraries to invade the private internals of the Test::Builder singleton, or to monkey-patch its methods. These hacks are inherently fragile and the more the test ecosystem proliferates such things, the more likelihood that mixing arbitrary test libraries will break in unexpected ways.

  • Testing test libraries requires either parsing TAP or checking against specific strings. This is hard, leaving many test libraries poorly tested, if at all. When there are tests, the tests are fragile. Overall, this limits the ability of test library authors to evolve TAP itself.
  • More generally, having the test library so tightly coupled to TAP means that the capabilities of the library are limited to what TAP supports and there's no easy way to use the standard testing library, but output results in a different form (e.g. xUnit-style).
  • The current Test::Builder is heavy and slow, holding too much state and doing too much repetitive work.
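
To make the "checking against specific strings" problem concrete, here is a small sketch using Test::Builder::Tester, which ships with Test-Simple. The is_even assertion is a hypothetical example of how a Test::* module typically hooks into the shared Test::Builder singleton; it is not taken from any real distribution.

```perl
use strict;
use warnings;
use Test::Builder::Tester tests => 1;
use Test::More;

# Hypothetical assertion, written the way many Test::* modules are:
# it grabs the shared Test::Builder singleton and bumps $Level so that
# failures are reported at the caller's line.
sub is_even {
    my ( $n, $name ) = @_;
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    return Test::More->builder->ok( $n % 2 == 0, $name );
}

# Testing the assertion means predicting its exact TAP output as a string.
test_out("ok 1 - four is even");
is_even( 4, "four is even" );
test_test("is_even passes for an even number");
```

The expectation is a literal line of TAP, so any change to how results are formatted would break a test like this even when the assertion itself still behaves correctly.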

Cost, benefits and risks

The group judged the benefit of addressing the problems above to be significant enough to justify the idea of a rewrite of the internals of Test::Builder.

With regard to the proposed Test::Stream-based replacement, the group agreed that the general design could address some of the problems identified, but, in light of the risks, put forth a "punchlist" of specific tasks to be completed before considering the specific implementation of that design ready to move forward:

  • A single Test-Simple branch with proposed code and a corresponding Test-Simple dev release to CPAN
  • Single document describing all known issues
  • Invite people to install latest dev in their daily perls for feedback
    • Write document explaining how to do so and how to roll-back
  • Update CPAN delta smokes: compare test results for all of CPAN with latest Test-Simple stable and latest dev
    • On Perl versions 5.8.1 and 5.20.2; both with and without threads
    • Finding no new changes from previous list of incompatible modules doing unsupported things
  • Line-by-line review of $Test::Builder::Level back compatibility support
  • bleadperl delta smoke with verbose harness output with latest stable and latest dev; review line-by-line diff and find no substantive changes (outside Test-Simple tests themselves)
  • Performance benchmarking — while specific workloads will vary, generally a ~15% slowdown on a "typical" workload is acceptable if it delivers the other desired benefits.
    • Add patches to existing benchmarking tools in Test-Simple repo
    • Run benchmarks on at least Linux and Windows
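
For the performance item, a check against the ~15% figure could be sketched as below. The timings are made up, the slowdown_pct helper is hypothetical, and treating the slowdown as a percentage increase in elapsed time is my assumption; the punchlist does not define how it should be measured.

```perl
use strict;
use warnings;

# Hypothetical helper: percentage slowdown of the dev release relative to
# stable, given elapsed seconds for the same workload (positive = slower).
sub slowdown_pct {
    my ( $stable_secs, $dev_secs ) = @_;
    die "stable time must be positive\n" unless $stable_secs > 0;
    return 100 * ( $dev_secs - $stable_secs ) / $stable_secs;
}

# Assumed reading of the punchlist's "~15%" as a simple cutoff
my $threshold_pct = 15;

# Made-up timings, not measurements from the hackathon
my %timing = ( stable => 40.0, dev => 44.0 );

my $pct = slowdown_pct( $timing{stable}, $timing{dev} );
printf "slowdown: %.1f%% (%s)\n", $pct,
    $pct <= $threshold_pct ? "within budget" : "over budget";
```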

The group agreed that additional items could be added through the toolchain governance mechanism.

The discussion of Test-Simple was explicitly divided into three parts: (1) is Test::Builder so flawed that a rewrite is worthwhile? (2) Will the proposed design address #1? (3) Is the specific "release-candidate branch" implementing #2 good enough to go forward? The consensus answers were more-or-less "yes", "probably" and "not yet".

Participants in the Berlin Consensus discussions

Discussions ran over 4 days. Participants came and went, but each day had about 20-25 people. Thank you to the following participants:

Andreas König, Aristotle Pagaltzis, Barbie, Bulk88, Chad Granum, David Golden, H. Merijn Brand, Helmut Wollmersdorfer, Herbert Breunung, Ingy döt Net, Karen Etheridge, Kenichi Ishigaki, Leon Timmermans, Matthew Horsfall, Neil Bowers, Olivier Mengué, Paul Johnson, Peter Rabbitson, Philippe Bruhat, Ricardo Signes, Salve J. Nilsen, Slaven Rezic, Stefan Seifert, Tatsuhiko Miyagawa, Tina Müller, and Wendy van Dijk

(Apologies to anyone present who was left off the list. Email dagolden at cpan dot org or send a pull request to be added.)

Posted in cpan, perl programming, toolchain

Faster ordered hashes for Perl

With some prompting and suggestions from Mario Roy, author of MCE, I've been optimizing Hash::Ordered. With the exception of setting existing/new elements, which got a bit slower due to ensuring keys are strings, not references, most functions got faster. Some, like large hash deletion, are now MUCH faster.
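
For readers who haven't used one, the idea of an ordered hash can be sketched with a toy class: a plain hash for lookups plus an array that remembers insertion order. This is not Hash::Ordered's code or API, just an illustration under that simple design; the class name and methods are invented for the example.

```perl
use strict;
use warnings;

# Toy ordered hash: a plain hash for lookups plus an array that remembers
# insertion order. This is NOT Hash::Ordered's code, only an illustration.
package Toy::OrderedHash;

sub new { return bless { data => {}, order => [] }, shift }

sub set {
    my ( $self, $key, $value ) = @_;
    $key = "$key";    # force the key to be a plain string, not a reference
    push @{ $self->{order} }, $key unless exists $self->{data}{$key};
    return $self->{data}{$key} = $value;
}

sub get { my ( $self, $key ) = @_; return $self->{data}{$key} }

sub delete {
    my ( $self, $key ) = @_;
    return unless exists $self->{data}{$key};
    # Naive approach: scan the whole order array on every delete, which
    # gets expensive as the hash grows.
    @{ $self->{order} } = grep { $_ ne $key } @{ $self->{order} };
    return CORE::delete $self->{data}{$key};
}

sub keys { return @{ $_[0]{order} } }

package main;

my $oh = Toy::OrderedHash->new;
$oh->set( $_ => length $_ ) for qw(pear apple fig);
$oh->delete('apple');
$oh->set( kiwi => 4 );
print join( ",", $oh->keys ), "\n";    # prints "pear,fig,kiwi"
```

In a naive design like this one, deletion cost grows with the size of the hash, which is the kind of operation where an optimized implementation has the most room to improve.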

Here are changes in benchmarks from version 0.002 to 0.009:

     $VERSION                                             0.002       0.009

     Ordered hash creation, 10 elements                 94121/s    101293/s
     Ordered hash creation, 100 elements                10931/s     11226/s
     Ordered hash creation, 1000 elements                1022/s      1160/s

     Fetching ~10% of 10 elements                     1417712/s   1844781/s
     Fetching ~10% of 100 elements                     244800/s    285983/s
     Fetching ~10% of 1000 elements                     24871/s     30342/s

     Replacing ~10% of 10 elements                    1353795/s   1378880/s
     Replacing ~10% of 100 elements                    197232/s    192769/s
     Replacing ~10% of 1000 elements                    20364/s     19909/s

     Adding 10 elements to empty hash                  367588/s    341022/s
     Adding 100 elements to empty hash                  66495/s     58519/s
     Adding 1000 elements to empty hash                  7217/s      6497/s

     Creating 10 element hash, then deleting ~10%       95284/s     94598/s
     Creating 100 element hash, then deleting ~10%       6924/s      9242/s
     Creating 1000 element hash, then deleting ~10%       144/s       934/s

     Listing pairs of 10 element hash                  170187/s    178288/s
     Listing pairs of 100 element hash                  18839/s     19537/s
     Listing pairs of 1000 element hash                  1877/s      1959/s

You can see Hash::Ordered benchmarked against other modules in Hash::Ordered::Benchmarks.

If you need an ordered hash, I encourage you to try Hash::Ordered.

Posted in perl programming

Perl QA Hackathon 2015 report

tl;dr: I led hours of "consensus discussions" about toolchain governance, the Test::Builder roadmap, PAUSE policies and responsible authoring practices. I fixed bugs and applied patches for CPAN.pm, CPAN indexing and CPAN META tools. I experimented with indexing META files to generate deep reverse-dependency graphs. I concluded I need to invent Metabase 3.0.

Why I love the Perl QA Hackathon

I noticed my last several write-ups started with a "why I love..." section. I thought about doing something different, but realized that it really is the most important thing I want to say each year.

The Perl QA Hackathon is the one time during the year when I get to set aside concentrated time to scratch my itches about the Perl ecosystem – things that have been bugging me all year that I just haven't had a chance to work on.

Most of the attendees are people I only talk to online the rest of the year. I love getting to see people face to face, share a meal, tell some jokes, and connect as people and not just as fellow programmers.

This isn't the kind of hackathon where caffeine-abusing participants race to churn out hip, shiny, disposable apps to wow a crowd or a future employer. This is more like Scotty and his team getting that long-overdue layover at a starbase to give the weary Enterprise an overhaul from the inside-out.

Mr. Scott

It's dirty, gory work… and a lot of fun!

How the Perl community benefits

It's often said that CPAN is Perl's killer feature. But for anyone who's worked with Perl for a while, it's obvious that CPAN comes with a lot of sharp edges.

The Perl QA Hackathon brings together some of the top people working on smoothing out those sharp edges and gives them some dedicated time to think about, discuss, implement and deploy solutions. Everyone benefits!

The hackathon is broad, with proposed projects covering dependency hell, repeatable deployments, testing libraries, code quality analysis, build systems, continuous integration tools and more. We fix security holes, chase down edge conditions and stubborn bugs, debate new standards, and review long dormant pull requests.

Day -1

On Tuesday, I arrived at Berlin's Tegel airport at what my body clock insisted was 2AM after a grand total of 2 hours of sleep. This was probably my personal low point of the hackathon; I hate red-eye flights. Also on my flight were Ricardo Signes, Tatsuhiko Miyagawa, and Matt and Dom Horsfall.

Fortunately, we arrived to a beautiful, warm spring day! As we were too early to check into the hotel, Ricardo, Matt, Dom and I dropped our bags and went out wandering to explore Berlin.

Lunch that day was bratwurst, my first of the trip, which I might have enjoyed more if I weren't feeling hung-over from the flight.

After getting into the hotel, showering and napping, I met everyone for dinner at a local pub that became the "go to" place across the street from the hackathon.

Day 0

Wednesday had the best weather of the conference – almost early summer weather – and several of us took full advantage to hit some typical tourist spots within walking distance of the hotel.

After lunch and another nap I worked a bit on planning the agenda for the various "consensus" discussions I was moderating for the next few days. At that point, my red-eye hangover had faded (though I wouldn't sleep well the whole trip) and I felt ready to get started.

That evening was the welcome dinner for all the participants. I wound up at a table with Peter Rabbitson, and we "amused" ourselves with a heated debate over what it meant to be a responsible maintainer – a theme to which we would return repeatedly, and more peaceably, over the next several days.

Day 1

Thursday was our first day in the Betahaus space.

I kicked off the day moderating the first 2-ish hour large-group discussion, this one on the Test-Simple roadmap.

Test-Simple roadmap

I wanted to start with the Test-Simple roadmap discussion to give Chad Granum (the current maintainer) some direction for the rest of his time at the hackathon. As the reason for this talk may be a bit opaque to my readers, I'll step back and provide some context:

Test-Simple is the name of the standard testing library distribution. In addition to the familiar Test::More module, it has Test::Builder, which is the underlying framework that powers most of the test libraries on CPAN.

Over the last year, Chad has been working on a major internal refactoring of Test::Builder. Last October, an alpha was merged to the Perl blead branch for testing, but, due to ongoing concerns, was reverted out of blead in March.

One of my goals for the hackathon was to get broad consensus on a roadmap for it: would it eventually go forward as a new Test::Builder or should it be released as a "Test::Builder2"? If it were to go forward, what had to be done to assuage critics that it was ready?

In the discussion, I asked people to step back from the implementation and consider Test::Builder itself. Were the problems in it sufficient to justify reworking its guts? The group agreed on several unavoidable problems with the current design:

  • Async/multi-process support is either non-existent or requires complex and fragile work-arounds.
  • The lack of extension points has led many test libraries to invade the private internals of the Test::Builder singleton, or to monkey-patch its methods. These hacks are inherently fragile and the more the test ecosystem proliferates such things, the more likelihood that mixing arbitrary test libraries will break in unexpected ways.
  • Testing test libraries requires either parsing TAP or checking against specific strings. This is hard, leaving many test libraries poorly tested, if at all. When there are tests, the tests are fragile. Overall, this limits the ability of test library authors to evolve TAP itself.
  • More generally, having the test library so tightly coupled to TAP means that the capabilities of the library are limited to what TAP supports and there's no easy way to use the standard testing library, but output results in a different form (e.g. xUnit-style).

In addition, the current Test::Builder was judged to be heavy and slow, holding too much state and doing too much repetitive work.

With broad consensus to move forward with a rewrite to address these concerns, we had Chad walk us through the architectural design of his alpha releases. After a bit of discussion about why concerns were separated the way they were, the group agreed the design was reasonable and turned to how to evaluate the specifics of the implementation.

After yet more debate about how best to assess readiness, we converged on a "punch list" of items to be completed before we considered the new implementation ready to be released as the stable Test::Builder:

  1. A single Test-Simple branch with proposed code and a corresponding Test-Simple dev release to CPAN (Chad)
  2. Single document describing all known issues (Chad to write; Andreas to review)
  3. Invite people to install latest dev in their daily perls for feedback
    • Write document explaining how to do so and how to roll-back (Chad and Ricardo)
  4. Update CPAN delta smokes: compare test results for all of CPAN with latest Test-Simple stable and latest dev (Andreas, Leon, David)
    • On Perl versions 5.8.1 and 5.20.2; both with and without threads
    • Finding no new changes from previous list of incompatible modules doing unsupported things
  5. Line-by-line review of $Test::Builder::Level back compatibility support (Peter to review; David to vet results)
  6. bleadperl delta smoke with verbose harness output with latest stable and latest dev; review line-by-line diff (Chad and Karen)
    • Should be no substantive changes (outside Test-Simple tests themselves)
  7. Performance benchmarking — while specific workloads will vary, generally a ~15% slowdown on a "typical" workload is acceptable if it delivers the other desired benefits.
    • Add patches to existing benchmarking tools in Test-Simple repo (Karen and bulk88)
    • Run benchmarks on at least Linux (Chad) and Windows (volunteer needed)

I was very pleased that we got convergence on a roadmap. While not everyone that participated was comfortable with the idea of moving the redesign forward, no one had any other concrete ideas for a punch list.

PAUSE and CPAN clients

I spent most of the rest of my day on PAUSE and CPAN.pm related issues.

Andreas merged and made live my long-awaited pull request to improve the detail available in email reports when the PAUSE indexer fails to index a distribution. I immediately whipped up a distribution designed to fail and sent it through to test the change.

I then spent some time diagnosing problems with CPAN.pm bootstrapping local::lib on perls from 5.14 to 5.18. I finally tracked it down to some internal changes in local::lib and filed a hack-ish pull-request that would restore support in legacy CPAN.pm clients.

Meanwhile, Miyagawa had fired off several pull-requests to enhance CPAN::Common::Index and then wanted to discuss more invasive changes to support a MetaCPAN backend for it.

Aftermath

We had another group dinner that night, but I was pretty wiped and headed back to the hotel early for a little more quiet hacking.

Before crashing, I dusted off various CPAN Testers libraries that I hadn't looked at in years, updating a couple of them to my new Dist::Zilla-based build setup.

Day 2

On Friday, I largely split my time between analytics on META files and discussion moderation.

Indexing CPAN Meta

Since I had volunteered to help regression test Test::Builder, I decided to experiment with indexing META files to find dependent distributions. I keep a minicpan on my hard drive, so I used CPAN::Visitor to unpack all the META files and stick them into a local MongoDB database for analysis.

I had been thinking about doing that anyway as a demonstration for my MongoDB and Perl talk at YAPC::NA, so this gave me a chance to try it out and see if it was feasible. My initial test run was about 40 minutes — cutting it too close for a talk — though I kept iterating over the course of the hackathon and got it substantially faster by the time I got home.

To develop the candidates for regression testing, I decided to rule out most modules using Test::More in the way it was intended — that API wasn't changing. Instead, I decided to look at three different sets of distributions:

  • Distributions with either $Test::Builder::Level or Test::Builder-> method calls in the codebase. I found these via http://grep.cpan.me/, using the App::cpangrep tool.
  • Distributions with modules that start with "Test::". While not all use Test::Builder, most do.
  • Distributions that depend on the "Test::" modules above. In case those modules don't have good tests, testing their dependents should exercise them well.
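
The second and third sets can be sketched in a few lines of plain Perl. The records below are made up and the field names (name, provides, requires) are my own simplification of what a META file contains; the first set came from grepping source code and is not modeled here.

```perl
use strict;
use warnings;

# Made-up metadata records: distribution name, modules it provides, and
# modules it requires. Real data would come from META files.
my @dists = (
    { name => 'Test-Foo', provides => ['Test::Foo'], requires => ['Test::Builder'] },
    { name => 'App-Bar',  provides => ['App::Bar'],  requires => ['Test::Foo'] },
    { name => 'Data-Baz', provides => ['Data::Baz'], requires => ['Test::More'] },
);

# Distributions that provide at least one module named "Test::..."
sub provides_test_module {
    my ($dist) = @_;
    return scalar grep { /\ATest::/ } @{ $dist->{provides} };
}
my @test_dists = grep { provides_test_module($_) } @dists;

# Index the modules those distributions provide
my %test_module;
for my $dist (@test_dists) {
    $test_module{$_} = 1 for @{ $dist->{provides} };
}

# Distributions that require any of those modules
sub requires_test_module {
    my ($dist) = @_;
    return scalar grep { $test_module{$_} } @{ $dist->{requires} };
}
my @dependents = grep { requires_test_module($_) } @dists;

# De-duplicate by distribution name, keeping first-seen order
my %seen;
my @candidates = grep { !$seen{$_}++ } map { $_->{name} } @test_dists, @dependents;
print "$_\n" for @candidates;
```

Here Data-Baz is left out because it only uses Test::More in the ordinary way, which matches the decision to skip distributions that stick to the unchanged API.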

The result, after various munging and de-duplicating, came out to just over 6000 distributions to regression test. I'd done similar tests before with Module::Build, so I dug up my old regression test tools to see if they might still be useful.

Toolchain governance and PAUSE adoption policies

In the middle of the day, I took a break to lead another 2-hour group discussion. I'll be writing these up in much greater detail separately, so I'll only mention the topics in general:

  • Would toolchain authors agree to a "toolchain charter" governing how modules can be managed more as a group and less as a loose collection of individuals?
  • What principles and practices would the group agree to?
  • How should toolchain modules get handed off over time when maintainers disappear or need to take a break?

We then had a related discussion about how PAUSE module hand-overs should work. Currently, the PAUSE adoption process hands over a module to any interested party after the primary maintainer has been non-responsive for a period of time. While that seemed fine for a rarely used module, on reflection it didn't seem right as modules have more and more dependents.

I'll say more about that in the discussion write-up I'll post separately, but in general, the consensus view was that PAUSE admins need to exercise more judgment in adoption approvals and that for a widely used module, a prospective adopter needs to make a case for stewardship.

Day 3

On Saturday, I focused my morning on coding and scheduled that day's consensus discussions in the late afternoon so we'd have to end before our social outing to Berlin's computer game museum.

Hacking on CPAN Meta modules

I applied a bunch of pull requests to CPAN::Meta and CPAN::Meta::Requirements, some from the hackathon and some from before. One of the most significant was Karen Etheridge's patch to CPAN::Meta::Merge to allow deep merging of non-conflicting keys.

I shipped dev releases of both to CPAN so they could get smoked by CPAN Testers before I release them as stable. This is a practice that I've committed to adopting as part of the toolchain governance discussions. Even if I don't think there are any changes that will break on a platform, I'd rather be safe and test a dev release than ship to CPAN and break it.

Improved CPAN.pm warnings for missing make

While playing with some virtual machines generously donated by DreamHost for our regression testing, I found myself struggling to bootstrap local::lib with CPAN.pm. I initially thought it was the same problem I'd had on Thursday, but after flailing a while, I realized my mistake.

Always install the 'build-essential' (or equivalent) package on a new machine!

I've made that mistake before and just forgot about it. Unfortunately, CPAN.pm doesn't tell you that make isn't installed; it just tells you it failed.

So I quickly sent a pull request to Andreas to fix that and it will be in the next release of CPAN.pm.
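
The general idea of checking for make up front can be sketched with core modules. This is not the patch that went into CPAN.pm; find_in_path is a hypothetical helper, and the sketch ignores Windows executable extensions.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Hypothetical helper: return the full path of a program found in the given
# directories (default: PATH), or nothing if it is not there.
sub find_in_path {
    my ( $program, @dirs ) = @_;
    @dirs = File::Spec->path unless @dirs;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile( $dir, $program );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# $Config{make} is the make program perl was configured with
my $make = $Config{make} || 'make';
if ( my $path = find_in_path($make) ) {
    print "found $make at $path\n";
}
else {
    print "'$make' not found on PATH; install your platform's build tools\n";
}
```

A message that names the missing program is much easier to act on than a bare build failure.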

Revived Metabase::Web

Metabase::Web is the code that runs the service that receives CPAN Testers reports. I haven't touched it in years and spent some time getting familiar again with how to configure and launch a Catalyst app.

One of my near-to-medium term goals is to get CPAN Testers off Amazon SimpleDB (because it's horrible and expensive). I had already been looking at MongoDB as a replacement even before I joined MongoDB because the document storage model fits well. Now that I work for MongoDB, I have both extra incentive and company support for spending some time on it.

I got Metabase::Web working using my experimental MongoDB backend I wrote years ago and was able to send CPAN Testers reports to a Metabase running locally with a local MongoDB.

Consensus discussions, technical

By Saturday, I was feeling a bit fried from moderating discussions, so that day I focused on fairly concrete technical decisions rather than broad culture/governance issues.

These included questions about the evolution of the CPAN Meta Spec, changes to CPAN Testers grading, signaling a desire for pure-Perl builds to compiler detection tools, and some PAUSE policies or lack thereof.

For the most part, decisions came quickly and I'll be including them in more detail in my subsequent write up.

Computer Game Museum

I intentionally scheduled the consensus discussions with a hard stop time for those of us visiting Berlin's Computerspiele Museum.

Visiting the museum was like traveling through time and visiting my childhood. They had many of the computers, consoles and games I remembered from when I was a kid. And of course, some of the exhibits were playable!

One of the more memorable exhibits in the museum was a rather bizarre game – essentially Pong, but where each player had to keep a hand on a pad that alternately shocked, scorched, or whipped the player's hand. The first person to take their hand off the pad lost. Several people in our group left with bruises from playing.

I played only once and did not lose. :-)

Day 4

On Sunday we had the most important discussion of the hackathon, one that will probably have a significant impact on how I work as a CPAN author. In my other time, I tied up a lot more loose ends and started thinking about how to build a regression tester.

Responsible author practices and responsible forking

Neil Bowers and Peter Rabbitson came up with a very evocative analogy for CPAN and Neil presented it to the group to kick off our discussions.

To paraphrase, consider CPAN like a river, where a distribution's position in the river depends on the total number of other "downstream" distributions that depend on it, i.e. dependents plus dependents of dependents plus dependents of those, etc. until there are no more dependents to count.

A distribution that has no dependents is all the way downstream. By contrast, most of the CPAN toolchain is all the way upstream. A broken module upstream causes cascading failures all the way down the river. See Neil's blog post, The River of CPAN, for more.
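
Counting "dependents plus dependents of dependents" is a graph traversal. Here is a minimal sketch over a made-up reverse-dependency map; the distribution names and the downstream_count helper are invented for illustration.

```perl
use strict;
use warnings;

# Made-up reverse-dependency map: distribution => its direct dependents
my %dependents_of = (
    'Toolchain-Core' => [ 'Web-Framework', 'Data-Util' ],
    'Web-Framework'  => ['My-App'],
    'Data-Util'      => [ 'My-App', 'Report-Gen' ],
    'My-App'         => [],
    'Report-Gen'     => [],
);

# Count every distinct distribution reachable by following dependents,
# i.e. dependents plus dependents of dependents, and so on.
sub downstream_count {
    my ($dist) = @_;
    my %seen;
    my @queue = @{ $dependents_of{$dist} || [] };
    while ( defined( my $next = shift @queue ) ) {
        next if $seen{$next}++;    # count shared dependents only once
        push @queue, @{ $dependents_of{$next} || [] };
    }
    return scalar keys %seen;
}

printf "%-15s %d\n", $_, downstream_count($_) for sort keys %dependents_of;
```

In this toy graph, Toolchain-Core has four downstream distributions and sits furthest upstream, while My-App and Report-Gen have none and sit all the way downstream.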

That suggests that the standards for "responsible authorship" are different at different points in the river. An experimental module I throw over the wall to CPAN needs a lot less care than something I wrote that has thousands of downstream dependents.

After agreeing on the river analogy, the group then brainstormed and categorized ideas for good author practices for "way upstream", "way downstream", and "in the middle". I'll be writing those up in detail in my summary of the Berlin consensus discussions.

Finally, we spent some time talking about what to do when an upstream distribution isn't being managed with the standards of care that one would like. The group agreed that if an upstream author isn't responsive to one's concerns, then forking (same code/API) or replacing (new code, maybe new API) are the only real options. We talked about how to do that respectfully to the upstream author and responsibly to the community. I'll be writing up those guidelines later as well.

Rethinking Metabase

Back on Saturday, I'd revived Metabase::Web, so I started examining the MongoDB backend in more detail, to see if the choices I made years ago still made sense. Unfortunately, the more I looked at how Metabase organizes information, the more I saw how terribly convoluted it is. It's got second-system problems all over.

I think I've learned a lot since coming up with it 7 or 8 years ago, so rather than blindly pushing forward with a migration, I decided to start thinking about a plan for CPAN Testers 3.0 instead. (It will be done by Christmas.)

Test::Reporter::Transport::MongoDB

Before designing CPAN Testers 3.0, though, I needed to consider how to regression test Test-Simple and its 6000+ selected dependents (still a subset from the entirety of CPAN that depends on it).

In the past, when regression testing Module::Build, I used CPAN Testers smoking tools, but saved reports as files. One directory had reports with the stable version installed and another had reports with the development version installed and I compared the differences.
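
The comparison step boils down to diffing two sets of grades. A minimal sketch, with made-up distributions and grades standing in for whatever would be parsed out of the saved report files:

```perl
use strict;
use warnings;

# Made-up grades keyed by distribution; real ones would be parsed from the
# saved report files in the "stable" and "dev" directories.
my %stable = ( 'Foo-Bar' => 'pass', 'Baz-Qux' => 'pass', 'Old-Broken' => 'fail' );
my %dev    = ( 'Foo-Bar' => 'pass', 'Baz-Qux' => 'fail', 'Old-Broken' => 'fail' );

# Hypothetical helper: list distributions whose grade differs between runs
sub grade_changes {
    my ( $before, $after ) = @_;
    my @changes;
    for my $dist ( sort keys %$before ) {
        next unless exists $after->{$dist};
        next if $before->{$dist} eq $after->{$dist};
        push @changes, sprintf( "%s: %s -> %s", $dist, $before->{$dist}, $after->{$dist} );
    }
    return @changes;
}

my @changes = grade_changes( \%stable, \%dev );
print "$_\n" for @changes;    # prints "Baz-Qux: pass -> fail"
```

Distributions that already failed with the stable version drop out of the diff, so only changes attributable to the new version are left to investigate.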

Given the scale of testing, and DreamHost's generous donation of testing machines, I wanted to explore distributing the testing jobs and collecting the results back rather than shuttling report files between multiple machines.

To that end, I wrote and tested a rough draft of Test::Reporter::Transport::MongoDB to send a smoke test report directly to a MongoDB server. That will let me collect the distributed regression test reports and meanwhile experiment with how best to organize report data as the basis for CPAN Testers 3.0.

Fixing Capture::Tiny

For quite a while, Capture::Tiny has been failing tests with bleadperl on Windows due to problems closing a bad file descriptor. I hadn't been able to figure out why, but at the hackathon, Bulk88 got out his C debugger and dug into the problem, identifying that the underlying issue had been there for years, but that the new automatic-close error warnings in blead made it visible.

With that little hint, I quickly tracked down the problem to an unnecessary Windows-level OS handle close. I can't even remember why I thought it was necessary, but changing it to an ordinary Perl close solved the problem. Bulk88 verified the fix and I shipped a trial to CPAN.

UTF-8 YAML::XS API

This year, Ingy was at the hackathon working on the YAML ecosystem. A big sticking point to unifying YAML implementations is the different expectations for the input to the Load function. YAML and YAML::Tiny expect it to be a character string, but YAML::XS expects it to be a UTF-8 encoded string.

We discussed whether it's possible for the XS Load to detect whether it's being passed character data or UTF-8 encoded data and, in short, it's not possible to do so unambiguously for all input.
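
The ambiguity is easy to demonstrate with the core Encode module. This sketch doesn't touch any YAML module; it only shows that an encoded string and an unrelated character string can be identical as far as Perl can tell.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# One character, U+00E9 (e with acute accent)
my $chars = "\x{e9}";

# Its UTF-8 encoding: the two octets 0xC3 0xA9
my $octets = encode( "UTF-8", $chars );

# A different text that is genuinely two characters, U+00C3 and U+00A9
my $two_chars = "\x{c3}\x{a9}";

# The encoded form of the first text and the character form of the second
# are the same Perl string, so a loader cannot tell which one it was given.
print "indistinguishable\n" if $octets eq $two_chars;

# Decoding as UTF-8 recovers the single character
print "round trip ok\n" if decode( "UTF-8", $octets ) eq $chars;
```

Any string made up only of code points below 256 that also happens to be valid UTF-8 has this problem, so a loader has to be told which kind of input it is getting.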

PAUSE on Plack!

[Photo: Andreas and Kenichi with their PAUSEs running side-by-side]

I had nothing to do with this — it was all Kenichi Ishigaki! But it was the awesomest hack of the hackathon, in my opinion, so I wanted to mention it. The picture above shows Andreas and Kenichi with their PAUSEs running side-by-side.

Class::Tiny custom accessors

As part of his work hacking on CPAN::Common::Index, Miyagawa found some surprising behavior in the use of custom accessors. It was a great point and I need to think about whether to do something about it or just better document it to be less surprising to others.

Hash::Ordered pull requests

In the past month, I'd received a couple pull requests for Hash::Ordered so I tested them, applied them, and shipped a Hash::Ordered dev release.

Day 5

Monday was travel day. I spent much of the flight working to wrap things up. Using notes from Wendy, I typed up an outline of all the consensus discussions. I also worked on optimizing my META file scanner, getting it down from 40 minutes to about 10! I also scripted my Test::Builder dependency analysis so it would be repeatable as I continued to refine my META analyzer.

Parting thoughts

As in the past, I finished this hackathon feeling a bit wrung out, but excited and proud of everything that got done.

The CPAN ecosystem is not just libraries of code and tools; it's a community of people.

This year in particular, I think the "consensus discussions" have a chance to positively influence people far outside the QA/Toolchain bubble.

For more on the hackathon, check out the hackathon's blogs and results pages.

Thanking those who made it possible

As an invitational event, the hackathon wouldn't be possible without the sponsors who provide the funds to bring so many people together.

It's not too late to donate! Any support in excess of this year's budget will be banked for the 2016 hackathon, so if you're feeling inspired, please give back or encourage your employer to do so.

DONATE HERE.

This year, I particularly want to thank my employer, MongoDB, for sponsoring me to attend.

Our other wonderful corporate sponsors include thinkproject!, amazon Development Center, STRATO AG, Booking.com, AffinityLive, Travis CI, Bluehost, GFU Cyrus AG, Evozon, infinity interactive, Neo4j, Frankfurt Perl Mongers, Perl 6 Community, Les Mongueurs de Perl, YAPC Europe Foundation, Perl Weekly, elasticsearch, LiquidWeb, DreamHost, qp procura, and Campus Explorer. These companies support Perl and I encourage you to support them.

We also had several generous individual contributors, who also deserve our thanks: Ron Savage, Christopher Tijerina, Andrew Solomon, Jens Gassmann, Marc Allen, and Michael LaGrasta.

I particularly want to thank our organizer, Tina Müller, and the others who helped her plan and run an excellent event!

I also want to acknowledge Wendy van Dijk, who was my scribe for the hours of group discussions I moderated. Having her capture the discussions and transcribe the notes was an enormous help and I wouldn't want to have led those discussions without her backing me up. Thank you, Wendy!


How to add 'provides' metadata via Makefile.PL

My last post about PAUSE permission problems suggested manually adding a 'provides' field to your metadata files if PAUSE can't determine what packages are in your distribution. I realized that people might not know how to do that, so this is a quick tutorial.

One reason PAUSE might not be able to find your package names is if you generate your .pm files for some reason. I'm going to use a super-simplified example distribution to show what to do.

Consider a distribution for a hypothetical "Acme::Provides" with these four files:

  • Makefile.PL — our distribution build tool
  • Provides.pm.PL — our module generator
  • t/00-load.t — a test that the built module can be loaded
  • MANIFEST — a listing of these four files

The Provides.pm.PL generates Acme/Provides.pm directly into the blib directory when make runs, so there is no .pm file hanging out in lib for PAUSE to examine.
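
For readers who want to picture the generator, here is a minimal sketch of what a Provides.pm.PL could look like. It is not the file from the example distribution: the generate() helper and the module body are made up for illustration. The one thing it relies on is that ExtUtils::MakeMaker passes the target filename to a .PL script as its first argument.

```perl
# Provides.pm.PL: hypothetical sketch of a module generator.
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);

# Write a tiny Acme::Provides module to the given path.
# (generate() is an illustrative name, not part of any toolchain API.)
sub generate {
    my ($target) = @_;

    # Make sure the directory we were told to write into exists,
    # e.g. blib/lib/Acme.
    make_path( dirname($target) );

    open my $fh, '>', $target
      or die "Can't write $target: $!";

    print {$fh} <<'END_MODULE';
package Acme::Provides;
use strict;
use warnings;

our $VERSION = '0.01';

1;
END_MODULE

    close $fh
      or die "Can't close $target: $!";
    return $target;
}

# MakeMaker runs this script with the target filename as the first argument.
generate( $ARGV[0] ) if @ARGV;
```

Because the module text only exists inside the generator, nothing in the tarball is a .pm file, which is exactly the situation the rest of this post deals with.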

When we run the Makefile.PL we get a Makefile that will generate the .pm file we need. We can see that it does so by running make and make test:

$ perl Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Acme::Provides
Writing MYMETA.yml and MYMETA.json

$ make
"/Users/david/.plenv/versions/20.2t/bin/perl5.20.2" "-Iblib/arch" "-Iblib/lib" Provides.pm.PL blib/lib/Acme/Provides.pm

$ make test
PERL_DL_NONLAZY=1 "/Users/david/.plenv/versions/20.2t/bin/perl5.20.2" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00-load.t .. ok
All tests successful.
Files=1, Tests=1,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.01 cusr  0.00 csys =  0.04 CPU)
Result: PASS

We can build the distribution directory with make distdir:

$ make distdir
rm -rf Acme-Provides-0.01
"/Users/david/.plenv/versions/20.2t/bin/perl5.20.2" "-MExtUtils::Manifest=manicopy,maniread" \
                -e "manicopy(maniread(),'Acme-Provides-0.01', 'best');"
mkdir Acme-Provides-0.01
mkdir Acme-Provides-0.01/t
Generating META.yml
Generating META.json

Here is the generated Acme-Provides-0.01/META.json file (omitting 'no_index' and 'prereqs' fields for brevity):

{
   "abstract" : "Demonstration of adding provides metadata",
   "author" : [
      "David Golden <dagolden@cpan.org>"
   ],
   "dynamic_config" : 1,
   "generated_by" : "ExtUtils::MakeMaker version 7.04, CPAN::Meta::Converter version 2.150001",
   "license" : [
      "artistic_1"
   ],
   "meta-spec" : {
      "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
      "version" : "2"
   },
   "name" : "Acme-Provides",
   "release_status" : "stable",
   "version" : "0.01"
}

That doesn't tell PAUSE about our generated module, so if we uploaded this distribution, it wouldn't get indexed because we wouldn't be claiming the package name "Acme::Provides" to match the distribution tarball name "Acme-Provides-0.01.tar.gz".

However, we can use the META_ADD directive in the Makefile.PL to add that information ourselves:

    META_ADD => {
        provides => {
            'Acme::Provides' => {
                file => 'Provides.pm.PL',
                version => '0.01',
            },
        },
    },

Check out the revised file here: Makefile.PL.
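
If you don't have the linked file handy, a complete Makefile.PL along these lines might look like the sketch below. Treat everything outside the META_ADD block as an assumption: the NAME, VERSION, ABSTRACT and AUTHOR values are read back from the META.json shown above, the PL_FILES mapping is inferred from the make output, and the license setting is left out entirely.

```perl
# Makefile.PL: a sketch, not the actual file from the example distribution.
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME     => 'Acme::Provides',
    VERSION  => '0.01',    # no .pm in the tarball, so no VERSION_FROM
    ABSTRACT => 'Demonstration of adding provides metadata',
    AUTHOR   => 'David Golden <dagolden@cpan.org>',

    # Assumed mapping: run the generator and write its output
    # straight into blib, as seen in the "make" output.
    PL_FILES => {
        'Provides.pm.PL' => '$(INST_LIB)/Acme/Provides.pm',
    },

    # Tell PAUSE which package this distribution claims, and which
    # file in the tarball it should be associated with.
    META_ADD => {
        provides => {
            'Acme::Provides' => {
                file    => 'Provides.pm.PL',
                version => '0.01',
            },
        },
    },
);
```

Note that the version is written out twice, once for the distribution and once inside 'provides', so both need to be bumped together on each release.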

Now, if we re-run Makefile.PL and regenerate the distribution directory with make distdir, we can see our 'provides' data added to the generated Acme-Provides-0.01/META.json file (again omitting 'no_index' and 'prereqs' fields for brevity):

{
   "abstract" : "Demonstration of adding provides metadata",
   "author" : [
      "David Golden <dagolden@cpan.org>"
   ],
   "dynamic_config" : 1,
   "generated_by" : "ExtUtils::MakeMaker version 7.04, CPAN::Meta::Converter version 2.150001",
   "license" : [
      "artistic_1"
   ],
   "meta-spec" : {
      "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
      "version" : "2"
   },
   "name" : "Acme-Provides",
   "provides" : {
      "Acme::Provides" : {
         "file" : "Provides.pm.PL",
         "version" : "0.01"
      }
   },
   "release_status" : "stable",
   "version" : "0.01"
}

Now, if we upload this distribution, PAUSE will see us claiming "Acme::Provides" and all should be well.
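
Before uploading, it can be worth double-checking what the generated metadata actually claims. The following is a small sketch that reads a META.json with the core JSON::PP module and lists its 'provides' entries. The provides_from_meta() helper and the script name used below are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP;

# Return the 'provides' map from a META.json file,
# or an empty hash reference if the field is absent.
# (provides_from_meta() is an illustrative helper name.)
sub provides_from_meta {
    my ($path) = @_;

    open my $fh, '<:raw', $path
      or die "Can't read $path: $!";
    my $json = do { local $/; <$fh> };
    close $fh;

    my $meta = JSON::PP->new->utf8->decode($json);
    return $meta->{provides} || {};
}

# When given a file on the command line, print one line per package.
if (@ARGV) {
    my $provides = provides_from_meta( $ARGV[0] );

    for my $package ( sort keys %$provides ) {
        my $entry = $provides->{$package};
        printf "%s => %s (version %s)\n",
          $package,
          $entry->{file},
          defined $entry->{version} ? $entry->{version} : 'undef';
    }

    print "no 'provides' field found\n" unless %$provides;
}
```

Saved as, say, check-provides.pl and run against Acme-Provides-0.01/META.json from the second listing above, it should print "Acme::Provides => Provides.pm.PL (version 0.01)". Run against the first listing, it reports that no 'provides' field was found.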


© 2009-2015 David Golden All Rights Reserved