Announcing the MongoDB Perl driver v1.0.0 Release Candidate

I'm very happy to announce the release of the MongoDB Perl driver v1.0.0 Release Candidate 1 to CPAN!

This next-generation driver is a substantial rewrite of the original and has been my primary project as lead engineer for the Perl driver since last summer.

Rationale for a rewrite

Over the last year, MongoDB engineers rewrote most of the language drivers maintained in-house. As mentioned in the next-generation drivers announcement, we've built the next-generation drivers on a common set of specification documents. Whereas previous drivers were written idiosyncratically, each driver now aims to deliver similar underlying behaviors through similar APIs, while still striving to be idiomatic for its language.

For the Perl driver, however, we had several goals that went above and beyond cross-driver consistency. In particular, we wanted to address some fundamental deficiencies in the "v0.x" series of drivers:

  • intra-driver consistency – many parts of the v0 API were inconsistent, behaving differently from method to method; the v1 API minimizes developer surprises by improving consistency in return types and exception mechanisms.
  • encapsulation – too many low-level, internal operations were exposed as part of the API, which complicates maintenance work; the v1 API aims to minimize the "public surface" available to developers, allowing faster future development that keeps up with MongoDB server enhancements with less risk of breakage.
  • abstraction – many v0 methods returned raw server documents for end-user code to inspect, which is brittle in the face of changes in server responses over time; the v1 API uses result classes to abstract the details behind standardized accessors.
  • server compatibility – some new features and behavior changes in the MongoDB server no longer fit the old driver design; the v1 driver transparently supports both old and new servers.
  • portability – the v0 driver had a large dependency tree and substantial non-portable C code; the v1 driver removes some dependencies and uses widely-used, well-tested CPAN modules in place of custom C code where possible; it lays the groundwork for a future "pure-Perl optional" driver.
  • round-trippable data – the v0 BSON implementation could easily change data types when round-tripping documents; the v1 driver is designed to round-trip data correctly whenever possible (within the limits of Perl's dynamic typing).

Summary of changes

As you might imagine, there are an enormous number of changes, listed in detail in the MongoDB::Upgrading document.

Here is a summary of some of the more substantial changes:

  • Configuration – MongoDB::MongoClient gains a number of new configuration options to control server selection and timeouts; configurations are now always read-only; some existing options are deprecated; others are removed entirely where they no longer fit the new paradigms of the client
  • Lazy connection – creating a MongoDB::MongoClient object no longer connects to a server right away. This is the new standard for all official MongoDB drivers, but might break code that expected an immediate error from new!
  • Failover – when a networking error occurs or a server goes away, an error is thrown. If handled, the next attempt at communicating with the server will automatically reconnect. For a replica set, this means it will fail over to a new primary when a new primary is ready.
  • Exceptions – All errors now throw exceptions rather than relying on inconsistent ways of returning errors from methods
  • Read preferences and write concerns – these are now expressed as objects; they can be set at the client, database or collection object level
  • Authentication – this is now based only on configuration options and occurs immediately when any server connection is made. This is another change for all official drivers that harmonizes the behavior of MongoDB's different authentication mechanisms, some of which could only happen at connection time and some of which could be done later.
  • Deprecations – pretty much the entire MongoDB::Collection API was deprecated and replaced with the new official-driver-wide CRUD API. The deprecated methods are now undocumented, but still mostly work as they used to. A handful of methods in other classes were deprecated as well.

BSON encoding changes

BSON encoding was substantially overhauled. The various $MongoDB::BSON::... global variables have been removed, as BSON encoding is now encapsulated and can be set per-client, per-database or per-collection.

Integers will now be encoded to the smallest BSON integer type that fits rather than always taking up a fixed size (equal to the compiled integer size of the interpreter). This means that storing zero no longer takes up 64 bits on a 64-bit perl.
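
To make the size difference concrete, here is a minimal sketch of "smallest type that fits" selection. It is written in Python for illustration and is not the driver's actual code; the function name encode_bson_int is hypothetical. The only facts it relies on are that BSON has a 32-bit integer element (type byte 0x10) and a 64-bit integer element (type byte 0x12), both little-endian.

```python
import struct

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def encode_bson_int(value):
    """Return (type_byte, payload) for the smallest BSON integer type that fits.

    Hypothetical helper for illustration only, not part of any driver API.
    """
    if INT32_MIN <= value <= INT32_MAX:
        # BSON int32: element type 0x10, 4 bytes, little-endian
        return 0x10, struct.pack("<i", value)
    if INT64_MIN <= value <= INT64_MAX:
        # BSON int64: element type 0x12, 8 bytes, little-endian
        return 0x12, struct.pack("<q", value)
    raise OverflowError("value does not fit in a BSON integer type")


if __name__ == "__main__":
    for n in (0, 2**31 - 1, 2**31):
        type_byte, payload = encode_bson_int(n)
        print(n, hex(type_byte), len(payload), "bytes")
```

Under this scheme zero is stored in 4 bytes, and only values outside the 32-bit range take 8.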

All common JSON boolean classes will now encode correctly. Time::Moment is supported for both encoding and decoding datetimes.

Testing the release candidate

If you use the MongoDB Perl driver, I strongly encourage you to read MongoDB::Upgrading and test your code with the release candidate driver.

You can install it from CPAN like this:

$ cpan MONGODB/MongoDB-v0.999.999.4-TRIAL.tar.gz

If you encounter problems or have questions, please open tickets on the MongoDB Perl driver bug tracker.

Barring any show-stoppers or urgent changes, the stable v1.0.0 MongoDB Perl driver will be shipped to CPAN in mid-to-late August.


Visualizing Perl 5 Release History – 2015 edition

I've updated my Perl 5 release history chart through the release of Perl 5.22.0. As I've pointed out before, the steady march of annual releases sets clear expectations for future development.

[Chart: Perl Release History 2015]

Looking at this chart, I'm always struck by how the Perl 5.10 series differs from the rest. Perls 5.4, 5.5, 5.6 and 5.8 arrived and delivered new features (and breaking changes) not far off the current annual release cycle.

Perhaps the people who think Perl is changing too fast were conditioned by the 5½ year gulf from 5.8.0 to 5.10.0 or the 2½ year gap from 5.10.0 to 5.12.0. Together, that's 8 years of infrequent change, particularly for those who skipped 5.10 and stuck with 5.8.

I recently wrote about why you shouldn't waste time on Perl 5.6. The Perl toolchain targets Perl 5.8.1 and I suspect a large portion of CPAN still targets Perl 5.8, released 13 years ago. Is it any wonder that we wind up with presentations like Stevan's "Perl is not Dead, It's a Dead End"?

I have a different hypothesis.

Perl isn't dead, it's just living in the past.

At some point, the community needs to shed its attachment to Perl 5.8.

Or maybe we'll all just start using Perl 6, instead.


Why you shouldn't waste your time on Perl 5.6

This is a sort-of response to mst's On Perl 5.6 post. Mostly, I agree with Matt about not gratuitously breaking 5.6 compatibility (i.e. to "force people to upgrade").

However, in recent years, I've never received a single bug report or patch from a person actually using 5.6 for anything except smoke testing things on 5.6.

The argument seems to be: "because I think it's important to test things on 5.6, you should make it possible for me to do so". This seems circular.

Testing on 5.6 is pointless if no one is using 5.6 for anything except testing 5.6.

There is an argument sometimes made that if there are users on 5.6, then they should just upgrade. That seems rather obviously wrong, since if there really are such people, there are probably good reasons why they can't. But at the same time, it seems entirely inconsistent for someone to insist that they can't possibly upgrade their perl to something released in the last decade, yet they want the latest and greatest from CPAN.

If there really are people using 5.6 for real, do they really expect CPAN modules to just work? Are they really put out if they don't? Or have they learned to work around it, just like they've worked around the myriad of feature gaps and bug fixes in the last dozen years?

As far as accepting patches for 5.6 support goes, I have very mixed feelings about it. Reviewing patches takes time. Patches accepted might then implicitly indicate support for 5.6 going forward. And if there are bugs, the original patch author might not be around.

So a simple patch that restores 5.6 compatibility might be fine. E.g. "hey, I removed your 'use 5.008' and tested it on 5.6.2 and it passed tests just fine!". But a complex, possibly fragile workaround burdens the receiver if accepted.

If a real person actually using 5.6 asked me to do that, I might accept, depending on how ugly the patch is. But I'm far, far less inclined to do so when sent a gnarly patch by some self-appointed compatibility police telling me 5.6 support is still important "just because".

While it's not for me to tell people what to do with their volunteer time, I suspect that whatever time people are spending writing 5.6 patches and doing 5.6 smoke testing could be far more valuably spent on other things that affect orders of magnitude more real users.


The Annotated Berlin Consensus

The official Berlin Consensus document is on Github. This is an annotated review of it.

The Berlin Consensus

At the first Perl QA Hackathon (QAH) in 2008 in Oslo, a number of QA and toolchain authors, maintainers and experts came together to agree on some common standards and practices. This became known as "The Oslo Consensus".

At the 2013 QAH in Lancaster, a similar brain trust came together to address new issues requiring consensus. This became known as "The Lancaster Consensus".

At the 2015 QAH in Berlin, another group assembled to address new issues, with a particular focus on toolchain governance and recommended standards of care for CPAN authors.

As with other consensus discussions, the speed of implementation of any ideas discussed will depend on the interests and availability of volunteers to do the work.

CPAN standards of care

An ongoing challenge for Perl (and many other languages) is how to balance the benefits of a large repository of open source libraries (i.e. CPAN) against the fragility of large dependency trees. The consensus discussion group agreed on some recommended practices for CPAN authors that will, if widely adopted, improve the general standard of care of CPAN distributions and reduce the fragility of large CPAN dependency trees.

The river analogy

As an analogy to guide recommendations, the group considered CPAN like a "river". Distributions with nothing depending on them are all the way "downriver". Once a distribution gains a dependent, it moves slightly "upriver".

Credit for the river analogy goes to Neil Bowers and Peter Rabbitson. Neil's blog post, "The River of CPAN", expands on the analogy.

Distributions can have direct dependents — things that list it among prerequisites — but direct dependents may have dependents, too, all the way down to distributions with no dependents at all.

By examining this deep dependent tree to find the total number of downriver dependents, we can assess how far "up the river" anything is, which is tantamount to describing how many things it has the potential to break.

The more total dependents a distribution has in its deep dependent tree, the further upriver it is. All the way upriver lies the toolchain: modules like ExtUtils::MakeMaker, which, if buggy or broken, can break all of CPAN.
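
As a rough illustration of counting total downriver dependents, here is a small Python sketch that walks a map of direct dependents breadth-first. The distribution names and the direct_dependents mapping are made up for the example; a real analysis would build that mapping from CPAN metadata, and this is not the method used for any published statistics.

```python
from collections import deque


def total_dependents(direct_dependents, dist):
    """Count every distribution that depends on `dist`, directly or indirectly.

    `direct_dependents` maps a distribution name to the names of the
    distributions that list it among their prerequisites.
    """
    seen = set()
    queue = deque(direct_dependents.get(dist, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(direct_dependents.get(current, ()))
    seen.discard(dist)  # guard against circular dependencies
    return len(seen)


if __name__ == "__main__":
    # Hypothetical dependency data: key is depended on by each listed value.
    graph = {
        "Toolchain-A": ["Lib-B", "Lib-C"],
        "Lib-B": ["App-D"],
        "Lib-C": ["App-D", "App-E"],
    }
    for name in ("Toolchain-A", "Lib-B", "App-D"):
        print(name, total_dependents(graph, name))
```

In this toy graph, Toolchain-A has four total dependents and sits furthest upriver, while App-D has none and sits all the way downriver.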

Visually, if a module upriver breaks — fails to build, fails tests, breaks compatibility, etc. — it pollutes the river and everything downriver of it suffers.

The actual expression during discussion was "shitting in the river", which makes the point more dramatically (and more effectively).

Therefore, a distribution's position in the river is a guide to the recommended standards that an author should apply to its care.

As a distribution grows in popularity and gains direct dependents, those dependents grow the deep dependent tree; the distribution moves upriver and the standards that responsible authors should apply change accordingly.

Authors, who could have distributions at multiple places along the river, should consider distributions individually, and avoid applying downriver standards to distributions that have moved upriver.

Some authors treat CPAN like a playground sandbox. Other authors work mostly on distributions with lots of dependents. And some have both sorts plus stuff in the middle. It's important to keep in mind that the recommendations that follow don't apply to authors. They apply to distributions and authors can and should apply different standards to different distributions.

The group also discussed how we never know how much of DarkPAN code depends on a given distribution, making any CPAN-focused analysis of the river only an approximation of the true number of downriver dependents.

Neil was on a roll. Another blog post of his describes an idea for letting DarkPAN register their dependencies.

Recommended practices for CPAN authors

For the sake of discussion, the group arbitrarily divided the river into "way downriver", "way upriver" and "in the middle", and considered recommended practices for each.

While it wasn't discussed at the time, subsequent analysis found:

  • ~50 distributions with 10,000 or more total dependents
  • ~200 distributions with 1,000 to 9,999 total dependents
  • ~2,000 distributions with 10 to 999 total dependents
  • ~8,000 distributions with 1 to 9 total dependents
  • ~16,000 distributions with no known dependents

Neil, again, offers a blog post with CPAN River statistics. I also made a log-log plot showing what percent of CPAN has how many downstream dependents.

Again, while it wasn't discussed, if one needed guidance about whether a distribution is upriver or downriver or in the middle, one might consider the three groups roughly like this:

  • "way downriver" — zero to low tens of total dependents
  • "in the middle" — high tens to low thousands of total dependents
  • "way upriver" — high thousands to tens of thousands of total dependents

Defined like this, "way downriver" is about 95% of CPAN. "Way upriver" is less than half a percent. "In the middle" is what's left.
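
For anyone who wants to apply that rough grouping mechanically, here is a Python sketch. The group deliberately gave no exact numbers, so the two cutoffs below (50 and 5,000) are my own assumptions standing in for "low tens vs. high tens" and "low thousands vs. high thousands". Adjust them to taste.

```python
# Assumed cutoffs: the text only says "low tens", "high tens",
# "low thousands" and "high thousands", so these numbers are illustrative.
MIDDLE_CUTOFF = 50
UPRIVER_CUTOFF = 5000


def river_position(total_dependents):
    """Map a total-dependent count to one of the three rough river groups."""
    if total_dependents < 0:
        raise ValueError("dependent count cannot be negative")
    if total_dependents >= UPRIVER_CUTOFF:
        return "way upriver"
    if total_dependents >= MIDDLE_CUTOFF:
        return "in the middle"
    return "way downriver"


if __name__ == "__main__":
    for count in (0, 12, 800, 12000):
        print(count, "->", river_position(count))
```

The point is less the exact thresholds than the habit of rechecking the count from time to time, since a distribution's group can change as it gains dependents.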

The recommendations that follow for these groups are aspirational. Not every distribution can or will do everything here, but the more it can, the better off users of that distribution will be.

Practices for distributions "way downriver"

Any author's first upload is, by definition, way downriver. Brand new CPAN authors should read a good "how to" for CPAN authorship, such as the "About PAUSE" pages and "perldoc perlnewmod".

Downriver distributions should be "well-formed", following many of the basic "Kwalitee" rules described on CPANTS. They should have documentation (ideally spell-checked), a "t/" directory with tests (that are run before shipping, e.g. with "make disttest"), and a clearly stated license.

Distributions should respect CPAN namespace conventions.

Distributions should include a "Changes" file that highlights key differences between releases. They should have a META.json file that follows the CPAN::Meta::Spec and a corresponding META.yml file for older perls.

The author should decide on an "issue tracker", whether the default or one combined with a source code repository, and include the issue tracker URL in the META.json file. Distribution documentation should include contact information if it differs from the author's email address.

If a distribution duplicates features of existing modules, the documentation should describe why it was created and how it differs from other, similar modules (e.g. a "SEE ALSO" section).

There are also some practices that distributions should avoid.

When configuring, building, testing or installing: don't attempt to "phone home" over the Internet; don't modify the filesystem outside the distribution directory or the system temporary directory; don't send emails.

Don't hijack other modules by installing a .pm file that overwrites or otherwise shadows a module that ships from another distribution.

Don't run "author tests" (e.g. pod formatting, coverage, spelling, etc.) on end-user systems.

Don't be malicious.

Don't be rude.

One example that came up in discussions was a Makefile.PL that made snide comments only on Windows machines. Not cool.

Practices for distributions "in the middle"

Distributions "in the middle" should follow all the recommendations of those "way downriver", plus additional recommendations.

Distributions should plan for API stability. Breaking changes should be made as rarely as possible and should occur after a period of deprecation. Including a statement about a stability policy in documentation is recommended as well, to help end-users know what to expect.

It's perfectly fine for a distribution to plan for instability. Some distributions have high velocity and make backward breaking changes regularly. That's absolutely OK if clearly documented so users know what to expect.

Distributions should aim to be portable across "mainstream" operating systems, whenever possible. They should attempt to support older Perls (e.g. 5.8 or 5.10) and should, regardless, have an explicit minimum perl version in their prerequisite metadata. Portability commitments should also be included in documentation.

We didn't get into what "mainstream" means, so it's sort of in the eye of the beholder. Personally, I think that's at least Linux, Windows and OS X. Maybe at least some BSD flavors, too. For seeing what Perl versions you support, check out Perl::MinimumVersion and Perl::MinimumVersion::Fast.

Distributions should have a public source code repository (listed in the META files), including contribution instructions. Repositories should be connected to some sort of continuous integration service for early identification of commits that cause tests to fail.

Distributions hosted on Github should look at Travis for this.

Distributions should be licensed under the terms of Perl itself or else a compatible OSI-approved license. E.g. a GPL-only (or other "viral") license may limit how far upriver a distribution can go. Note, also, that a "public domain" dedication is not always legally valid and is not an OSI-approved license.

Distribution authors should pay attention to the issue tracker and at least acknowledge bug reports in a timely fashion.

Distributions should have a co-maintainer registered on PAUSE or other documented succession plan in case something happens to the original author.

More on this below in the section about PAUSE adoption policies.

Distributions should aim for high quality releases: they should have good test coverage; authors should use author-only tests of distribution quality before shipping (e.g. in xt); authors should monitor CPAN Testers results to identify and fix broken releases.

Before releasing a stable, indexed version to CPAN, authors should release a non-indexed, developer release to CPAN and monitor the CPAN Testers Matrix results to ensure that nothing is broken across major operating systems or versions of Perl. (N.B. typically 36 to 48 hours is sufficient for good coverage.)

Distribution authors should be mindful of a distribution's place in the "CPAN river". They should pay attention to the stability and quality of upriver dependencies and should consider the stability and quality of a distribution before adding it as a new dependency. They should monitor the total number of downriver dependents to reassess the standards of care to apply to the distribution.

In particular, be very thoughtful about adding a dependency that is downriver from your distribution, because any new dependency moves upriver (by definition). If the author of that dependency has a worse standard of care, your distribution — and everything that depends on it — becomes more fragile. The farther upriver your distribution is, the more you should consider discussing standards of care with the author of a potential dependency before relying on it.

For distributions in the upper-middle parts of the CPAN river (e.g. 1,000+ total downriver dependents), authors should consider regular testing of some of these downriver dependents against new versions of the distribution before shipping a stable release.

Practices for distributions "way upriver"

Distributions "way upriver" should follow all the recommendations of those "in the middle" and "way downriver", plus additional recommendations.

This section is the most aspirational. Many toolchain modules don't fully live up to these practices. Many are technically hard. But everyone present agreed that toolchain authors should aim to do these things, even if it takes a while before it's regular practice.

Distributions should undergo some sort of code review before a stable release, so that there is more than one set of eyes responsible for the code.

Distributions should have some documented public forum for discussing the distribution and its evolution. Proposed major changes should be discussed before implementation.

Distributions should design for forward-compatibility, when possible, so that old code can handle inputs designed for later versions of the distribution and either adapt or throw an informative error, rather than fail strangely.

Distributions that use C or XS should aim for C89 compatibility, the same as Perl itself.

Distributions should do performance testing before releasing major changes as stable.

Distributions should be tested against bleadperl, on as many platforms as they can.

Authors should release a non-indexed development version of distributions before an indexed, stable version for any non-trivial, non-emergency change.

Authors should regularly check the full CPAN testers matrix for distributions to find failure patterns that affect particular platforms or versions of Perl.

Authors should test at least a representative portion of direct (or even total) downriver dependents against new versions of a distribution before shipping a stable release.

Stable releases of distributions should not be shipped before bedtime or the weekend or any other time when the author is unavailable to fix or revert an unexpectedly broken release. Authors should watch closely for indications of breakage after release.

Responsible forking

While we hope that authors follow standards of care consistent with a distribution's place in the CPAN river, there will always be times when some distribution dependency isn't meeting one's expectations. In such a case, a distribution always has the option to fork the dependency or to replace it with an alternative.

Because forks can be controversial, we discussed practices for responsible forking.

We talked quite a bit about how there needs to be some recourse when the author of some dependency doesn't exercise the standard of care you want, doesn't change in response to feedback and isn't willing to hand over maintenance to a more responsible maintainer. The group wanted to lessen the stigma of forking — not for fundamental differences or outright neglect, but just to achieve a different stability goal in the dependency tree.

When this section talks about "forks", we intend that term to cover literal forks (same code base to start, and same API), rewrites (new code base, but same API) and alternative replacements (new code, with new API).

Before forking

Before considering a fork, authors should talk to the current maintainer(s) about their concerns, ideally in a public forum so that concerns and responses are on the record.

Authors should also consult interested parties, such as other authors with similar concerns (e.g. authors of other distributions who also depend on the distribution to be forked), as well as direct downriver dependents who might be affected by the change in their own upriver dependency tree.

Changing a distribution's dependencies is a deep-dependency change for every dependent. This can be a big impact for people who manage full dependency trees via Carton or Pinto or who have to do compliance audits of all new dependencies. Sometimes it's necessary, but it should never be done lightly.

When deciding to fork

If the original maintainer isn't able to satisfy concerns and the decision to fork is taken, authors should notify the original maintainer(s), again ideally in a public forum so that the history is available for future users to review.

The author should also notify interested parties that were consulted, so that they can participate in the process, if desired.

To help future potential users understand the differences between the modules, the author should document the rationale for forking (focusing on factual rather than emotional issues or personal attacks).

Authors should try not to "burn bridges". On seeing a fork under development, the original maintainer may reconsider his or her original stance, or offer to hand over maintenance or otherwise find some mutually satisfactory resolution that doesn't require the fork to be published to CPAN.

After a fork is published

When the fork is shipped, the author should make a general announcement to the same forums used for earlier discussions.

The author should carefully consider impacts on the CPAN river when migrating. E.g. adding a newly published fork as the dependency of a mid-stream distribution catapults the fork upriver, which adds it as a dependency to all downriver distributions and puts them all at risk if the fork introduces new bugs or other surprising behavior.

New code has no track record. Writing a fork and putting it upriver immediately is risky. It might make sense to release a fork independently for a while and let it season and get feedback before using it to replace a fragile dependency.

When considering urging other distributions to adopt a fork in place of the original, authors should use a "smart bomb" approach rather than a "carpet bomb" approach. Rather than making an appeal to all direct dependents of the original, authors should target the appeal. Examples of respectful targeting include (but are not limited to):

  • upriver dependencies that also depend on the original distribution (e.g. as part of an attempt to get the original out of the full upriver dependency tree)
  • other, unrelated distributions that are reasonably believed to be affected by the same issues that prompted the fork

In other words, we don't want a mass call saying "hey, module X is bad, use module Y instead". We'd rather have a specific call, "hey, I'm trying to get X out of my dependency tree because Z, would you consider switching to Y also?" or "hey, you're using module X to do Z, which has this particular problem PRQ; I've switched to module Y and you might consider that too."

Toolchain governance

These discussions actually preceded the CPAN River discussions, but it made more sense to write them up the other way. In spots, these points have been harmonized with the points above.

The toolchain, in general terms, refers to those modules that are required to download, build, test and install the vast majority of distributions on CPAN.

Modules like this generally ship with the Perl core (e.g. ExtUtils::MakeMaker) but frequently have a "dual life" independently on CPAN. By definition, these toolchain distributions are "way upriver" on the CPAN river.

Other modules may not ship in the Perl core, but may be popular solutions for specific toolchain-like tasks. Such modules are in the "toolchain" for distributions that rely on them.

The consensus attendees represented a sizeable cross-section of core and popular non-core toolchain maintainers. They agreed that toolchain development and release practices could be improved.

They agreed to "sign on" to a "Toolchain Charter" to govern the ongoing development of toolchain distributions. While this discussion actually pre-dated the CPAN river discussion, the points are largely consistent.

A frequent side topic was whether this could be enforced in any way, and the group — wisely in my view — stuck with the idea that it's voluntary and that if enough people lead by example, then others may follow suit.

Toolchain charter practices

Toolchain authors present agreed to the following principles and practices to ensure good governance and responsible administration.

  1. The toolchain has a wider scope of backwards compatibility goals than the Perl core, as toolchain aims to support every Perl from 5.8.1 onwards.
  2. Toolchain distributions should have more than one "primary" maintainer (regardless of actual PAUSE permissions) and a list should be published showing distributions and maintainers.
  3. Functional changes to toolchain distributions need more than one set of eyes approving changes before shipping a stable release to CPAN.
  4. Major or breaking changes to a toolchain distribution should be discussed in a public, archived venue. The cpan-workers list was chosen as the initial venue.
  5. If discussions about the evolution of toolchain distributions fail to achieve consensus, toolchain authors agree to defer to a designated "tie-breaker" authority. The Perl pumpking (regardless of who that may be at any point in time) was the initial choice for tie-breaker.
  6. No stable distribution should ship until some degree of stability is verified (e.g. through a combination of smoke reports, dependency smokes, etc.); the choice of specific mechanisms was left to the discretion of maintainers. The group called out one exception: emergency fixes to a broken stable release should proceed without delay.
  7. Toolchain authors agreed that when a primary maintainer steps down or becomes permanently unavailable, the toolchain authors as a group will jointly agree on a successor. PAUSE administrators should defer to the consensus (or decision of the tie-breaker) for handing over PAUSE permissions as needed. Any successor should agree to the practices described herein.

Point #5 was sort of a default position. There was no consensus that it had to be the pumpking forever, but given that so many of the toolchain modules are dual-life, the pumpking was an unobjectionable choice to start with.

We didn't get into exact mechanics of #7. My thought is that it works in two parts: (1) an existing primary maintainer will consult the broader group for "advice and consent" in choosing a successor; (2) if the primary maintainer is unavailable for a long time, PAUSE admins will consult with the broader group before transferring permissions.

Toolchain authors not present are strongly encouraged to agree to these practices or to hand over their toolchain maintenance responsibilities to others who are willing to do so.

Model behavior we want to see in other "way upriver" distributions

The group briefly discussed whether or how to urge other important, non-toolchain, but widely depended-on distributions to sign on to the Toolchain Charter or an equivalent.

The consensus was that the Toolchain Charter should serve as a model for the kind of behavior we'd like to see broadly in other widely used distributions, but that public role-modeling and public promotion was preferred over targeted appeals or peer pressure on non-toolchain authors.

Rather than go out and try to get various upriver distributions to "sign on" to the Toolchain charter, the Toolchain charter should just be an example that other authors/groups with similar goals might adapt for their own purposes.

META spec

The group discussed various potential changes to the CPAN META Spec.

No need for META 3 yet

There was consensus that developing a "v3" META spec was not yet needed. There was no burning platform for it, and some forward compatibility risks to address first. The consensus was that continuing to develop x_ fields was sufficient to address emerging needs.

prereqs keys to ignore

The 'prereqs' data structure can contain fields that can't be resolved as actual prerequisites. Examples include "perl", "Config" and "Errno". The group agreed that a list of known keys should be included in the implementation notes section of the META spec.

Specifically with regard to 'perl', we agreed that CPAN clients must never try to install 'perl'. If it is specified as 'requires', installation must abort. If specified for 'recommends' or 'suggests', CPAN clients may warn about it, but must otherwise ignore it.

Specifying a "recommended" version of Perl seems weird to me, but apparently some people do it meaning "I support older Perls, but this really works best on version 5.x.y". CPAN client support for communicating that isn't great and needs to be improved.
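The 'perl' rule above could be sketched roughly like this. It is an illustration in Python, not code from any CPAN client: the function name, the return strings, and the reading that "must abort" applies when the running perl is older than the required one are all assumptions, and versions are compared as plain decimals for simplicity.

```python
from decimal import Decimal

def handle_perl_prereq(relationship, wanted, running):
    """Decide what a client does with a 'perl' entry in prereqs.

    relationship: 'requires', 'recommends' or 'suggests'
    wanted, running: decimal version strings such as '5.010001'
    Returns 'ok', 'abort' or 'warn'. There is deliberately no 'install'.
    """
    if Decimal(running) >= Decimal(wanted):
        return "ok"
    if relationship == "requires":
        return "abort"  # assumption: abort only when the requirement is unmet
    return "warn"       # recommends/suggests: may warn, otherwise ignore
```

The point of the sketch is only that no branch ever queues 'perl' for installation.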

Further clarifying 'recommends' and 'suggests'

The group agreed that these prerequisite relationships were still being misunderstood and that the spec should be clear that "recommends" is optional and installed by default, whereas "suggests" is optional and not installed by default.
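As a rough illustration of those defaults (a Python sketch, not any client's real code), the function below picks what to install from a simplified, single-phase prereqs structure. The function name and the two opt-in/opt-out flags are hypothetical.

```python
def modules_to_install(prereqs, with_suggests=False, without_recommends=False):
    """Return {module: version} that a client would install.

    prereqs maps a relationship name to {module: version}. This is a
    simplification: real META prereqs are also grouped by phase.
    """
    wanted = dict(prereqs.get("requires", {}))  # always installed
    if not without_recommends:                  # optional, on by default
        for module, version in prereqs.get("recommends", {}).items():
            wanted.setdefault(module, version)
    if with_suggests:                           # optional, off by default
        for module, version in prereqs.get("suggests", {}).items():
            wanted.setdefault(module, version)
    return wanted
```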

CPAN client interpretation of 'x_breaks'

The group agreed that x_breaks needs to be implemented in CPAN clients.

This is the replacement for 'conflicts', which sort of means "don't install me if this other module is present", whereas what most people seem to want is "I break this other module, so after upgrading me, you need to upgrade it, too."

x_breaks is a top-level key in META. It is a hash with module names as keys and, as values, a version range that indicates the broken range, usually expressed as a <= relation. E.g. in YAML format:

    Foo::Bar: <= 1.23

When x_breaks is found in META, after successful installation, CPAN clients should check if any of the modules listed in x_breaks are installed and if their $VERSION matches the version range. If so, and if the latest version on CPAN does not match the x_breaks version range, then the CPAN client should install the latest version of that module from CPAN.

CPAN clients may warn about unsatisfiable x_breaks before or after installing a module, may prompt before installing x_breaks, etc., depending on their individual approaches to configuration, prompting and warning.
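The post-install check described above might look something like the following. This is a Python sketch under stated assumptions, not an implementation from any CPAN client: the function names are made up, versions are treated as plain decimal strings (no dotted-decimal or underscore versions), and the range parser handles only comma-separated clauses with the usual comparison operators.

```python
from decimal import Decimal
import operator

OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
       "!=": operator.ne, "<": operator.lt, ">": operator.gt}

def in_range(version, version_range):
    """True if a decimal version string satisfies every clause of the range."""
    v = Decimal(version)
    for clause in version_range.split(","):
        clause = clause.strip()
        for sym in ("<=", ">=", "==", "!=", "<", ">"):  # two-char ops first
            if clause.startswith(sym):
                if not OPS[sym](v, Decimal(clause[len(sym):].strip())):
                    return False
                break
        else:
            # bare version: treated as "at least this version"
            if v < Decimal(clause):
                return False
    return True

def upgrades_after_install(x_breaks, installed, latest_on_cpan):
    """Split broken installed modules into (upgradable, unsatisfiable)."""
    to_upgrade, unsatisfiable = [], []
    for module, broken in x_breaks.items():
        have = installed.get(module)
        if have is None or not in_range(have, broken):
            continue  # not installed, or installed version is not broken
        latest = latest_on_cpan.get(module)
        if latest is not None and not in_range(latest, broken):
            to_upgrade.append(module)
        else:
            unsatisfiable.append(module)  # a client may warn about these
    return to_upgrade, unsatisfiable
```

With x_breaks of `{"Foo::Bar": "<= 1.23"}`, an installed Foo::Bar 1.20 and a latest CPAN version of 1.30, the first list contains Foo::Bar; if CPAN only had 1.23, it would land in the second list instead.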

Need for a post-install recommendations key

There are circumstances where two distributions can each operate without the other but both benefit from the other's presence. In this case, each specifying the other as 'recommended' causes a circular dependency, albeit an optional one, which can confuse CPAN clients and dependency analysis.

To avoid this, the group agreed there needs to be a new META key that requests that CPAN clients install a distribution after installation is complete — not a prerequisite relationship, but an "also install" relationship.

Karen Etheridge volunteered to prototype such a key and discuss it with CPAN client authors for feedback and potential implementation.

Signaling pure-Perl

In the Lancaster Consensus, the group agreed on command-line arguments for Makefile.PL and Build.PL to signal that a pure-Perl build is desired. In practice, the problem with this approach is that distribution authors must each individually check for such flags and take them into account.

The group agreed that since there are relatively few compiler-detectors (e.g. ExtUtils::CBuilder), it would be better to have a common environment variable that signals to such compiler detectors to report that no compiler is available. This would then have an effect on all distributions using that compiler detector.

Peter Rabbitson volunteered to pick a name for the variable and Leon Timmermans agreed to implement it in ExtUtils::CBuilder.
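To make the idea concrete, here is a minimal Python sketch of a compiler detector that honors such a variable. The variable name is purely a placeholder (no name had been picked at the time of writing), and the detection logic is a stand-in, not what ExtUtils::CBuilder does.

```python
import os
import shutil

# Placeholder name for illustration only; the real variable was not yet chosen.
NO_COMPILER_ENV = "EXAMPLE_NO_COMPILER"

def have_compiler(env=None, which=shutil.which, candidates=("cc", "gcc", "clang")):
    """Report whether a C compiler should be considered available."""
    env = os.environ if env is None else env
    if env.get(NO_COMPILER_ENV):
        return False  # user asked for pure-Perl: pretend no compiler exists
    return any(which(name) is not None for name in candidates)
```

Because the check lives in the detector, every distribution that asks the detector gets the pure-Perl behavior without checking any flag itself.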

CPAN Testers grading

The historical policy of CPAN Testers was to not send failing test reports when prerequisites could not be satisfied. The group agreed that getting such reports, under some category other than "FAIL", would make it easier to detect when upriver dependencies were broken.

The idea is that if your tests fail because some upriver dependency can't be installed, then your module can't be automatically installed. It might not be your fault, but everything downriver of you is screwed until you work around the problem or get the upriver dependency fixed.

Barbie and David Golden volunteered to come up with a proposal for review on the cpan-workers list.


PAUSE

The group considered potential changes to PAUSE administration and policies.

Changing adoption policies

Historically, the process for adopting an abandoned distribution allowed the first person to petition PAUSE admins for a takeover to receive permissions. In light of concerns about the stability of distributions with lots of downriver dependencies, the group thought a policy of "first warm body" adoption was no longer a wise idea.

The group considered the opposite approach: that PAUSE admins no longer transfer permissions at all, so that if the primary maintainer were gone, a distribution would effectively be end-of-life. The group agreed that this would do more harm than good.

The group agreed that for distributions with no known dependencies, the "first warm body" rule was acceptable. For distributions with dependencies, PAUSE administrators will consider the number of downriver dependencies and exercise their judgment and consider the track record and involvement of a petitioner in the distribution in question. E.g. they will consider existing co-maintainer status, commit involvement, etc. Without obvious involvement of this sort, a petitioner will need to present some evidence of support from interested parties (e.g. direct downriver dependents).

PAUSE administrators may elect to deliberate on their decision in private, but will announce a decision and rationale publicly.

Distributions that want to avoid succession problems are encouraged to add co-maintainers as designated successors.

Rewriting PAUSE not needed

The group considered briefly whether PAUSE needed to be rewritten from scratch. The consensus was to proceed with efforts to Plack-ify it and use that as a basis for additional development.

Kenichi Ishigaki (charsbar) pretty much made this discussion moot. Read his blog post about PAUSE on Plack for details.

Encouraging good licensing

The group considered whether PAUSE should require a detectable license for indexing; instead the group decided that authors should be encouraged, but not required, to have one. E.g. the indexer email could notify authors that no license was found and encourage them to add one.

Responding to take-down requests

We discussed whether PAUSE should have a formal policy about take-down requests. The consensus was that it was unnecessary and that instead, PAUSE admins will continue to exercise judgment.

However, we agreed that PAUSE (and all other Perl ecosystem sites) should have clearly indicated contact information for take-down notices.

Deleting tarballs faster

Currently, deleting a tarball on PAUSE schedules it for deletion in three days. We agreed that faster – nearly immediate – deletion is desirable to get broken distributions off CPAN quickly.

However, we agreed there should be a slight delay to give BackPAN mirrors a chance to catch up to store the distribution for the historical record.

It turns out that by the time anyone could actually schedule an immediate deletion, replication to BackPAN and other mirrors will have already happened. PAUSE/CPAN is that fast these days.

Any such feature requires some sort of confirmation to help users avoid deleting things unintentionally. Issue #163 has been opened for this feature request.

Requiring a meta file to be indexed

The group discussed whether PAUSE should require a valid META file for a distribution to be indexed. When META generators start universally providing the 'provides' field, this will get PAUSE out of the business of guessing which packages are provided by a distribution, so encouraging authors to include META now will lay the groundwork for that future change.

If implemented, distributions would be required to have either META.json or META.yml to be indexed.
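The presence check itself is simple. A Python sketch follows; the function name is made up, it assumes the tarball unpacks into a single top-level directory, and it only tests for presence, not for validity of the META file.

```python
def has_meta_file(paths):
    """paths: file names inside the tarball, e.g. 'Foo-Bar-1.0/META.json'."""
    for path in paths:
        parts = path.split("/")
        # accept a META file at the top level of the distribution directory
        if len(parts) == 2 and parts[1] in ("META.json", "META.yml"):
            return True
    return False
```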

The group thought this seemed reasonable, but there was some concern about how many distributions in the recent past would have had problems if this policy had been in force. Ricardo Signes volunteered to research how many distributions uploaded recently would have had problems and review with Andreas Koenig before implementing.

Test-Simple roadmap

As part of the consensus discussions, the group considered the roadmap for Test-Simple, which provides the Test::Builder library upon which Test::More and most test libraries are written.

This discussion actually led off the QAH in order to give Chad Granum, the current Test-Simple maintainer, some direction for the hackathon. I wonder if it would have gone differently if it had happened after the CPAN River discussion.

Problems in Test::Builder

The group identified several fundamental limitations of Test::Builder:

  • Async/multi-process support is either non-existent or requires complex and fragile work-arounds.

  • The lack of extension points has led many test libraries to invade the private internals of the Test::Builder singleton, or to monkey-patch its methods. These hacks are inherently fragile and the more the test ecosystem proliferates such things, the greater the likelihood that mixing arbitrary test libraries will break in unexpected ways.

  • Testing test libraries requires either parsing TAP or checking against specific strings. This is hard, leaving many test libraries poorly tested, if at all. When there are tests, the tests are fragile. Overall, this limits the ability of test library authors to evolve TAP itself.

  • More generally, having the test library so tightly coupled to TAP means that the capabilities of the library are limited to what TAP supports and there's no easy way to use the standard testing library, but output results in a different form (e.g. xUnit-style).

  • The current Test::Builder is heavy and slow, holding too much state and doing too much repetitive work.

Cost, benefits and risks

The group judged the benefit of addressing the problems above to be significant enough to justify the idea of a rewrite of the internals of Test::Builder.

With regard to the proposed Test::Stream-based replacement, the group agreed that the general design could address some of the problems identified, but, in light of the risks, put forth a "punchlist" of specific tasks to be completed before considering the specific implementation of that design ready to move forward:

  • A single Test-Simple branch with proposed code and a corresponding Test-Simple dev release to CPAN
  • Single document describing all known issues
  • Invite people to install latest dev in their daily perls for feedback
    • Write document explaining how to do so and how to roll-back
  • Update CPAN delta smokes: compare test results for all of CPAN with latest Test-Simple stable and latest dev
    • On Perl versions 5.8.1 and 5.20.2; both with and without threads
    • Finding no new changes from previous list of incompatible modules doing unsupported things
  • Line-by-line review of $Test::Builder::Level back compatibility support
  • bleadperl delta smoke with verbose harness output with latest stable and latest dev; review line-by-line diff and find no substantive changes (outside Test-Simple tests themselves)
  • Performance benchmarking — while specific workloads will vary, generally a ~15% slowdown on a "typical" workload is acceptable if it delivers the other desired benefits.
    • Add patches to existing benchmarking tools in Test-Simple repo
    • Run benchmarks on at least Linux and Windows
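Two of the punchlist items, the delta smoke and the performance threshold, boil down to simple comparisons. The Python sketch below is only an illustration: the function names, the grade strings, and the reading of "~15% slowdown" as a drop in operations per second are assumptions, not part of the punchlist.

```python
def delta_smoke(stable, dev, known_incompatible=()):
    """Return dists that pass with stable but not with dev.

    stable, dev: {distribution: grade} from two smoke runs.
    Dists on the known-incompatible list are excluded, as are dists
    that were not tested in both runs.
    """
    known = set(known_incompatible)
    return sorted(
        dist for dist, grade in stable.items()
        if grade == "PASS"
        and dist in dev and dev[dist] != "PASS"
        and dist not in known
    )

def slowdown_ok(stable_rate, dev_rate, tolerance=0.15):
    """True if dev is at most ~15% slower than stable (rates per second)."""
    return dev_rate >= stable_rate * (1 - tolerance)
```

An empty list from `delta_smoke` corresponds to "finding no new changes" relative to the previous list of incompatible modules.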

The group agreed that additional items could be added through the toolchain governance mechanism.

The discussion of Test-Simple was explicitly divided into three parts: (1) is Test::Builder so flawed that a rewrite is worthwhile? (2) Will the proposed design address #1? (3) Is the specific "release-candidate branch" implementing #2 good enough to go forward? The consensus answers were more-or-less "yes", "probably" and "not yet".

Participants in the Berlin Consensus discussions

Discussions lasted over four days; participants came and went, but each day had about 20-25 people. Thank you to the following participants:

Andreas König, Aristotle Pagaltzis, Barbie, Bulk88, Chad Granum, David Golden, H. Merijn Brand, Helmut Wollmersdorfer, Herbert Breunung, Ingy döt Net, Karen Etheridge, Kenichi Ishigaki, Leon Timmermans, Matthew Horsfall, Neil Bowers, Olivier Mengué, Paul Johnson, Peter Rabbitson, Philippe Bruhat, Ricardo Signes, Salve J. Nilsen, Slaven Rezic, Stefan Seifert, Tatsuhiko Miyagawa, Tina Müller, and Wendy van Dijk

(Apologies to anyone present who was left off the list. Email dagolden at cpan dot org or send a pull request to be added.)


Faster ordered hashes for Perl

With some prompting and suggestions from Mario Roy, author of MCE, I've been optimizing Hash::Ordered. With the exception of setting existing/new elements, which got a bit slower due to ensuring keys are strings, not references, most functions got faster. Some, like large hash deletion, are now MUCH faster.

Here are changes in benchmarks from version 0.002 to 0.009:

    Results for                                        0.002      0.009

    ordered hash creation for 10 elements            94121/s   101293/s
    ordered hash creation for 100 elements           10931/s    11226/s
    ordered hash creation for 1000 elements           1022/s     1160/s

    fetching ~10% of 10 elements                   1417712/s  1844781/s
    fetching ~10% of 100 elements                   244800/s   285983/s
    fetching ~10% of 1000 elements                   24871/s    30342/s

    replacing ~10% of 10 elements                  1353795/s  1378880/s
    replacing ~10% of 100 elements                  197232/s   192769/s
    replacing ~10% of 1000 elements                  20364/s    19909/s

    adding 10 elements to empty hash                367588/s   341022/s
    adding 100 elements to empty hash                66495/s    58519/s
    adding 1000 elements to empty hash                7217/s     6497/s

    creating 10 element hash then deleting ~10%      95284/s    94598/s
    creating 100 element hash then deleting ~10%      6924/s     9242/s
    creating 1000 element hash then deleting ~10%      144/s      934/s

    listing pairs of 10 element hash                170187/s   178288/s
    listing pairs of 100 element hash                18839/s    19537/s
    listing pairs of 1000 element hash                1877/s     1959/s
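To put a few of those rates in relative terms, here is a small Python calculation using three rows copied from the benchmark results above. The helper name is arbitrary; a ratio above 1 means version 0.009 is faster.

```python
def relative_change(old_rate, new_rate):
    """Ratio of new to old operations per second."""
    return new_rate / old_rate

# (rate for 0.002, rate for 0.009), taken from the results above
rates = {
    "fetching ~10% of 10 elements": (1417712, 1844781),
    "adding 1000 elements to empty hash": (7217, 6497),
    "creating 1000 element hash then deleting ~10%": (144, 934),
}

for name, (old, new) in rates.items():
    print(f"{name}: {relative_change(old, new):.2f}x")
```

This prints 1.30x, 0.90x and 6.49x respectively: small-hash fetches are about 30% faster, adding new elements is about 10% slower, and large-hash deletion is roughly six and a half times faster.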

You can see Hash::Ordered benchmarked against other modules in Hash::Ordered::Benchmarks.

If you need an ordered hash, I encourage you to try Hash::Ordered.


© 2009-2015 David Golden All Rights Reserved