Packages, modules and distributions

I've recently seen or heard a lot of discussion about how to make Perl and CPAN even better, but I feel strongly that we need a more formal definition of packages, modules and distributions so that there is a common language and understanding of the current semantics.  Only then do I think we can have meaningful discussions of potential changes.

With that in mind, here is my best understanding of the as-is state of Perl 5.

  • A package is a Perl namespace as specified using the package keyword.  Packages may have (but are not required to have) a version number.  The version number of a package is the value in the package variable $VERSION, which is set at runtime.  $VERSION should not be altered once set.  I will refer to a well-formed package as one which provides a $VERSION.
  • A module is a Perl file with a ".pm" extension.  A module's name is a Perl namespace that maps to the module's relative file-path by replacing namespace separators (:: and ') with path separators and appending ".pm".  A module contains zero or more packages.  The module must return a true value when loaded via use() or require().  (An earlier version of this entry said "Compiling the module should return a true value."  Thank you, Ben.)  A module's version number is that which is parsed and returned from the file by MM->parse_version() as provided by ExtUtils::MakeMaker.  I will refer to a well-formed module as one which contains a well-formed package with the same name as the module name and the same version as the module version.
  • A distribution is an archive file containing zero or more modules.  A distribution file is uniquely identified by a file-path of the form AUTHOR/NAME-VERSION.SUFFIX (i.e., as it exists on CPAN).  A distribution's name and version are parsed from the basename of the archive file.  (This requires substantial heuristics.  See CPAN::DistnameInfo for a relatively canonical approach.)  I will refer to a well-formed distribution as one meeting the following criteria: (a) it contains a well-formed module, M; (b) replacing the namespace separators in module M's name with dashes gives the distribution name; and (c) the distribution version is equal to module M's version.
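
As a rough illustration of how these definitions fit together, here is a minimal sketch.  The helper names (module_to_path, module_to_distname, is_well_formed_dist) and the Foo::Bar package are made up for this example, and comparing versions as plain strings is a simplifying assumption; real tools have to deal with far messier version formats.

```perl
use strict;
use warnings;

# A well-formed package: a namespace that provides $VERSION.
# (Foo::Bar is a made-up name used only for illustration.)
package Foo::Bar;
our $VERSION = '1.23';

package main;

# Hypothetical helper: map a module name to its relative file-path by
# replacing namespace separators (:: and ') with "/" and appending ".pm".
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::|'}{/}g;
    return "$path.pm";
}

# Hypothetical helper: the distribution name implied by criterion (b),
# i.e. namespace separators replaced with dashes.
sub module_to_distname {
    my ($module) = @_;
    (my $dist = $module) =~ s{::|'}{-}g;
    return $dist;
}

# Hypothetical helper: check criteria (b) and (c) for a distribution,
# given a module M that is already known to be well-formed.
# Assumption: versions are compared as plain strings.
sub is_well_formed_dist {
    my ($dist_name, $dist_version, $module, $module_version) = @_;
    return module_to_distname($module) eq $dist_name
        && $dist_version eq $module_version;
}

print module_to_path('Foo::Bar'), "\n";       # Foo/Bar.pm
print module_to_distname('Foo::Bar'), "\n";   # Foo-Bar
print is_well_formed_dist('Foo-Bar', '1.23', 'Foo::Bar', $Foo::Bar::VERSION)
    ? "well-formed\n"
    : "not well-formed\n";                    # well-formed
```

Under these assumptions, a distribution file named Foo-Bar-1.23.tar.gz containing Foo/Bar.pm with package Foo::Bar at version 1.23 would count as well-formed.  The sketch skips parsing the name and version out of the archive basename, which is the part that needs heuristics.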

I plan to use these definitions going forward as I discuss the evolution of CPAN, so I would welcome any feedback on whether these definitions seem consistent with how Perl 5 and CPAN work today and whether the "well-formed" designations are clear and appropriate.


7 Comments

  1. Posted July 20, 2009 at 3:30 am | Permalink

    It's good to define these terms too, because they are so overloaded. For example, to install a module, I might use the package manager from my distribution

    $ sudo apt-get install libmodule-build-perl

    That's not CPAN, but it's close enough in context to add confusion, if we're not careful.

    • david
      Posted July 20, 2009 at 4:10 am | Permalink

      That's a great example!

  2. Posted July 20, 2009 at 6:41 am | Permalink

    I agree with all of the above, but have nothing substantial to add. Well said.

  3. Posted July 20, 2009 at 4:41 pm | Permalink

    I'd also mention that some dists on CPAN may have tools but not modules, or at least the modules may not be for public consumption but only for the dist's executable files.

    • david
      Posted July 20, 2009 at 6:45 pm | Permalink

      That's a good point. I did think to say 'zero or more' modules, but it's good to be explicit about the other uses for distributions.

  4. John
    Posted July 22, 2009 at 7:59 am | Permalink

    Really excellent post. Thanks, David. This information belongs in the Perl 5 wiki if it's not there already.

    Here's a similar blog post -- but for Python -- which you might find interesting: http://blog.ianbicking.org/2008/12/14/a-few-corrections-to-on-packaging/

    Also, does Perl 6 use the same terms (above) as Perl 5?

  5. BenRifkah
    Posted July 22, 2009 at 12:02 pm | Permalink

    These are all good definitions of the terms although I have one point that may appear to be minor at first but may make a difference if these definitions are to be used when considering tool chain modifications.

    In your module definition you mention that "Compiling the module should return a true value." I think that what you're getting at is that "use"ing the module should not fail with a "foo.pm did not return a true value" exception. However, one could interpret this to mean that "perl -c foo.pm" returns true. In fact, perl -c will succeed even if your file doesn't return a true value, as long as it doesn't contain any compilation errors. Clarifying this point can potentially avoid some confusion.

One Trackback

  • By Version numbers should be boring on August 5, 2009 at 6:35 pm

    [...] on CPAN also have version numbers. These are specified as part of the filename. (See a prior article for a formal definition of modules and [...]

© 2009-2014 David Golden All Rights Reserved