I've recently seen or heard a lot of discussion about how to make Perl and CPAN even better, but I feel strongly that we need a more formal definition of packages, modules and distributions so that there is a common language and understanding of the current semantics. Only then do I think we can have meaningful discussions of potential changes.
With that in mind, here is my best understanding of the as-is state of Perl 5.
- A package is a Perl namespace as specified using the package keyword. Packages may have (but are not required to have) a version number. The version number of a package is the value in the package variable $VERSION, which is set during runtime. $VERSION should not be altered once set. I will refer to a well-formed package as one which provides a $VERSION.
- A module is a Perl file with a ".pm" extension. A module's name is a Perl namespace that maps to a relative file-path to the module by replacing namespace separators (:: and ') with path separators and appending ".pm". A module contains zero or more packages. Compiling the module should return a true value; that is, the module must return a true value when loaded via use() or require().1 A module's version number is that which is parsed and returned from the file by MM->parse_version() as provided by ExtUtils::MakeMaker. I will refer to a well-formed module as one which contains a well-formed package with the same name as the module name and the same version as the module version.
- A distribution is an archive file containing zero or more modules. A distribution file is uniquely identified by a file-path of the form AUTHOR/NAME-VERSION.SUFFIX (i.e. as it exists on CPAN). A distribution's name and version are parsed from the basename of the archive file.2 I will refer to a well-formed distribution as one meeting the following criteria: (a) it contains a well-formed module, M; (b) replacing the namespace separators in module M's name with dashes gives the distribution name; and (c) the distribution version is equal to module M's version.
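As a rough illustration of the naming rules in the list above, here is a minimal sketch in Perl. The function names, the SOMEAUTHOR path and the parsing regex are hypothetical and only cover the simple cases; real toolchain code (for example CPAN::DistnameInfo) handles many more edge cases, and comparing versions as plain strings is a deliberate simplification.

```perl
use strict;
use warnings;

# Module name -> relative file path, e.g. "Foo::Bar" -> "Foo/Bar.pm".
# Both namespace separators (:: and the older ') are replaced.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::|'}{/}g;
    return "$path.pm";
}

# Module name -> distribution name implied by criterion (b),
# e.g. "Foo::Bar" -> "Foo-Bar".
sub module_to_dist_name {
    my ($module) = @_;
    (my $dist = $module) =~ s{::|'}{-}g;
    return $dist;
}

# Naive split of AUTHOR/NAME-VERSION.SUFFIX into its parts.
# Assumption: only a few common suffixes and simple numeric versions.
sub parse_dist_path {
    my ($dist_path) = @_;
    my ($author, $name, $version, $suffix) =
        $dist_path =~ m{^([^/]+)/(.+)-(v?\d[\d._]*)\.(tar\.gz|tgz|zip)$}
        or return;
    return {
        author  => $author,
        name    => $name,
        version => $version,
        suffix  => $suffix,
    };
}

# Checks criteria (b) and (c) for a module whose name and version
# are already known. Versions are compared as strings (simplification).
sub is_well_formed_dist {
    my ($dist_path, $module, $module_version) = @_;
    my $dist = parse_dist_path($dist_path) or return 0;
    return ( $dist->{name} eq module_to_dist_name($module)
          && $dist->{version} eq $module_version ) ? 1 : 0;
}

my $info = parse_dist_path('SOMEAUTHOR/Foo-Bar-1.23.tar.gz');
print module_to_path('Foo::Bar'), "\n";           # Foo/Bar.pm
print "$info->{name} $info->{version}\n";         # Foo-Bar 1.23
print is_well_formed_dist('SOMEAUTHOR/Foo-Bar-1.23.tar.gz', 'Foo::Bar', '1.23'), "\n";  # 1
```

Criterion (a), that the distribution actually contains the well-formed module, is not checked here because that would require unpacking the archive.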
I plan to use these definitions going forward as I discuss the evolution of CPAN, so I would welcome any feedback on whether these definitions seem consistent with how Perl 5 and CPAN work today and whether the "well-formed" designations are clear and appropriate.
7 Comments
It's good to define these terms too, because they are so overloaded. For example, to install a module, I might use the package manager from my distribution:
$ sudo apt-get install libmodule-build-perl
That's not CPAN, but it's close enough in context to add confusion, if we're not careful.
That's a great example!
I agree with all of the above, but have nothing substantial to add. Well said.
I'd also mention that some dists on CPAN may have tools but not modules, or at least the modules may not be for public consumption but only for the dist's executable files.
That's a good point. I did think to say 'zero or more' modules, but it's good to be explicit about the other uses for distributions.
Really excellent post. Thanks, David. This information belongs in the Perl 5 wiki if it's not there already.
Here's a similar blog post -- but for Python -- which you might find interesting: http://blog.ianbicking.org/2008/12/14/a-few-corrections-to-on-packaging/
Also, does Perl 6 use the same terms (above) as Perl 5?
These are all good definitions of the terms, although I have one point that may appear minor at first but may make a difference if these definitions are to be used when considering tool chain modifications.
In your module definition you mention that "Compiling the module should return a true value." I think that what you're getting at is that "use"ing the module should not fail with a "foo.pm did not return a true value" exception. However, one could interpret this to mean that "perl -c foo.pm" returns true. perl -c will succeed even if your file doesn't return a true value, as long as it doesn't contain any compilation errors. Clarifying this point can potentially avoid some confusion.
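The difference between the two checks can be sketched with a throwaway module. This is only an illustration: the Foo package and the temporary file are hypothetical, and it assumes the file does not enable the newer module_true feature.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Write a hypothetical module that compiles cleanly but ends with a false value.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/Foo.pm";
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "package Foo;\nour \$VERSION = '0.01';\n0;\n";
close $fh or die "Cannot close $file: $!";

# "perl -c" only checks that the file compiles, so it exits with status 0.
my $compile_ok = system($^X, '-c', $file) == 0;

# require() also checks the file's final value, so it throws an exception.
my $require_ok = eval { require $file; 1 };
my $error      = $@;

print "perl -c succeeded: ", ($compile_ok ? "yes" : "no"), "\n";
print "require succeeded: ", ($require_ok ? "yes" : "no"), "\n";
print "require error: $error" if !$require_ok;
```

Here the syntax check passes while require() fails with a "did not return a true value" error, which is the behaviour the module definition is concerned with.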