Dist::Zilla ♥ encoding

Last weekend I went to a Dist::Zilla "micro-hackathon" at Ricardo Signes' house. This is something we've done before: setting aside some focused time to tackle something tough. This time, we set out to fix the leaky encoding handling in Dist::Zilla, and Ricardo explains the result well on his blog.

But what about Plugins?

I'm really happy that we finally fixed Dist::Zilla's handling of encoding, but unfortunately, Dist::Zilla is only as good as the ecosystem of plugins for it on CPAN. OK, really, it's good on its own but it's even better because of its ecosystem.

As Rik said, lots of things will just work better than before, particularly if you stick to UTF-8 for your files. The plugins most directly affected are FileGatherers and FileInjectors — anything that calls add_file — in particular anything that reads a file off disk without a particular encoding and then uses that as the content of a Dist::Zilla::File::* object.

For example, if you wrote a plugin that reads a template file, runs it through some other libraries, and stuffs the result into the content of a File object, then you really ought to sit down and think about whether you should be reading :raw or with some decoding layer. (You probably shouldn't read with the default layers, which might do CRLF translation on Windows.) Whatever you decide, I recommend Path::Tiny and the methods it offers, like slurp_raw and slurp_utf8.
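To make the choice between raw bytes and decoded text concrete, here is a minimal sketch that uses only core Perl. The helper names read_raw and read_utf8 are made up for this illustration; Path::Tiny's slurp_raw and slurp_utf8 play the same two roles with less typing.

```perl
use strict;
use warnings;
use File::Temp qw/tempfile/;

# Write a small UTF-8 encoded file to experiment with.
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
binmode $fh, ':raw';
print {$fh} "caf\xC3\xA9\n";    # "café" plus newline, as UTF-8 bytes
close $fh;

# Hypothetical helper: read raw bytes (no decoding, no CRLF translation).
sub read_raw {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Can't open $path: $!";
    local $/;    # slurp mode
    return scalar <$in>;
}

# Hypothetical helper: read text, decoding UTF-8 bytes into characters.
sub read_utf8 {
    my ($path) = @_;
    open my $in, '<:raw:encoding(UTF-8)', $path or die "Can't open $path: $!";
    local $/;    # slurp mode
    return scalar <$in>;
}

my $bytes = read_raw($filename);
my $text  = read_utf8($filename);

printf "raw:     %d bytes\n",      length $bytes;
printf "decoded: %d characters\n", length $text;
```

The two lengths differ because the accented letter takes two bytes on disk but is a single character once decoded. Which form you hand to a File object depends on whether the content is really text or really binary data.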

FileMungers are probably fine. Typically, they read content — which is now decoded text — then do something with it, and then stuff it back into content. Munging is text in and text out, and Dist::Zilla will take care of encoding it before writing it to disk.
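The "text in, text out" flow can be sketched outside of Dist::Zilla with the core Encode module. This is not Dist::Zilla's code: the decode and encode steps only stand in for what the text above says Dist::Zilla now does around your munger, and shout is a made-up munger.

```perl
use v5.12;    # also turns on the unicode_strings feature
use strict;
use warnings;
use Encode qw/decode encode/;

# Bytes as they might sit on disk: "café" plus newline, encoded as UTF-8.
my $bytes_in = "caf\xC3\xA9\n";

# Step 1: bytes become characters (stand-in for reading decoded content).
my $text = decode( 'UTF-8', $bytes_in );

# Step 2: hypothetical munger; it sees characters and returns characters.
sub shout {
    my ($in) = @_;
    return uc $in;
}
my $munged = shout($text);

# Step 3: characters become bytes again (stand-in for writing to disk).
my $bytes_out = encode( 'UTF-8', $munged );

printf "%d characters, %d bytes\n", length $munged, length $bytes_out;
```

Because the munger works on characters, uc can uppercase the accented letter correctly. Run against the raw bytes instead, it would leave the two bytes of that letter untouched.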

Come out, come out, wherever you are…

Using grep.cpan.me, I tried to review all the FileInjectors and FileGatherers I could. Most of them look like they'll be unaffected by the changes in Dist::Zilla. But some looked suspicious and I'll give a list of them below.

If you wrote one of these, please review it for any changes you need to make for Dist::Zilla version 5 and release a -TRIAL version to CPAN. Follow the instructions at the bottom of Rik's post to get the -TRIAL version of Dist::Zilla and its dependencies.

If you use any of these, try them out with Dist::Zilla version 5 and see if it breaks anything for you. If so, let the author know right away.

To be clear, I'm not sure these need work, but they're doing things that concern me.

  • Dist::Zilla::Plugin::AssertOS (InMemory, raw slurp)
  • Dist::Zilla::Plugin::Author::Plicease::Init2 (mix of methods, including raw slurp)
  • Dist::Zilla::Plugin::CSS::Compressor (FromCode, raw slurp through CSS compressor)
  • Dist::Zilla::Plugin::Doppelgaenger (InMemory, raw slurp)
  • Dist::Zilla::Plugin::JSAN (InMemory, raw slurp, munged through other libraries)
  • Dist::Zilla::Plugin::LocaleTextDomain (FromCode, encoded content)
  • Dist::Zilla::Plugin::ManifestInRoot (FromCode, filename list)
  • Dist::Zilla::Plugin::Moz (InMemory, raw JAR content)
  • Dist::Zilla::Plugin::ShareDir::Tarball (InMemory, compressed tarball content)
  • Dist::Zilla::Plugin::TravisYML (InMemory, raw slurp)
  • Dist::Zilla::Plugin::TwitterBootstrap (InMemory, zip file member contents)
  • Dist::Zilla::Plugin::jQuery (InMemory, raw slurp)
  • Dist::Zilla::Role::ModuleIncluder (InMemory, raw slurp)

If you wrote or use a FileGatherer or FileInjector that is not on the list, that doesn't necessarily mean you're safe. It just means that a quick skim of your code didn't throw up any red flags.

    Micro-hackathon for the win

    If you've got some project that you're stuck on, I encourage you to grab a friend, set aside a day or two, and see if a micro-hackathon like ours can get you unstuck.

    In addition to getting Dist::Zilla fixed, I had a lot of fun. Whenever I get to sit down and work with Rik, I learn something new. This time, I feel like a lot of work I've been doing around encoding in the last year all came together in my head and made sense.

    Beyond that, I picked up a few editor, Moose and Mac tricks; got to visit most of Rik's favorite Bethlehem dives; tried saffron-almond ice cream; and learned several new board and card games. Woo hoo!

    Thank you to Rik (and his family) for a great weekend!

    Posted in dzil, perl programming

    Only the bravest CPAN warriors need apply

    Are you a CPAN warrior? Are you up for a challenge? Read on...

    I have finally fixed the CPAN.pm branch that implements support for recommends and suggests prerequisites. It seems to work, but CPAN.pm internals are so hairy that I wouldn't be surprised if there are still subtle bugs.

    I don't think it will melt your system, but it needs some real-world test driving.

    Here's how you can help:

    1. Install an up-to-date CPAN::Meta from CPAN
    2. Download the tarball for my 'fix-retry-recommend-support' branch of CPAN.pm
    3. Untar it
    4. Install it (inside the directory: make touchtestdistros && make test && make install)
    5. Fire up a cpan shell and turn on the new policies: o conf recommends_policy 1 and maybe o conf suggests_policy 1 and then o conf commit
    6. Install some modules that have recommends/suggests prerequisites

    The best thing would be to start using this as your regular CPAN client for a while. (No, it's not as quick and terse as cpanminus, but you're a brave CPAN warrior and won't let a little verbosity stop you, right?)

    If you find bugs, file them on the pull-request thread.

    With enough help, I hope to get this tested, merged, and shipped in time for Perl 5.20.

    Thank you!

    Posted in cpan, perl programming, toolchain

    Five percent of indexed CPAN packages come from just two distributions

    And both have to do with advertising.

    Counting all packages indexed in 02packages.details.txt (possibly across multiple tarball versions), two distributions account for 5% of CPAN packages:

    • GOOGLE-ADWORDS-PERL-CLIENT
    • Microsoft-AdCenter

    Here are the top 10 with the number of index lines each:

    GOOGLE-ADWORDS-PERL-CLIENT: 5480
    Microsoft-AdCenter: 1628
    Shipment: 1197
    eBay-API: 1160
    BioPerl: 851
    Graphics-VTK: 707
    DateTime-Locale: 469
    Net-Amazon: 455
    UMMF: 451
    Locales: 426
    

    Together, the top 10 account for 8.6% of the 135,000 or so currently indexed packages.
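A count like the one above can be approximated with a few lines of Perl. The sketch below is not the script used for this post. The sample lines are made up to show the index's body format (package, version, tarball path), and the dist_name helper is a rough regex-based guess at the distribution name. For real data, a module such as CPAN::DistnameInfo is a more robust way to split a tarball path into name and version.

```perl
use strict;
use warnings;

# Made-up sample lines in the 02packages.details.txt body format:
# package name, version, path to the tarball.
my @lines = (
    'Foo::Bar        1.23  A/AU/AUTHOR/Foo-Bar-1.23.tar.gz',
    'Foo::Bar::Util  1.23  A/AU/AUTHOR/Foo-Bar-1.23.tar.gz',
    'Foo::Bar::Old   1.10  A/AU/AUTHOR/Foo-Bar-1.10.tar.gz',
    'Baz             0.01  O/OT/OTHER/Baz-0.01.tar.gz',
);

# Hypothetical helper: drop the directory, the archive suffix,
# and a trailing "-version" to get a rough distribution name.
sub dist_name {
    my ($path) = @_;
    ( my $name = $path ) =~ s{^.*/}{};
    $name =~ s{\.(?:tar\.gz|tgz|tar\.bz2|zip)$}{};
    $name =~ s{-v?[0-9][0-9._]*(?:-TRIAL)?$}{};
    return $name;
}

# Count index lines per distribution, across all tarball versions.
my %count;
for my $line (@lines) {
    my ( $package, $version, $path ) = split ' ', $line;
    $count{ dist_name($path) }++;
}

# Print distributions from most to fewest index lines.
for my $dist ( sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count ) {
    printf "%s: %d\n", $dist, $count{$dist};
}
```

Note that the two Foo-Bar tarballs are counted together, which matches the "possibly across multiple tarball versions" caveat above.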

    Posted in perl programming

    Real $VERSIONs on CPAN

    I've been looking at patterns of $VERSION definition in the wild. And, wow. There's some crazy stuff out there.

    Remember that the way PAUSE (and other tools) parse a $VERSION definition line is not by loading the module, but by extracting it as a standalone line and essentially running it through eval. PAUSE says this must work:

    perl -MExtUtils::MakeMaker -le 'print MM->parse_version(shift)' path/to/file.pm
    

    PAUSE actually does something more sophisticated and safe, but if parse_version works, so will PAUSE.
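The same check can be run from a script instead of a one-liner. This sketch writes a throwaway module to a temporary file (the package name Foo and its version are invented for the example) and asks ExtUtils::MakeMaker to parse it:

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;
use File::Temp qw/tempfile/;

# Write a throwaway module with a conventional $VERSION line.
my ( $fh, $filename ) = tempfile( SUFFIX => '.pm', UNLINK => 1 );
print {$fh} <<'MODULE';
package Foo;
our $VERSION = '1.23';
1;
MODULE
close $fh;

# parse_version looks for a line that resembles a $VERSION assignment
# and evaluates just that line; the module itself is never loaded.
my $version = MM->parse_version($filename);
print "parsed version: $version\n";
```

Swapping in one of the odd definitions listed below is a quick way to see what the toolchain would make of it.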

    Keeping that in mind, listed below are some of the more surprising/amusing/horrifying things I've seen. Not all are broken, but even the ones that work are, well, unusual.

    I'm not naming any names — and I'll obscure them if it would be too revealing — but these are all from real .pm files in tarballs currently indexed on CPAN.

    $VERSION = '';
    
    $VERSION = '0.O1';
    
    $VERSION = "0.01c";
    
    $VERSION = VERSION;
    
    $VERSION = '0.10E0';
    
    our $VERSION = -722;
    
    my $VERSION = '0.01';
    
    $VERSION = 0xdeadbeef;
    
    our $VERSION = '1.0-1';
    
    $VERSION = 2006_08_16.1;
    
    our $VERSION = '0.8.1-2';
    
    $VERSION = $VERSION = "0.1";
    
    $Foo::Bar::VERSION |= '2.6';
    
    my $VERSION = 'OMG-04-05-01';
    
    our $VERSION = '1.4.A8UG1gG';
    
    our $VERSION = 'set-when-loading';
    
    our $VERSION = '$Revision: 1.2 $';
    
    our $VERSION = $Foo::Bar::VERISON;
    
    our $VERSION=$Foo::VERSION; use Foo;
    
    local $Other::Module::VERSION = 666;
    
    $Foo::Bar::VERSION="1.23" unless $Foo::Bar::VERSION;
    
    $Foo::Bar::VERSION='1.00' unless $INC{'Foo/Bar0.pm'};
    
    $VERSION=eval 'use version;1' ? 'version'->new('0.33') : '0.33';
    
    

    Some of these are so nuts that, unless a subsequent line modifies them into a standard version format, a version check will throw an error on any recent perl:

    # Foo.pm
    package Foo;
    our $VERSION = "1.2-trailing-junk";
    
    $ perl -I. -we 'use Foo 0;'
    Invalid version format (non-numeric data) at -e line 1.
    BEGIN failed--compilation aborted at -e line 1.
    

    Moral of this story: make sure your $VERSION definition parses cleanly on a line by itself and make sure it's a valid version number. If you're not sure, check it with the is_lax function from version.pm.
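A minimal sketch of that check, calling is_lax by its fully qualified name. The candidate strings are a handful picked for illustration, some of them from the list above:

```perl
use strict;
use warnings;
use version;

# Candidate version strings to test.
my @candidates = ( '0.01', '1.23', '0.O1', '1.2-trailing-junk', '' );

# Record whether each string is an acceptable "lax" version.
my %ok = map { $_ => ( version::is_lax($_) ? 1 : 0 ) } @candidates;

for my $candidate (@candidates) {
    printf "%-20s %s\n", "'$candidate'", $ok{$candidate} ? 'ok' : 'NOT ok';
}
```

Passing is_lax only tells you that version.pm can parse the string. It says nothing about whether the definition line can be extracted and evaluated on its own, so it complements the parse_version check rather than replacing it.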

    Posted in perl programming

    Previewing POD before shipping

    I always find typos after I ship to CPAN.

    There is something about reading my docs in nicely-formatted HTML on a web site that makes my mistakes jump out at me. And then I feel stupid for not having caught it earlier.

    I used to use the search.cpan.org pod renderer to preview my POD, but I stopped after a while, probably when I stopped using search.cpan.org for everything else as well.

    What I really wanted was something local, quick and pretty. So I ripped off a bit of what MetaCPAN was doing to render POD and came up with my own utility program: podpreview.

    $ podpreview path/to/whatever.pm
    

    That renders the POD from the file into HTML with an embedded stylesheet, saves it in a temporary file and opens up that file in my browser. There I can happily proofread and find my typos before shipping.

    If you'd like to try it out and adapt it to your own style, here it is:

    #!/usr/bin/env perl
    use v5.10;
    use strict;
    use warnings;
    use Browser::Open qw/open_browser/;
    use Path::Tiny;
    use Pod::Simple::XHTML;
    
    my $file = shift @ARGV
      or die "Usage: $0 <file>";
    $file = path($file);
    
    my $psx = Pod::Simple::XHTML->new;
    $psx->output_string( \my $html );
    $psx->html_charset('UTF-8');
    $psx->html_encode_chars('&<>">');
    $psx->perldoc_url_prefix("https://metacpan.org/module/");
    $psx->html_header( my_header() );
    $psx->html_footer( my_footer() );
    $psx->parse_string_document( $file->slurp_utf8 );
    
    # fall back to /tmp when TMPDIR is not set, since path() rejects undef parts
    my $temp = path( $ENV{TMPDIR} // '/tmp', 'podpreview', $file->relative . '.html' );
    $temp->touchpath;
    $temp->spew_utf8($html);
    open_browser("file:///$temp");
    
    sub my_css {
        return <<'CSS';
    body { background: snow; font-family: sans-serif; }
    div#main { width: 70%; margin: 5% auto; }
    h1 { font-size: 1.5em; margin: .83em 0 }
    h2 { font-size: 1.17em; margin: 1em 0 }
    h3 { margin: 1.33em 0 }
    h4 { font-size: .83em; line-height: 1.17em; margin: 1.67em 0 }
    h5 { font-size: .67em; margin: 2.33em 0 }
    h1, h2, h3, h4, h5 { font-weight: bolder; color: #36c }
    a:link { color: #36c }
    code { font-size: 1.2em }
    CSS
    }
    
    sub my_header {
        my $css = my_css();
        return <<"HEADER";
    <html>
    <head>
    <title></title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <style>
    $css
    </style>
    </head>
    <body>
    <div id="main">
    HEADER
    }
    
    sub my_footer {
        return <<'FOOTER';
    </div>
    </body>
    </html>
    FOOTER
    }
    
    Posted in perl programming

    © 2009-2015 David Golden All Rights Reserved