The xdg channel — Thanksgiving missive

While I haven't been blogging much, I have been busy coding. To riff from Damian's "Conway Channel" talks, this blog post summarizes the various (mostly new) CPAN modules I've been working on.

::Tiny and not so ::Tiny

I appear to be one of the leading proponents of "::Tiny" modules. I love the Unix-like small-tools philosophy. Sometimes, though, a module can be too tiny, and needs extension for situations that call for extra features or can tolerate more dependencies.

  • Class::Tiny is my response to the excessive minimalism of Object::Tiny. When you just need read-write accessors with lazy defaults and maybe BUILD/DEMOLISH, Class::Tiny gives it to you in about 120 lines of code.
  • HTTP::Tiny::UA extends HTTP::Tiny. HTTP::Tiny is in the Perl core and Christian and I consider it nearly feature-complete. I hope HTTP::Tiny::UA can become common ground for user-agent extensions that are consistent with the HTTP::Tiny philosophy and use HTTP::Tiny as the underlying transport.
  • Path::Tiny is not new, but it gets steady improvements. Lately, I've been sorting out Windows and volumes. One of these days, I hope to get around to tackling some big changes to file moving, copying and renaming (maybe by the QA hackathon next year).
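To give a flavor of the Class::Tiny style mentioned above, here is a minimal sketch; the class name and attributes are invented for the example:

```perl
package My::Counter;
use Class::Tiny qw( name ), {
    count => 0,                    # default value
    label => sub { "counter" },    # lazy default from a coderef
};

sub BUILD {
    my ($self) = @_;
    die "name is required\n" unless defined $self->name;
}

package main;
use strict;
use warnings;

my $c = My::Counter->new( name => "hits" );
print $c->count, "\n";           # 0
$c->count( $c->count + 1 );      # accessors are read-write
print $c->count, "\n";           # 1
```

Attributes get read-write accessors, defaults are applied lazily, and BUILD runs at construction time, all in a tiny amount of code.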

Embellishing the Moose

Roles are one of the best features of Moose and Moo. I wrote two roles I thought worth sharing.

  • MooseX::Role::Logger provides a Log::Any-based logger. I think Log::Any is a great idea and underappreciated. I've taken over maintenance of Log::Any and hope to ship a new release soon that is even more flexible than it is today.
  • MooseX::Role::MongoDB provides an API for using MongoDB::MongoClient and associated databases and collections. It provides lazy instantiation, caching and fork safety.
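As a rough sketch of how a Log::Any-based role like MooseX::Role::Logger gets consumed (this assumes the role exposes a `_logger` attribute, per its documentation; My::Service is invented for the example):

```perl
package My::Service;
use Moose;
with 'MooseX::Role::Logger';

sub run {
    my ($self) = @_;
    $self->_logger->info("starting run");
    # ... real work would happen here ...
    $self->_logger->debug("run complete");
    return "done";
}

package main;
use strict;
use warnings;
use Log::Any::Adapter;

# Log::Any output goes nowhere until an adapter is set
Log::Any::Adapter->set('Stderr');

print My::Service->new->run, "\n";   # done
```

The nice property is that the class logs to Log::Any unconditionally, while the application decides (via an adapter) where, or whether, those messages actually go.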

A MongoDB Framework

You either love MongoDB or you hate it. Or both at the same time. MongoDB's document-centric data model is different from what you're used to, and everything I found on CPAN was either too complex or was doing it wrong.

  • Meerkat is a framework that uses Moose objects as projections of the document state maintained in the database. I think it makes it easy to use the right conceptual model in a Perl-ish way. Of course, it uses MooseX::Role::MongoDB under the hood.
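Adapted loosely from Meerkat's synopsis as I recall it, this sketch shows the shape of the projection idea; it assumes a MongoDB server on localhost, and My::Model::Person is invented for the example:

```perl
package My::Model::Person;
use Moose;
with 'Meerkat::Role::Document';
has name => ( is => 'ro', isa => 'Str' );

package main;
use strict;
use warnings;
use Meerkat;

my $meerkat = Meerkat->new(
    model_namespace => "My::Model",
    database_name   => "test",
);

my $person = $meerkat->collection("Person");   # works with My::Model::Person

my $obj = $person->create( name => 'John' );   # inserts a document
$obj->update_set( name => 'Jane' );            # updates database and object together

print $obj->name, "\n";   # Jane
```

The point of the design is that updates go through the database first, so the in-memory object stays a faithful projection of the stored document rather than drifting out of sync.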

Living with failure

Perl's poor excuse for an exception system is painful, so it falls to CPAN to provide improvements. Here are my latest two attempts to provide better tools.

  • failures makes creating and using exception classes extremely easy. Other than relying on Class::Tiny, it's implemented in about 70 lines of code.
  • Try::Tiny::Retry extends Try::Tiny to make it easy to retry a code block on error. It defaults to exponential-backoff, but is easily customizable.
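The two modules above compose naturally. Here is a hedged sketch: flaky_fetch and the failure class name are invented for the example, and it assumes Try::Tiny::Retry's default exponential-backoff policy:

```perl
use strict;
use warnings;
use failures qw( net::timeout );    # creates the failure::net::timeout class
use Try::Tiny::Retry ':all';

# A stand-in for a flaky network call: fails twice, then succeeds.
my $attempts = 0;
sub flaky_fetch {
    $attempts++;
    failure::net::timeout->throw("timed out") if $attempts < 3;
    return "ok on attempt $attempts";
}

my $result = retry {
    flaky_fetch();
}
retry_if { $_->isa("failure::net::timeout") }   # only retry our timeout failures
catch {
    warn "giving up: $_";
};

print "$result\n";   # ok on attempt 3
```

Using a dedicated exception class means the retry_if predicate can retry transient failures while letting everything else propagate immediately.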

CPAN minus archive equals index

Without an index, CPAN is just a distributed file store.

  • CPAN::Common::Index is a common library for accessing several types of CPAN indexes. I hope someday it will be something that CPAN clients will use.
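For flavor, here is roughly how a mirror-backed index lookup works with CPAN::Common::Index; this sketch assumes network access, and the mirror URL is just an example:

```perl
use strict;
use warnings;
use CPAN::Common::Index::Mirror;

# Fetches and caches the 02packages index from a CPAN mirror,
# then looks up a package in it.
my $index = CPAN::Common::Index::Mirror->new(
    { mirror => "http://www.cpan.org/" }
);

my ($hit) = $index->search_packages( { package => "Moose" } );
printf "%s %s => %s\n", $hit->{package}, $hit->{version}, $hit->{uri};
```

The idea is that a CPAN client codes against the common search API and can swap in a different backend (a local mirror, MetaCPAN, a custom index) without changing its lookup logic.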

Roooaarrrrrhhhh

If I didn't use Dist::Zilla, I couldn't possibly be as prolific as I am. So some fraction of my time is spent adding to the Dist::Zilla ecosystem. In addition to helping make Dist::Zilla safe for encodings, I churned out a few new plugins.

Pod::Spell gets used by my Dist::Zilla spell checking plugins. I merged in the word list from Pod::Wordlist::Hanekomu, improved wordlist matching with Lingua::EN::Inflect and made some other algorithm improvements.

More for the core

I kept pushing some core modules forward in various ways, mostly just applying patches or fixing bugs.

  • HTTP::Tiny got some minor bug fixes
  • File::Temp got some dependency management and Travis CI smoking
  • CPAN::Meta got some fixes to validation and a couple new features

YAML::Tiny isn't really core, but it is the basis for CPAN::Meta::YAML, so I count it in the same category. Working with Ingy, Karen Etheridge and Jim Keenan, we fixed encoding, overhauled the test suite and added test coverage.

Code review

Inspired by rjbs's code-review practices, I've started gradually cleaning up and re-releasing old distributions of mine.

I for Incomplete

There are a number of other projects that I've started or just conceived but haven't finished. They may yet see the light of day.

  • A "tiny" URI module
  • A better benchmarking library, with statistical rigor for non-parametric timing distributions with unequal variance
  • Some extensions for Data::Faker
  • A module providing a standard way to safely evaluate $VERSION lines parsed from modules
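To illustrate the last idea, here is a naive sketch of evaluating a $VERSION assignment extracted from module source without loading the module; it handles only simple one-line assignments in the current package, and real implementations (Module::Metadata, for instance) sandbox this far more carefully:

```perl
use strict;
use warnings;

sub version_from_line {
    my ($line) = @_;
    # Evaluate the assignment in a scope where $VERSION is localized,
    # so the caller's globals are untouched.
    my $version = eval qq{
        no strict; no warnings;
        local \$VERSION;
        \$VERSION = undef;
        $line;
        \$VERSION;
    };
    return $version;
}

print version_from_line(q[our $VERSION = '1.23';]), "\n";   # 1.23
```

Even this toy version shows why a standard module is wanted: string-eval of untrusted source is exactly the kind of thing that deserves one careful, shared implementation.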

What you can do

First, if any of these are interesting to you, please try them out and let me know what you think.

Second, if you're not in the habit of releasing code to CPAN, consider starting. When you write some library, take an extra second or two to think about how it could be generalized for others and ship it.

Give thanks for CPAN by giving back.


6 Comments

  1. Posted November 30, 2013 at 9:13 pm

    I am totally excited about an alternate URI implementation! For example, how about requiring less mutation? URI::Tiny->new('http://foo.com', { params => { foo => 1, bar => 2 } })->as_string. Just an idea. Stoked for when you get to it :)

    • Posted November 30, 2013 at 11:04 pm

      I, too, want the objects to be immutable. I was thinking more along the lines of URI::Tiny->parse("http://foo.com")->clone( params => { foo => 1, bar => 2 } )->as_string, as I expect that usually the literal part will be a variable anyway: $common_uri->clone( ... ).

    • Michael J
      Posted December 3, 2013 at 4:02 am

      I'd just started thinking about making a simpler URI module too, particularly with regard to easier comparison of URLs. It was a failing test when trying to install Facebook::Graph recently that prompted it - the random order of query arguments was breaking a comparison, and I couldn't see a simple way of fixing it with URI::QueryParam (it's easy to implement a fix using it, but it needs to be a simple one-liner for repeated use in test scripts).

      So, it would be nice if URI::Tiny had a simple way of comparing two URLs, maybe with an option for exact match or equivalent match (i.e. where the order of arguments is irrelevant).
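The order-insensitive comparison Michael describes can be sketched in plain Perl with no URI module at all; this is deliberately simplistic, ignoring %-escaping normalization, fragments, and any semantics of repeated keys beyond plain sorting:

```perl
use strict;
use warnings;

# Canonicalize a URL by sorting its query parameters so that two URLs
# differing only in query order compare equal.
sub canonical_url {
    my ($url) = @_;
    my ( $base, $query ) = split /\?/, $url, 2;
    return $base unless defined $query && length $query;
    my @pairs = sort split /[&;]/, $query;
    return $base . "?" . join "&", @pairs;
}

sub urls_equivalent {
    my ( $u, $v ) = @_;
    return canonical_url($u) eq canonical_url($v);
}

print urls_equivalent(
    "http://example.com/x?foo=1&bar=2",
    "http://example.com/x?bar=2&foo=1"
) ? "same\n" : "different\n";    # same
```

A hypothetical URI::Tiny could expose exactly this kind of canonical form, which would make the one-liner-in-a-test-script use case trivial.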

  2. Jakub Narebski
    Posted December 2, 2013 at 6:01 am

    A better benchmarking library

    What about Dumbbench module?

    • Posted December 2, 2013 at 2:01 pm

      Dumbbench is a step in the right direction, but I believe it mis-applies the central limit theorem. Dumbbench assumes the timing distribution is normal because of the CLT, but the CLT only says that sample means are normally distributed, not the underlying distribution itself, so assuming normality as a way of truncating outliers is dubious. I've seen Dumbbench fail to converge in some circumstances because of this.

      Moreover, it doesn't give you any reliable way to reject the hypothesis that two different algorithms have the same timing to a desired level of confidence, which is really the point of benchmarking in the first place.

      My idea is to implement a two-sample K-S test to determine (with relatively few samples, no less) whether two benchmark sample distributions are drawn from the same distribution or not.
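The core of the two-sample K-S comparison described above can be sketched in a few lines of plain Perl. This computes only the D statistic (the maximum vertical distance between the two empirical CDFs); a real test would also turn D and the sample sizes into a p-value:

```perl
use strict;
use warnings;

sub ks_statistic {
    my ( $x, $y ) = @_;    # array refs of numeric samples
    my @x = sort { $a <=> $b } @$x;
    my @y = sort { $a <=> $b } @$y;
    my ( $i, $j, $d ) = ( 0, 0, 0 );
    while ( $i < @x && $j < @y ) {
        # advance both pointers past the smaller current value (and ties)
        my $v = $x[$i] < $y[$j] ? $x[$i] : $y[$j];
        $i++ while $i < @x && $x[$i] <= $v;
        $j++ while $j < @y && $y[$j] <= $v;
        # gap between the two empirical CDFs at this point
        my $gap = abs( $i / @x - $j / @y );
        $d = $gap if $gap > $d;
    }
    return $d;
}

print ks_statistic( [ 1, 2, 3, 4 ], [ 1, 2, 3, 4 ] ), "\n";   # 0
print ks_statistic( [ 1, 2 ],       [ 3, 4 ] ),       "\n";   # 1
```

Because the statistic depends only on the empirical CDFs, it makes no normality assumption at all, which is exactly what makes it attractive for skewed, heavy-tailed timing distributions.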

      • Jakub Narebski
        Posted December 2, 2013 at 5:53 pm

        There are a few issues that need to be solved in such a benchmarking library.

        First is warming up the cache and subtracting the dry-run result (a "nop" benchmark), i.e. the benchmarking overhead.

        Second is dealing with outliers. Dumbbench uses the median or mean as a location estimate and MAD (median absolute deviation) as a robust scale estimate, and uses them for outlier detection and removal. Another simple scale estimate is the IQR (interquartile range); Wikipedia also mentions other robust measures of location and scale, but they get more complicated.

        Third is either reaching a required precision (as Dumbbench tries to do), or accepting or rejecting the hypothesis that two benchmark results are the same (what you want to do), though one would probably also want speed ratios if the hypothesis is rejected.

        BTW, perhaps stealing algorithms and ideas from Java (e.g. Caliper) would be a good idea - it needs robust statistics because of JVM jitter ;-)

© 2009-2014 David Golden All Rights Reserved