Please test Path-Tiny-0.081-TRIAL

The latest development releases of Path::Tiny include this whopper in the Changes file:

!!! INCOMPATIBLE CHANGES !!!
The relative() method no longer uses File::Spec's buggy rel2abs method. The new Path::Tiny algorithm should be comparable and passes File::Spec rel2abs test cases, except that it correctly accounts for symlinks. For common use, you are not likely to notice any difference. For uncommon use, this should be an improvement. As a side benefit, this change drops the minimum File::Spec version required, allowing Path::Tiny to be fatpacked if desired.

I sincerely hope that you won't notice the difference – or if you do, it's because Path::Tiny is defending you against a latent symlink bug.

That said, with any change of this magnitude there's a serious risk of breakage. PLEASE, PLEASE, PLEASE, if you use Path::Tiny, I ask that you test your module or application with Path-Tiny-0.081-TRIAL.

Here's how:

$ cpanm --dev Path::Tiny

# or

$ cpan DAGOLDEN/Path-Tiny-0.081-TRIAL.tar.gz

If you have any problems with it, please open a bug report in the Path::Tiny issue tracker.

Posted in perl programming

My Github dashboard of neglect

Bitrot.

The curse of being a prolific publisher is a long list of once-cherished, now-neglected modules.

Earlier this week, I got a depressing Github notification. The author of a pull request, who had politely pestered me for a while to review his PR, added this comment:

1 year has passed

Ouch!

Sadly, after taking time to review the PR, I actually decided it wasn't a great fit and (politely, I hope) rejected it. And then I felt even WORSE, because I'd made someone wait around a year for me to say "no".

Much like my weight hitting a local maximum on the scale, goading me to rededicate myself to healthier eating [dear startups, enough with the constant junk food, already!], this PR felt like a low point in my open-source maintenance.

And so, just like I now have an app to show me a dashboard of my food consumption, I decided I needed a bird's-eye view of what I'd been ignoring on Github.

Here, paraphrased, is my "conversation" with Github.

Me: Github, show me a dashboard!  

GH: Here's a feed of events on repos you watch

Me: No, I want a dashboard.

GH: Here's a list of issues created, assigned or mentioning you.

Me: No, I want a dashboard.  Maybe I need an organization view.  [my CPAN repos are in an organization]

GH: Here's a feed of events on repos in the organization.

Me: No, I want a dashboard of issues.

GH: Here's a list of issues for repos in the organization.

Me: Uh, can you summarize that?

GH: No.

Me: Github, you suck.  But you have an API.  Time to bust out some Perl.

So I wrote my own github-dashboard program, using Net::GitHub. (Really, I adapted it from other Net::GitHub programs I already use.) I keep my Github user id and API token in my .gitconfig, so the program pulls my credentials from there.

Below, you can see my Github dashboard of neglect (top 40 only!). The three columns of numbers are (respectively) PRs, non-wishlist issues and wishlist issues. (Wishlist items are identified either by label or by "wishlist" in the title.)

$ ./github-dashboard |  head -40
                               Capture-Tiny   3  18   0
                                    Meerkat   2   8   0
                               getopt-lucid   2   1   0
                                  Path-Tiny   1  21   0
                               HTTP-Tiny-UA   1   5   0
                         Path-Iterator-Rule   1   5   0
  Dist-Zilla-Plugin-BumpVersionAfterRelease   1   3   2
                              Metabase-Fact   1   3   0
                dist-zilla-plugin-osprereqs   1   2   0
       Dist-Zilla-Plugin-Test-ReportPrereqs   1   2   0
                                    ToolSet   1   2   0
        Dist-Zilla-Plugin-Meta-Contributors   1   1   0
     Dist-Zilla-Plugin-MakeMaker-Highlander   1   0   0
                         Task-CPAN-Reporter   1   0   0
                           IO-CaptureOutput   0   7   0
                                     pantry   0   7   2
                     TAP-Harness-Restricted   0   4   0
                            class-insideout   0   3   0
                               Hash-Ordered   0   3   0
                                    Log-Any   0   3   4
                                  perl-chef   0   3   0
                                 Term-Title   0   3   0
                               Test-DiagINC   0   3   0
                          Acme-require-case   0   2   0
                                 Class-Tiny   0   2   0
                                  Data-Fake   0   2   2
                  dist-zilla-plugin-twitter   0   2   0
                   Log-Any-Adapter-Log4perl   0   2   0
                             math-random-oo   0   2   0
                                 superclass   0   2   0
                                   Test-Roo   0   2   0
                              universal-new   0   2   0
                           zzz-rt-to-github   0   2   0
                      app-ylastic-costagent   0   1   0
                      Dancer-Session-Cookie   0   1   0
          Dist-Zilla-Plugin-CheckExtraTests   0   1   0
          Dist-Zilla-Plugin-InsertCopyright   0   1   0
Dist-Zilla-Plugin-ReleaseStatus-FromVersion   0   1   0
                                 File-chdir   0   1   0
                                 File-pushd   0   1   0
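
The three-column tally can be sketched in plain Perl. This is only an illustration of the counting rules described above, not the actual github-dashboard program: the function names, the sample data and the column widths are assumptions. It also assumes each issue is a hash shaped like a GitHub issues API record, where pull requests carry a "pull_request" key and labels are hashes with a "name" field, and it arbitrarily counts a PR as a PR even if it is also tagged as wishlist.

```perl
use strict;
use warnings;

# Classify one repo's open issues into the three dashboard columns.
# ASSUMPTION: issues look like GitHub issues API records (hash refs).
sub tally_issues {
    my (@issues) = @_;
    my %count = ( prs => 0, issues => 0, wishlist => 0 );
    for my $issue (@issues) {
        my @labels = map { $_->{name} // '' } @{ $issue->{labels} || [] };
        if ( exists $issue->{pull_request} ) {
            $count{prs}++;
        }
        elsif ( ( grep { /wishlist/i } @labels )
            || ( $issue->{title} // '' ) =~ /wishlist/i )
        {
            # Wishlist by label or by "wishlist" in the title.
            $count{wishlist}++;
        }
        else {
            $count{issues}++;
        }
    }
    return \%count;
}

# Format one dashboard row: right-aligned repo name, then the counts
# in the order PRs, non-wishlist issues, wishlist issues.
sub dashboard_row {
    my ( $repo, $count ) = @_;
    return sprintf( "%43s%4d%4d%4d",
        $repo, @{$count}{qw/prs issues wishlist/} );
}

# Made-up sample data standing in for an API response.
my @sample = (
    { title => "Fix typo in docs",     pull_request => {}, labels => [] },
    { title => "Crash on empty input", labels => [ { name => "bug" } ] },
    { title => "Wishlist: add colour output", labels => [] },
    { title => "Support globbing", labels => [ { name => "wishlist" } ] },
);

print dashboard_row( "Example-Repo", tally_issues(@sample) ), "\n";
```

In a real program the sample list would be replaced by whatever the API client returns for each repo, and the rows would be sorted before printing.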

Now, when I set aside maintenance time, I know where to work.

Posted in perl programming

A parallel MongoDB client with Perl and fork

Concurrency is hard, and that's just as true in Perl as it is in most languages. While Perl has threads, they aren't lightweight, so they aren't an obvious answer to parallel processing the way they are elsewhere. In Perl, doing concurrent work generally means (a) a non-blocking/asynchronous framework or (b) forking sub-processes as workers.

There is no officially-supported async MongoDB driver for Perl (yet), so this article is about forking.

The problem with forking a MongoDB client object is that forks don't automatically close sockets. And having two (or more) processes trying to use the same socket is a recipe for corruption.

At one point in the design of the MongoDB Perl driver v1.0.0, I had it cache the PID on creation and then check if it had changed before every operation. If so, the socket to the MongoDB server would be closed and re-opened. It was auto-magic!

The problem with this approach is that it incurs overhead on every operation, regardless of whether forks are in use. Even if forks are used, they are rare compared to the frequency of database operations for any non-trivial program.
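
To make the cost concrete, here is a rough sketch of what such a PID check looks like. ForkAwareClient is a hypothetical stand-in written for this illustration, not code from the MongoDB driver, and its "reconnect" only bumps a counter instead of touching a socket.

```perl
use strict;
use warnings;

# HYPOTHETICAL class for illustration only; not the MongoDB driver's code.
package ForkAwareClient;

sub new {
    my ($class) = @_;
    # Remember which process created the (pretend) socket.
    return bless { pid => $$, reconnects => 0 }, $class;
}

# Stand-in for closing and re-opening the socket to the server.
sub reconnect {
    my ($self) = @_;
    $self->{pid} = $$;
    $self->{reconnects}++;
    return $self;
}

# Every operation pays for this PID comparison, forked or not.
sub run_command {
    my ( $self, $command ) = @_;
    $self->reconnect if $self->{pid} != $$;
    return sprintf( "ran %s in process %d", $command, $$ );
}

package main;

my $client = ForkAwareClient->new;
print $client->run_command("ping"), "\n";    # same process: no reconnect

# Simulate what a child sees after a fork: the cached PID no longer matches.
$client->{pid} = -1;
$client->run_command("ping");
print "reconnects: $client->{reconnects}\n";
```

The comparison itself is cheap, but it sits on the hot path of every single operation, which is the overhead described above.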

So I took out that mis-feature. Now, you must manually call the reconnect method on your client objects after you fork (or spawn a thread).

Here's a pattern I've found myself using from time to time to do parallel processing with Parallel::ForkManager, adapted to reconnect the MongoDB client object in each child:

use Parallel::ForkManager;

# Pass in a MongoDB::MongoClient object, the number of parallel jobs to
# run, and a code-reference to execute. The code reference is passed
# the iteration number (0 .. $jobs - 1).
sub parallel_mongodb {
    my ( $client, $jobs, $fcn ) = @_;

    my $pm = Parallel::ForkManager->new( $jobs > 1 ? $jobs : 0 );

    local $SIG{INT} = sub {
        warn "Caught SIGINT; Waiting for child processes\n";
        $pm->wait_all_children;
        exit 1;
    };

    for my $i ( 0 .. $jobs - 1 ) {
        $pm->start and next;
        $SIG{INT} = sub { $pm->finish };
        $client->reconnect;
        $fcn->( $i );
        $pm->finish;
    }

    $pm->wait_all_children;
}

To use this subroutine, I partition the input data into the number of jobs to run. Then I call parallel_mongodb with a closure that can find the input data from the job number:

use MongoDB;

# Partition input data into N parts.  Assume each is a document to insert.
my @data = (
   [ { a => 1 },  {b => 2},  ... ],
   [ { m => 11 }, {n => 12}, ... ],
   ...
);
my $number_of_jobs = @data;

my $client = MongoDB->connect;
my $coll = $client->ns("test.dataset");

parallel_mongodb( $client, $number_of_jobs,
  sub {
    $coll->insert_many( $data[ shift ], { ordered => 0 } );
  }
);

Of course, you want to be careful that the job count (i.e. the partition count) is optimal. I find that having it roughly equal to the number of CPUs tends to work pretty well in practice.

What you don't want to do, however, is to call $pm->start more times than the number of child tasks you want running in parallel. You don't want a new process for every data item to process, since each fork also has to reconnect to the database, which is slow. That's why you should figure out the partitioning first, and only spawn a process per partition.

This is best for "embarrassingly parallel" problems, where there's no need for communication back from the child processes. And while what I've shown does a manual partition into arrays, you could also do this with a single array, where each child worker only processes indices where the index modulo the number of jobs is equal to the job ID. Or you could have child workers pulling from a common task queue over a network, etc.
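
The single-array variant might look something like this minimal sketch. The helper name indices_for_job and the sample documents are made up for illustration; the only rule it implements is "index modulo job count equals job ID".

```perl
use strict;
use warnings;

# Return the array indices a given job should process: those where
# the index modulo the number of jobs equals the job ID.
# (Helper name is hypothetical, chosen for this example.)
sub indices_for_job {
    my ( $job_id, $jobs, $item_count ) = @_;
    return grep { $_ % $jobs == $job_id } 0 .. $item_count - 1;
}

# Made-up documents standing in for real input data.
my @docs = map { +{ n => $_ } } 1 .. 10;
my $jobs = 3;

for my $job_id ( 0 .. $jobs - 1 ) {
    my @mine = indices_for_job( $job_id, $jobs, scalar @docs );
    print "job $job_id handles indices @mine\n";
}
```

Inside the closure passed to parallel_mongodb, each child would take its job number, compute its own indices this way, and insert only that slice of the array, so no partitioning has to happen before forking.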

TIMTOWTDI, and now you can do it in parallel.

Posted in mongodb, perl programming

Perl 5 and Perl 6 are mortal enemies

Did you grow up with one or more siblings? Are you a parent with two or more kids? Then you know that siblings often fight. A lot.

Perl 6 is described as Perl 5's little sister

That metaphor fits. They share parentage. The languages are similar in philosophy. One is more mature, the other less so. Their communities overlap.

But like siblings, they are rivals. Like an only child confronted with a new baby in the house, they now compete for attention from their shared community. They compete for scarce resources to grow – in the form of volunteers who will contribute time and treasure.

Their economic futures are both in doubt

This is what makes them not just rivals, but mortal enemies.

There are many signs that Perl 5 is in decline. Perl 5 is rarely a first language. The number of Perl 5 jobs is – at best – constant, at a time when technology jobs are booming in the wider economy. New applications are rarely written in Perl 5. This year, the Perl 5 community had to beg for talk submissions to OSCON, which grew out of The Perl Conference in the first place.

Is Perl 5 dead? Of course not. But I don't think anyone can cite credible evidence that it's a growth language on par with other "popular" languages. And that's OK. There's still value to be had in a good niche.

But now consider Perl 6. Where will it grow?

First, a postulate: given the language similarities, the people that will find it easiest to learn Perl 6 are today's Perl 5 developers.

Now, let's consider some scenarios:

Scenario 1: Perl 6 takes off!

With its gradual typing and natural async model, Perl 6 becomes the fastest dynamic language. People flock to it from far and wide. It becomes more popular than Rails was in its day. YC startups choose it for competitive advantage.

Perl 5 devs, with their advantage in switching, flock to the new economic opportunities it offers. Companies still using Perl 5 find it even harder to find good devs than they do today, or are forced to pay up for them. Even fewer new projects are started with Perl 5. The reasons for anyone to learn Perl 5 become fewer. Perl 5 lives on like COBOL, with a handful of older developers well paid to maintain a shrinking legacy code base.

Perl 6 lives and grows; Perl 5 heads quickly down the path to obsolescence.

Scenario 2: Perl 6 stalls out

Perl 6 winds up plagued by ongoing quality glitches and performance problems. Companies that already have Perl 5 developers (and that would have a competitive advantage retraining them) see no benefits from using Perl 6 for new projects.

With no job opportunities, most Perl 5 devs don't pick up Perl 6. The pool of Perl 6 developers stays a fraction of the already small Perl 5 pool. With even Perl 5 companies not adopting Perl 6, no one else is willing to risk Perl 6 adoption for new work, reinforcing the lack of economic opportunity.

Perl 5 stays status quo, static in an industry growing exponentially; Perl 6 remains a hobby language.

Scenario 3: Perl 6 winds up marginally better than Perl 5

Perl 6 turns out to be better than Perl 5, but not so much as to attract developers from other dynamic language communities. Companies that use Perl 5 find it cheaper to retrain their existing developer pool in Perl 6 for performance improvements in new projects. Over time, more projects are in Perl 6 than Perl 5.

Perl 5 devs see the winds of change. Those who don't want to do maintenance work forever pick up Perl 6 to stay relevant.

Perl 6 ekes out a living, stealing increasing production code share from Perl 5. Perl 5 declines moderately faster.

Zero-sum is not necessarily bad

When I say "mortal" enemies, I mean that only one is likely to survive in the long run. I can't think of a scenario where Perl 6 grows and Perl 5 grows. I can't even think of a plausible scenario where Perl 6 grows and Perl 5 is unaffected.

So I think it's zero sum. If Perl 6 grows, then Perl 5 dies faster. If Perl 6 fails to thrive, then Perl 5 keeps the status quo.

Is that bad? I don't think so. The possibility of wild success for Perl 6 should thrill Perl 5 devs, who would have an advantaged position in the new order.

For Perl 5 devs, the best case is great and the worst case seems to be status quo.

So why is there an undercurrent of hostility between the Perl 5 and Perl 6 communities? I think it's because the worst case is actually worse.

Scenario 4: Perl 6 stalls, and drags Perl 5 down with it

Perl 6 winds up plagued by ongoing quality glitches and performance problems. Tainted by association, companies abandon Perl 5 faster as Perl 6's failure makes Perl 5 seem that much more like a dead end. More Perl 5 monolithic apps get re-written as micro-services in trendy languages with easier deployment.

Meanwhile, prolific Perl 5 contributors to p5p and CPAN jump over to Perl 6 to try to help – either betting on Scenario #1 or just trying to save the day. Perl 5 innovation slows, re-raising the "Perl 5 is dead" meme and accelerating economic migration away from Perl 5.

If a tree falls in the forest...

I think this is the fear in the Perl 5 community. If Perl 6 fails, will it do so quietly, allowing the Perl 5 status quo to continue? Or will it suck away resources from Perl 5 and harm Perl 5's already shaky reputation further, hastening the decline?

So I'm not surprised by tension on both sides. I think it's natural.

Just like sibling rivalry.

Posted in p5p, perl programming, perl6

Book Review: The Go Programming Language

[Disclaimer: I was provided with a free review copy by the publisher.]

tl;dr

If you're looking to buy a comprehensive text on Go, "The Go Programming Language" is an excellent choice. But with so many free e-book introductions to Go, do you really need it? Maybe, but maybe not.

Overview

The authors "assume that you have programmed in one or more other languages" and thus "won't spell out everything as if for a total beginner". Yet the book weighs in at a hefty 380 pages (over 100 pages more than my venerable 1988 K&R 2nd edition).

Is it better than the free 50-page "Little Go Book", or the free 160-page "Introduction to Programming in Go" or even the freely-available 80-page Go Language Specification itself? Yes, certainly. But is it two or three or four times as good? I don't think so.

So is "The Go Programming Language" worth the cost to read in both dollars *and* time? It depends on how you learn, how much you already know, and whether, for you, the good parts outweigh the bad.

The Good Parts

Chapter 1 ("Tutorial") sets the stage for much of what is excellent about this book: fabulous examples. Beyond the obligatory "Hello World", it presents a quick look at several simplified "real world" examples, including command line text filtering, image generation/animation, URL fetching and serving a web page.

The rest of the book follows this same pattern. Chapters typically present several different code examples, most of which do real things rather than just consist of toy code. They include exercises (which I didn't do) that would be good for a course or for someone who learns best by doing structured exercises. The examples are enough to serve as a starting "cookbook" for many real-world tasks.

I also found the explanations of struct embedding and composition to be excellent. Some concepts gelled much better for me than they had from other texts and even from my own coding to date. I had the same experience in the chapter on concurrency with channels. I was pleased that things so idiosyncratic to Go were some of the best parts of the book.

The Bad Parts

Sadly, the book's coverage of the standard library is haphazard. On the one hand, the many real world code examples gave opportunities to introduce parts of the standard library naturally throughout. Unfortunately, that also means there's no comprehensive coverage of the standard library itself, which is surprising given that it's one of the great strengths of the language.

The most glaring example of this ad hoc approach was finding a section on text and HTML templating oddly dropped in at the end of Chapter 4 ("Composite Types"). It was as if they really wanted to cover those packages and -- without a chapter dedicated to the standard library -- had nowhere else to put it.

As mentioned previously, the book is long and rather dense. It's not a quick read. Worse, the authors have a habit of burying important points or cautions in the middle of a wall of text and code examples. The lack of cutesy caution icons or call-out boxes for these tidbits (as would be found in more informally-styled books) really hurts skim-ability.

The Mixed Parts

As great as the examples were, I found some aspects disturbing. First, in some cases, implementation details were omitted from the text -- the reader is expected to download the source to see the full example. I would have preferred complete, if less ambitious, examples instead.

In other cases -- particularly in the sections on concurrency -- the examples are presented in a progression of one or more complete "wrong" examples of how not to do things before an example of the "right" way to do things. This approach is a good teaching method, but it adds substantially to the length of the text -- you have to grok a lot more code to parse out the differences between the examples.

The other interesting observation I had was that in many cases, the examples omit error handling for brevity. Since the verbosity of Go code error handling is a frequent criticism of the language, omitting it seemed somehow disingenuous.

Summary

If you have the money and patience and you like deep dives and real, working examples and exercises, this book is an excellent choice. If you prefer to skim or dabble, or just want a handy reference text, there are probably better options.

Posted in books

© 2009-2016 David Golden All Rights Reserved