When RFCs attack: HTTP::Tiny is getting stricter

The problem with standards are that there are too many standards. When RFC-2616 – defining HTTP/1.1 – was updated, the IETF spread the details across six RFCs: RFC-7230 to RFC-7235. Most of the changes appear to be server side, but in some areas, particularly around header formatting, the rules are getting tighter.

I was lucky to have a hackathon day at the DC-Baltimore Perl Workshop, so I implemented some of the stricter rules as well as fixing a number of other bugs (and even adding a few features!) See the HTTP::Tiny Changes file for details.

I still need to review more of the RFCs to see if there are other changes that impact HTTP::Tiny.

Meanwhile, HTTP-Tiny-0.057-TRIAL.tar.gz is on CPAN. Please, please, please install it and use it in your day-to-day work to see if there are any problems.

Here is how you can install it with cpanm or cpan:

$ cpanm --dev HTTP::Tiny
$ cpan DAGOLDEN/HTTP-Tiny-0.057-TRIAL.tar.gz

If you find any problems, please open a ticket on the HTTP::Tiny Github issue tracker.


Posted in perl programming | Tagged , , | Comments closed

At-sign nick completion for Weechat and Slack

I like connecting to chat apps like Slack and Flowdock using an IRC client, as that keeps all my chats in one window, right in my terminal.

I've been frustrated that the IRC gateways for Slack and Flowdock don't highlight people based just on their nick the way a typical IRC client does. For example, my IRC nick is "xdg". In an IRC channel, just saying "xdg: ping" will highlight and beep my terminal. But if I want to highlight "bob" who uses the Slack app, I need to say "@bob: ping".

Usually, when I highlight someone, I can use tab completion to finish the nick. That's really handy when I'm talking to my friend "BinGOs". But if I want to @-highlight him and I type "@Bi<TAB>", completion fails.

Since I use Weechat as my IRC client, I whipped up a plugin called atcomplete, which adds an @-prefixed nick for every nick in the nicklist.

With that, I can do "Bi<TAB>" and get "BinGOs", or "<TAB>" again to get "@BinGOs". And if I do "@Bi<TAB>", I get "@BinGOs" right away.

I've submitted it to the Weechat scripts page and it might eventually get approved, but it's available on github now as weechat-atcomplete if people want to try it out. Feedback and patches welcome!

Posted in hacks | Tagged , , , | Comments closed

No more dirty reads with MongoDB

If you're reading this blog, it's a good bet that sometime in your life you've had a computer freeze or crash on you. You know that crashes happen.

If it's your laptop, you restart and hope for the best. When it's your database, things are a bit more complicated.

Historically, a database lived on a single machine. Writes are considered "committed" when they are written to a journal file and flushed to disk. Until then, they are "dirty". If the database crashes, only the committed changes are recovered from disk.

So far, so good. While originally MongoDB didn't journal by default, it has been the default since version 2.0 in 2011.

As you can imagine, the problem with having a database on a single machine is that when the system is down, you can't use the database.

Trading consistency for availability

While MongoDB can run as a single server, it was designed as a distributed database for redundancy. If one server fails, there are others that can continue.

But once you have more than server, you have a distributed system, which means you need to consider consistency between parts of the system. (See Eight Fallacies of Distributed Computing).

You might have heard of the CAP Theorem. While there are excellent critques of its practical applicability – e.g. Thinking More Clearly About Consistency – it is sufficient to convey two key ideas:

  • because networks are unreliable, partitions happen, so for any real system, partition tolerance ("P") is a given.
  • because MongoDB uses redundant servers that continue to operate during a failure or partition, it gives up global consistency ("C") for availability ("A")

When we say that it gives up consistency, what we mean is that there is no single, up-to-date copy of the data visible across the entire distributed system.

Rethinking commmitment

A MongoDB replica set is structured with a single "primary" server that receives writes from clients and streams them to "secondary" servers. If a primary server fails or is partitioned from other servers, a secondary is promoted to primary and takes over servicing writes.

But what does it mean for a write to be "committed" in this model?

Consider the case where a write is committed to disk on the primary, but the primary fails before the write has replicated to a secondary. When a new secondary takes over as primary, it never saw the write, so that write is lost. Even if the old primary rejoins the replica set (now as a secondary), it synchronizes with the current primary, discarding the old write.

This means that commitment to disk on a primary doesn't matter for whether writes survives a failure. In MongoDB, writes are only truly commmitted when they have been replicated to a majority of servers. We can call this "majority-committed" (hereafter abbreviated "m-committed").

Commitment, latency and dirty reads

When replica sets were introduced in version 1.6, MongoDB offered a configuration option called write concern (abbreviated "w"), which controlled how long the server would wait before acknowledging a write. The option specifies the number of servers that need to have the write before the primary would unblock the client – e.g. w=1 (the default), w=2, etc. You could also specify to wait for a majority without knowing the exact number of servers (w='majority').

use MongoDB;
my $mc = MongoDB->connect(
        w => 'majority'

This means that a writing process can control how much latency it experiences waiting for levels of commitment. Set w=1 and the primary acknowledges the write when it is locally committed while replication continues in the background. Set w='majority' and you'll wait until the data is m-committed and thus safe from rollback in a failover.

You may spot a subtle problem in this approach.

A write concern only blocks the writing process! Reading processes can read the write from the primary even before it is m-committed. These are "dirty reads" – reads that might be rolled back if a failure occurs. Certainly, this is a rare case (compared to, say, transaction rollbacks), but it is a dirty read nonetheless.

Avoiding dirty reads

MongoDB 3.2 introduced a new configuration option called read concern, to express the commitment level desired by readers rather than writers.

Read concern is expressed as one of two values:

  • 'local' (the default), meaning the server should return the latest, locally committed data
  • 'majority', meaning the server should return the latest m-committed data

Using read concern requires v1.2.0 or later of the MongoDB Perl driver:

use MongoDB v1.2.0;
my $mc = MongoDB->connect(
        read_concern_level => 'majority'

With read concern, reading processes can finally avoid dirty reads.

However, this introduces yet another subtle problem. Consider what could happen if a process writes with w=1 and then immediately reads with read concern 'majority'?

Configured this way, a process might not read its own writes!

Recommendations for tunable consistency

I encourage MongoDB users to place themselves (or at least, their application activities) into one of the following groups:

  • "I want low latency" – Dirty reads are OK as long as things are fast. Use w=1 and read concern 'local'. (These are the default settings.)
  • "I want consistency" – Dirty reads are not OK, even at the cost of latency or slightly out of date data. Use w='majority' and read concern 'majority.
use MongoDB v1.2.0;
my $mc = MongoDB->connect(
        read_concern_level => 'majority',
        w => 'majority',

I haven't discussed yet another configuration option: read preference. A read preference indicates whether to read from the primary or from a secondary. The problem with reading from secondaries is that, by definition, they lag the primary. Worse, they lag the primary by different amounts, so reading from different secondaries over time pretty much guarantees inconsistent views of your data.

My opinion is that – unless you are a distributed systems expert – you should leave the default read preference alone and read only from the primary.

Posted in mongodb | Tagged , , , | Comments closed

Please test Path-Tiny-0.081-TRIAL

The latest development releases of Path::Tiny include this whopper in the Changes file:

The relative() method no longer uses File::Spec's buggy rel2abs method. The new Path::Tiny algorithm should be comparable and passes File::Spec rel2abs test cases, except that it correctly accounts for symlinks. For common use, you are not likely to notice any difference. For uncommon use, this should be an improvement. As a side benefit, this change drops the minimum File::Spec version required, allowing Path::Tiny to be fatpacked if desired.

I sincerely hope that you won't notice the difference – or if you do, it's because Path::Tiny is defending you against a latent symlink bug.

That said, with any change of this magnitude there's a serious risk of breakage. PLEASE, PLEASE, PLEASE, if you use Path::Tiny, I ask that you test your module or application with Path-Tiny-0.081-TRIAL.

Here's how:

$ cpanm --dev Path::Tiny

# or

$ cpan DAGOLDEN/Path-Tiny-0.081-TRIAL.tar.gz

If you have any problems with it, please open a bug report in the Path::Tiny issue tracker.

Posted in perl programming | Tagged , , | Comments closed

My Github dashboard of neglect


The curse of being a prolific publisher is a long list of once-cherished, now-neglected modules.

Earlier this week, I got a depressing Github notification. The author of a pull request who has politely pestered me for a while to review his PR, added this comment:

1 year has passed


Sadly, after taking time to review the PR, I actually decided it wasn't a great fit and politely (I hope), rejected it. And then I felt even WORSE, because I'd made someone wait around a year for me to say "no".

Much like my weight hitting a local maxima on the scale, goading me to rededicate myself to healthier eating [dear startups, enough with the constant junk food, already!], this PR felt like a low point in my open-source maintenance.

And, so, just like I now have an app to show me a dashboard of my food consumption, I decided I needed a birds eye view of what I'd been ignoring on Github.

Here, paraphrased, is my "conversation" with Github.

Me: Github, show me a dashboard!  

GH: Here's a feed of events on repos you watch

Me: No, I want a dashboard.

GH: Here's a list of issues created, assigned or mentioning you.

Me: No, I want a dashboard.  Maybe I need an organization view.  [my CPAN repos are in an organization]

GH: Here's a feed of events on repos in the organization.

Me: No, I want a dashboard of issues.

GH: Here's a list of issues for repos in the organization.

Me: Uh, can you summarize that?

GH: No.

Me: Github, you suck.  But you have an API.  Time to bust out some Perl.

So I wrote my own github-dashboard program, using Net::GitHub. (Really, I adapted it from other Net::GitHub programs I already use.) I keep my Github user id and API token in my .gitconfig, so the program pulls my credentials from there.

Below, you can see my Github dashboard of neglect (top 40 only!). The three columns of numbers are (respectively) PRs, non-wishlist issues and wishlist issues. (Wishlist items are identified either by label or by "wishlist" in the title.)

$ ./github-dashboard |  head -40
                               Capture-Tiny   3  18   0
                                    Meerkat   2   8   0
                               getopt-lucid   2   1   0
                                  Path-Tiny   1  21   0
                               HTTP-Tiny-UA   1   5   0
                         Path-Iterator-Rule   1   5   0
  Dist-Zilla-Plugin-BumpVersionAfterRelease   1   3   2
                              Metabase-Fact   1   3   0
                dist-zilla-plugin-osprereqs   1   2   0
       Dist-Zilla-Plugin-Test-ReportPrereqs   1   2   0
                                    ToolSet   1   2   0
        Dist-Zilla-Plugin-Meta-Contributors   1   1   0
     Dist-Zilla-Plugin-MakeMaker-Highlander   1   0   0
                         Task-CPAN-Reporter   1   0   0
                           IO-CaptureOutput   0   7   0
                                     pantry   0   7   2
                     TAP-Harness-Restricted   0   4   0
                            class-insideout   0   3   0
                               Hash-Ordered   0   3   0
                                    Log-Any   0   3   4
                                  perl-chef   0   3   0
                                 Term-Title   0   3   0
                               Test-DiagINC   0   3   0
                          Acme-require-case   0   2   0
                                 Class-Tiny   0   2   0
                                  Data-Fake   0   2   2
                  dist-zilla-plugin-twitter   0   2   0
                   Log-Any-Adapter-Log4perl   0   2   0
                             math-random-oo   0   2   0
                                 superclass   0   2   0
                                   Test-Roo   0   2   0
                              universal-new   0   2   0
                           zzz-rt-to-github   0   2   0
                      app-ylastic-costagent   0   1   0
                      Dancer-Session-Cookie   0   1   0
          Dist-Zilla-Plugin-CheckExtraTests   0   1   0
          Dist-Zilla-Plugin-InsertCopyright   0   1   0
Dist-Zilla-Plugin-ReleaseStatus-FromVersion   0   1   0
                                 File-chdir   0   1   0
                                 File-pushd   0   1   0

Now, when I set aside maintenance time, I know where to work.

Posted in perl programming | Tagged , , , | Comments closed