Adventures in Benchmarking, Part 1

I created Path::Tiny because I wanted a faster file path utility. But believing it would be faster isn't the same thing as demonstrating it, so I decided to create some benchmarks. That got complicated, but now I have a pretty nice benchmarking toolkit.

This article describes my setup and process and shows results for object creation and basic file path manipulations for Path::Tiny and three other file path utilities: Path::Class, IO::All and File::Fu.

In subsequent articles, I'll show benchmarks for file input and output, including some startling facts about Unicode slurping.

tl;dr: Path::Tiny is really fast!

Goodbye Benchmark, hello Benchmark::Forking

Hopefully, you've heard of the good old Benchmark module. It's a good starting point for benchmarking Perl programs. Or is it?

Even if you've used it, you might not realize that Benchmark runs tests in alphabetical order by name, doing all the loops for one test before going on to the next. Because earlier runs affect Perl's memory allocation, the test names themselves can skew the results.

Here's a demonstration of what can happen. All the test subs are the same -- they just slurp a 1 megabyte file -- so the only difference is the name:

use v5.10;
use strict;
use warnings;
use Benchmark qw( cmpthese );
use Path::Tiny;

my $file = "ascii-large";

my $count = -1;
cmpthese(
    $count,
    {
        # twelve identical test subs; only the name ("a" .. "l") differs
        map { $_ => sub { path($file)->slurp } } ("a" .. "l")
    }
);

And here is the result:

$ perl bench.pl 
    Rate    l    h    i    j    f    b    g    c    a    e    k    d
l 1599/s   -- -14% -15% -19% -20% -20% -21% -21% -21% -23% -23% -23%
h 1866/s  17%   --  -1%  -6%  -6%  -7%  -8%  -8%  -8% -10% -10% -10%
i 1879/s  17%   1%   --  -5%  -6%  -6%  -8%  -8%  -8%  -9%  -9%  -9%
j 1981/s  24%   6%   5%   --  -1%  -1%  -3%  -3%  -3%  -4%  -4%  -4%
f 1995/s  25%   7%   6%   1%   --  -0%  -2%  -2%  -2%  -3%  -4%  -4%
b 1999/s  25%   7%   6%   1%   0%   --  -2%  -2%  -2%  -3%  -3%  -4%
g 2033/s  27%   9%   8%   3%   2%   2%   --  -0%  -0%  -2%  -2%  -2%
c 2035/s  27%   9%   8%   3%   2%   2%   0%   --  -0%  -2%  -2%  -2%
a 2036/s  27%   9%   8%   3%   2%   2%   0%   0%   --  -1%  -2%  -2%
e 2067/s  29%  11%  10%   4%   4%   3%   2%   2%   2%   --  -0%  -0%
k 2070/s  29%  11%  10%   4%   4%   4%   2%   2%   2%   0%   --  -0%
d 2074/s  30%  11%  10%   5%   4%   4%   2%   2%   2%   0%   0%   --

Wow! Sucks to be called "l", doesn't it! The better letters outperform you by up to 30%!

With volatility like that, how can you possibly draw useful conclusions?

I hoped that someone had already encountered this and figured out a fix, so I checked CPAN. There, I found Benchmark::Forking, which monkeypatches Benchmark to run each test in a subprocess.
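
Conceptually, the trick is the classic fork-and-wait pattern. Here's an illustration of the idea only, not Benchmark::Forking's actual code; the real module also has to ship the timing results back to the parent process:

use strict;
use warnings;
use Benchmark qw( timethis );

for my $name ( "a" .. "c" ) {
    if ( my $pid = fork() ) {
        waitpid( $pid, 0 );    # parent: wait for this test's child to finish
    }
    else {
        # child: run one test in a fresh process, so its memory churn
        # can't bias the tests that come after it
        timethis( 10_000, sub { my @x = (1) x 100 }, $name );
        exit 0;
    }
}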

Here is my benchmark script again, switched to Benchmark::Forking. Notice that it's a drop-in replacement.

use v5.10;
use strict;
use warnings;
use Benchmark::Forking qw( cmpthese );
use Path::Tiny;

my $file = "ascii-large";

my $count = -1;
cmpthese(
    $count,
    {
        map { $_ => sub { path($file)->slurp } } ("a" .. "l")
    }
);

And now the results. Yes, there's variance, but it's much, much smaller:

$ perl forking.pl 
    Rate   l   c   i   g   k   b   e   a   d   h   j   f
l 1982/s  -- -1% -1% -2% -3% -3% -4% -4% -4% -4% -5% -5%
c 2000/s  1%  --  0% -1% -2% -2% -3% -4% -4% -4% -4% -4%
i 2000/s  1%  0%  -- -1% -2% -2% -3% -4% -4% -4% -4% -4%
g 2018/s  2%  1%  1%  -- -1% -1% -2% -3% -3% -3% -4% -4%
k 2036/s  3%  2%  2%  1%  --  0% -1% -2% -2% -2% -3% -3%
b 2036/s  3%  2%  2%  1%  0%  -- -1% -2% -2% -2% -3% -3%
e 2055/s  4%  3%  3%  2%  1%  1%  -- -1% -1% -1% -2% -2%
a 2074/s  5%  4%  4%  3%  2%  2%  1%  --  0%  0% -1% -1%
d 2074/s  5%  4%  4%  3%  2%  2%  1%  0%  --  0% -1% -1%
h 2074/s  5%  4%  4%  3%  2%  2%  1%  0%  0%  -- -1% -1%
j 2093/s  6%  5%  5%  4%  3%  3%  2%  1%  1%  1%  --  0%
f 2093/s  6%  5%  5%  4%  3%  3%  2%  1%  1%  1%  0%  --

Benchmark::Forking FTW!

DRY benchmarks

I wrote a lot of benchmarks. After the first half dozen, I realized that I was copying and pasting the same files over and over. What a huge code smell! While there might be a module out there to implement modular benchmarks (let me know in comments), I opted for some quick and dirty code generation instead.

I grouped snippets of benchmarking code into subdirectories and wrote a harness that read the snippets, wrapped a shell of code around them, dumped them into files, ran them, and gathered the results.

Here is the code shell, which I kept in a HERE document. The words in all caps were replaced by substitution to customize each test. (DIY templating.)

#!/usr/bin/env perl
use v5.10;
use strict;
use warnings;

use Benchmark qw( :hireswallclock );
use Benchmark::Forking qw( timethese );
use JSON -convert_blessed_universally;
use File::pushd qw/tempd/;

use File::Fu;
use IO::All;
use Path::Class;
use Path::Tiny;

my $count = COUNT;
my $corpus = path("CORPUS");
my $test = path("TEST");
my $result;

TIMETHESE

print JSON->new->allow_blessed->convert_blessed->encode($result);

The variables like $corpus are fixtures that I wanted available to all the benchmark snippets. The $result variable gets set in the snippet and communicated back to the harness via JSON.

The TIMETHESE marker is replaced by a benchmarking snippet. Here is one for slurping an ASCII file. It picks a file to slurp based on the name of the test, then captures the timing results (the "none" argument suppresses timethese's usual output so it doesn't corrupt the JSON we print).

my $size = $test->basename;
my $file = $corpus->child("ascii-$size");
die "Couldn't find $file" unless $file->exists;

$result = timethese (
    $count,
    {
        'Path::Tiny'  => sub { my $s = path($file)->slurp },
        'Path::Class' => sub { my $s = file($file)->slurp },
        'IO::All'     => sub { my $s = io($file)->slurp },
        'File::Fu'    => sub { my $s = File::Fu->file($file)->read },
    },
    "none"
);

I created that file as "small", so it would slurp the "ascii-small" corpus. Then I just symlinked it to "medium", "large", and "huge" and had tests for those corpuses (corpi?) as well without any extra code to write. DRY!

You can see the benchmark harness code on GitHub.
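
For flavor, here's a stripped-down sketch of how such a driver might work. This is not the actual harness code; the file layout, the count, and the snippet directory name are made up for the example:

#!/usr/bin/env perl
use v5.10;
use strict;
use warnings;
use JSON;
use Path::Tiny;

my $shell  = path("shell.tmpl")->slurp;   # the real harness keeps this in a HERE document
my $count  = 1000;
my $corpus = path("corpus")->absolute;

my %results;
for my $snippet ( path("benchmarks/slurp")->children ) {
    my $name = $snippet->basename;
    my $body = $snippet->slurp;

    # DIY templating: swap the ALL-CAPS markers for real values
    my $code = $shell;
    $code =~ s/COUNT/$count/;
    $code =~ s/CORPUS/$corpus/;
    $code =~ s/TEST/$name/;
    $code =~ s/TIMETHESE/$body/;

    # write the generated benchmark to a temp file, run it, collect its JSON
    my $script = Path::Tiny->tempfile;
    $script->spew($code);
    my $json = qx/$^X $script/;

    $results{$name} = decode_json($json);
}

print JSON->new->pretty->encode( \%results );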

Pictures worth a thousand words

With so many benchmarks, trying to make sense of it from ASCII tables would be crazy. I was inspired by the cool charts in Ricardo's blog about file finder speed, so I copied his code out of his repository and customized it.

It used Chart::Clicker, and getting that installed wasn't as easy as I'd have liked. Finally, I figured out I had to install the Cairo development libraries, and even then, I had to force install Chart::Clicker despite some test failures. Nevertheless, it worked.

Here's what my code does (a rough sketch follows the list):

  • Reads in benchmark data from a JSON file
  • Pivots the data from benchmark-module to module-benchmark
  • Chooses an order to display benchmarks (e.g. descending by worst performance)
  • Converts data to logarithms
  • Adds data series to a chart object
  • Formats the chart
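
Here's a rough sketch of that pipeline, assuming Chart::Clicker's stock Series/DataSet interface and a results.json shaped like { benchmark => { module => rate } }. It is not the actual chart code, and the real data (Benchmark objects serialized to JSON) needs more unpacking than this:

use v5.10;
use strict;
use warnings;
use JSON;
use Path::Tiny;
use Chart::Clicker;
use Chart::Clicker::Data::Series;
use Chart::Clicker::Data::DataSet;

# 1. Read benchmark data: { benchmark_name => { module_name => rate } }
my $raw = decode_json( path("results.json")->slurp );

# 2. Pivot benchmark->module into module->benchmark
my %by_module;
for my $bench ( keys %$raw ) {
    for my $module ( keys %{ $raw->{$bench} } ) {
        $by_module{$module}{$bench} = $raw->{$bench}{$module};
    }
}

# 3. Order benchmarks descending by their worst (slowest) rate
sub worst_rate { ( sort { $a <=> $b } values %{ $_[0] } )[0] }
my @benchmarks =
  sort { worst_rate( $raw->{$b} ) <=> worst_rate( $raw->{$a} ) } keys %$raw;

# 4 & 5. Convert rates to log10 and add one data series per module
my @series;
for my $module ( sort keys %by_module ) {
    push @series, Chart::Clicker::Data::Series->new(
        name   => $module,
        keys   => [ 1 .. scalar @benchmarks ],
        values => [ map { log( $by_module{$module}{$_} ) / log(10) } @benchmarks ],
    );
}

# 6. Format the chart (sizes, titles, etc.) and write it out
my $cc = Chart::Clicker->new( width => 800, height => 600 );
$cc->add_to_datasets( Chart::Clicker::Data::DataSet->new( series => \@series ) );
$cc->write_output("chart.png");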

The worst part was getting the chart formatting the way I wanted it. It took a lot of trial and error.

It's more or less what you would do with a spreadsheet, just done in Perl instead. And once it was written, I could use the same code to generate charts for all the different benchmark groups -- something I wouldn't want to do over and over in Excel.

You can also see the chart code on GitHub.

Next, I'll show some results.

Comparing file object creation

One of the first design goals I had for Path::Tiny was making object construction as cheap and lazy as possible because I'd seen in Ricardo's file-finding exercise just how costly that can be when iterating over a lot of files.

I was particularly curious how construction times might vary along several dimensions:

  • file objects versus directory objects
  • paths that exist or not
  • single level versus nested paths
  • absolute versus relative paths

I wound up with twelve benchmarks across those variations, which I plotted on a log-scaled axis.
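
For example, a construction snippet in the style of the slurp snippet above might look like this. This is a guess at the shape, not the actual benchmark; here I'm constructing objects for a corpus file that exists:

my $target = $corpus->child("ascii-small");
die "Couldn't find $target" unless $target->exists;

$result = timethese(
    $count,
    {
        'Path::Tiny'  => sub { my $p = path("$target") },
        'Path::Class' => sub { my $p = file("$target") },
        'IO::All'     => sub { my $p = io("$target") },
        'File::Fu'    => sub { my $p = File::Fu->file("$target") },
    },
    "none"
);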

[Chart: file object construction benchmarks (log scale)]

As I had hoped, Path::Tiny absolutely crushed the others across all benchmarks.

Path::Class and File::Fu drop off precipitously for absolute files and relative files that are nested, possibly because they create directory objects inside the file object, so two objects get created for every file object.

The flip side is that Path::Class spikes for a single-level, relative file (whether it exists or not), possibly because it's not creating an expensive directory object.

IO::All is consistent, but slow. It seems to be doing a lot of work in its constructor.

Object construction is a very synthetic baseline. In the real world, we actually want to manipulate those paths and the files or directories they represent. Still, it's built-in overhead and any method that returns another object is going to incur that overhead more than once.

Comparing file path manipulation

Next, I looked at several types of path manipulation:

  • Getting a child path (file or subdirectory)
  • Finding a basename
  • Relative to absolute (and vice versa)
  • Finding a parent directory
  • Finding the contents of a directory

I wound up with ten benchmarks. These are also plotted on a log scale. While the axis scaling differs, you can still see how speeds drop off compared to the prior chart.

[Chart: file object manipulation benchmarks (log scale)]

Once again, Path::Tiny comes out on top!

File::Fu performs pretty well, and is nearly as fast as Path::Tiny at generating a new object for a file in a directory.

Everything performs similarly at converting an absolute path to a relative one, in part because it's sufficiently tricky that Path::Tiny falls back to using File::Spec like the other modules.
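
For reference, the kind of File::Spec call involved looks roughly like this (illustrative paths, not any module's exact code):

use File::Spec;

# abs2rel derives a relative path from an absolute one against a base directory
my $rel = File::Spec->abs2rel( "/home/david/tmp/foo.txt", "/home/david" );
# $rel is "tmp/foo.txt"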

The slowest benchmark is the directory listing one. It consisted of reading my $HOME directory, which holds a mix of about thirty files and directories.

Unlike the other benchmarks, which are mostly manipulations of the path itself, this one requires reading a directory handle and constructing objects from the results. Assuming directory reading is equally efficient across modules, you're seeing the cumulative effect of their object construction times.
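
In Path::Tiny terms, the benchmark boils down to something like this (the other modules have analogous calls); every entry read from the directory handle becomes a new object, so the per-object construction cost from the first chart is paid roughly thirty times over:

use Path::Tiny;

# one readdir pass, plus one new Path::Tiny object per entry
my @entries = path( $ENV{HOME} )->children;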

Conclusions

Good benchmarking is difficult. Using Benchmark::Forking eliminates some of the noise, letting us have more confidence in the trends we see.

By taking some extra time to write a harness for my benchmark snippets and a chart generator, I was able to create a lot of benchmarks and visualize them with very little code. Writing new benchmarks is now just a matter of putting snippets into a directory.

The charts show Path::Tiny outperforming on every single benchmark, often by a fairly wide margin. (Remember, those charts were log scale!)

The next article in this series will look at file slurping, the disastrous costs of the common way to do strict Unicode decoding, how File::Slurp lies about binmodes, and how Path::Tiny wound up as the Unicode slurping champion.

Update: Ricardo informs me the correct word is "corpora".


8 Comments

  1. Posted February 8, 2013 at 6:54 am

    For making the charts, you might consider Chart::Strip.

    • Posted February 8, 2013 at 11:16 am

      For strip charts, it looks good, but it's too specific to time series data for what I need.

  2. Posted February 10, 2013 at 8:44 am

    Interesting, but line graphs are really not appropriate for discrete comparisons: they are best suited for showing how a set of variables evolves over time.

    Given that you want to highlight the performance of Path::Tiny in relation to other modules within each benchmark, I would have just used bar graphs with the current horizontal axis but expressed the vertical axis as a percentage of Path::Tiny's performance.

    HTH,

    -- Sinan

    • Posted February 11, 2013 at 12:36 pm

      I think bar charts are appropriate in isolation, but I find sets of bar charts cluttered and hard to read. Having an XY chart for a discrete set of X is a perfectly reasonable way to compare.

      So what do the lines add? It makes it much easier to visually see whether any particular point has gone up or down from one discrete point to the next.

      If the discrete points were randomly ordered, then the lines themselves could be confusing clutter, zigzagging up and down. Instead, I ordered the discrete points so that general trends were more easily visible.

      • Anonymous
        Posted February 11, 2013 at 10:13 pm

        "So what do the lines add? It makes it much easier to visually see whether any particular point has gone up or down from one discrete point to the next."

        The problem is that each X value is an independent test. That is, X_{i} is not the predecessor to X_{i+1}, they are different functions. A line graph implies that some quantity is changing over a sequence of events.

        It would make more sense to rotate the chart 90 degrees and put the function names horizontally so you can actually read them. Then group them logically based on the functions they accomplish.

        I agree with A. Sinan Unur that scaling the speeds with respect to Path::Tiny would be nice, since that is your module of interest and it is the fastest.

        • Posted February 11, 2013 at 10:54 pm

          Scaling the speeds with respect to Path::Tiny would obscure the different absolute speed of operations. Understanding what is fast and what is slow -- absolutely -- is critical to knowing what and how to optimize.

  3. brian d foy
    Posted February 14, 2013 at 12:08 am

    I haven't been able to replicate your results. For Perls v5.10.1, v5.12.1, and v5.14.2, all compiled with default values on Mac OS X.8, I don't see any ordering effects. Which version of Perl are you using on what operating system?

    Also note that the absolute speed of the operations don't matter to anyone but you. Someone with a different Perl, different compiler, different hardware, and so on, won't likely get the same absolute results. Also, there's no trend in your ordering. You could have ordered them highest time to lowest time, which would have been more useful since you only need to look at one side of the chart to find the slow operations. The lines and their slopes don't mean anything here.

    • Posted February 14, 2013 at 11:03 am

      The absolute results are relevant between benchmarks, such as seeing the difference between creating objects with absolute and relative paths. The specific numbers on my system aren't relevant to anyone else, but I don't see a need to rescale everything to obscure it. I could leave the numbers off the Y axis entirely and the shape of the graph wouldn't change.

      The benchmarks are ordered descending by slowest time. Look at the bottom points on the graphs.

