Building perl in parallel

I just installed a quad-core CPU and was curious how much time it could shave off configuring, building and testing the Perl core. This is particularly useful to me when I'm applying patches submitted to perl5-porters. After reviewing a patch, I want to make sure that it doesn't break anything before I commit it and push it to the master repository.

I already had a small utility program that I use for building perl from the git source, so I adapted it to let me select how many processes to use. It more or less does the following for any particular N:

$ git clean -dxf
(copy in previous config.sh and Policy.sh)
$ sh Configure -ders -Dusedevel ( ... plus other stuff ... )
$ TEST_JOBS=N make -j N test_harness
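
In script form, that is roughly the following (a sketch only; the ~/perl-cache location for the saved config.sh and Policy.sh is just illustrative, and the real utility does a bit more than this):

#!/bin/sh
# Rebuild and test perl with N parallel jobs; N is the first argument.
N=${1:-5}
git clean -dxf                        # remove everything git doesn't track
cp ~/perl-cache/config.sh .           # reuse cached Configure answers
cp ~/perl-cache/Policy.sh .
sh Configure -ders -Dusedevel         # ... plus other stuff ...
TEST_JOBS=$N make -j $N test_harness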

Then I used the time program to time the script for different numbers of processes. Here is the result (wall-clock time versus the number of parallel processes):

Despite the usual advice that parallel builds should use processes equal to the number of CPUs (or cores) plus one, adding more processes still squeezes out a little more speed -- though only around 10 seconds towards the end. That suggests to me that there is still a lot of IO-bound work, which makes sense given the number of individual test files in the Perl core.

As a technical note, the timed runs all use a cached copy of the config.sh and Policy.sh files to speed up Configure. I also use ccache to speed up compilation. All of the runs shown in the graph were done after a full configure/build/test cycle, so the cache was "pre-loaded".
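
For anyone who wants to reproduce this: hooking up ccache can be as simple as handing it to Configure via -Dcc and then saving the generated config.sh and Policy.sh for the next run. This is a sketch rather than my exact invocation, with ~/perl-cache again standing in for wherever you keep the cached files:

$ sh Configure -ders -Dusedevel -Dcc='ccache gcc'
$ cp config.sh Policy.sh ~/perl-cache/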


4 Comments

  1. Posted June 2, 2010 at 4:26 pm

    Which quad-core CPU did you get? I recently got a new laptop with a quad-core Core i7 CPU, with hyperthreading. I remember hyperthreading being a joke on the Pentium 4. It isn't a joke anymore.

    I did time some Linux kernel compiles with 2, 4, 8, and 12 jobs. I don't know what I did with the numbers, but the compile times continued to drop even beyond 8 jobs. Even video encoding goes over 50% faster with 8 threads instead of 4.

    • dagolden
      Posted June 2, 2010 at 4:55 pm

      Core 2 Quad Q9400. I was upgrading old hardware and the motherboard isn't compatible with the Core i3/i5/i7 series. This was also on a stock Ubuntu distribution -- so there was no other kernel optimization, either.

  2. Posted June 2, 2010 at 5:58 pm

    If you are on Linux, another thing you might try is doing the build in a RAM disk. I did it on my laptop, which has 4 GB of RAM, mostly to avoid hitting the slow write speed of the SSD. That may shave off some of that IO.
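
    For example, something along these lines (the mount point, size, and paths are just illustrative):

      $ sudo mkdir -p /mnt/ramdisk
      $ sudo mount -t tmpfs -o size=2g tmpfs /mnt/ramdisk
      $ cp -a ~/src/perl /mnt/ramdisk/perl    # copy your checkout into RAM
      $ cd /mnt/ramdisk/perl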

  3. Posted June 3, 2010 at 4:36 am

    I did some testing myself because I was curious whether the Intel SSD in my laptop would make a difference. Of course, my system isn't the same as yours in other regards, but the timings are remarkably similar (see below). This laptop is a Core i7 (2 cores, 4 threads, 2.66 GHz base frequency) with the fast IO of an SSD and plenty of RAM.

    The result is here. I used a little tool to read your timings from your plot; your data is shown in blue for comparison. Mine is in red, with a fit (480/x^1.6 + 206) shown as the black curve. Each of my points is the result of 11 runs and has error bars, but they aren't visible because the fluctuations are so small -- even though the bars show the spread of the data rather than the (even smaller) uncertainty of the mean.

    At low process counts, my system, with its somewhat faster cores and faster IO, is slightly ahead. But your four real cores pull ahead a little at higher numbers of processes, as expected. With pre-warmed caches, ccache enabled, and plenty of RAM, the build and test process doesn't seem to be limited by IO at all.

    There's also the program I wrote to generate the plot.
