ETOOMANYPROJECTS

When it rains it pours. As much as I’d like to keep my projects a FIFO queue, it’s operating a bit like a stack these days. Here’s what’s keeping me busy (when I’m not busy enough with paying work and family life):

  • Reviewing and finalizing ‘package NAME VERSION’ patches for the Perl core (hopefully before the code freeze)
  • CPAN Testers 2.0 migration project — volunteers are jumping in, which is good since I’ve barely had time to do anything but answer questions and help give people direction. I haven’t even had a chance to blog about first week progress
  • CPAN Meta Spec 2.0 — delayed once and will be delayed again. This is really, really close to done, but just doesn’t have the urgency of everything else
  • Assorted bug reports and module patches — and only if they take a few minutes or less

That’s my Iron Man post for the week. Stay tuned next week for actual progress reports, I hope.

Posted in cpan-testers, perl-programming, toolchain

Project plan for CPAN Testers 2.0 migration

I’ve spent a good deal of time brainstorming ideas for migrating to CPAN Testers 2.0 (CT2.0) and then paring down what I came up with to the barest minimum I think is necessary to migrate CPAN Testers off email/NNTP and still build a foundation for the future. This has been frustrating to spend a lot of time on, but deciding what not to do is probably as important as deciding what we need to do if we’re going to hit the deadline.

Here is my draft of a project plan for launching CPAN Testers 2.0 by March 1. I’ve broken it up into ‘Architecture Tasks’ and ‘Migration Tasks’ and given some indication of the rough timing I think we need to hit. I’ve also added some commentary on work steps and current status. I’ve indicated in bold text who should lead a work block (if known) and where volunteers would be helpful.

Questions and suggestions are appreciated, particularly if something isn’t clear or if there are important tasks I’ve left out.

If you have time, interest and relevant expertise to help, please let me know in the comments or by email to dagolden at cpan dot org.

(Apologies for any crude abbreviations or shorthand throughout.)

1. Architecture Tasks

The CPAN Testers architecture can be broadly divided into three groups:

  • Clients — the programs that run tests, create reports and submit them to the central CPAN Testers server
  • Metabase — the server and storage programs to receive and archive reports (and provide very rudimentary search capabilities)
  • Reporting — the websites and support programs to summarize CPAN Testers report data and syndicate them in various ways (e.g. *.cpantesters.org)

1.1 CT Clients (2nd half Jan)

Existing clients depend on Test::Reporter for email (or other) transport, ultimately to the cpan-testers mailing list. Existing clients need to be migrated to use the Test::Reporter::Transport::Metabase plugin.

As I’d like to start beta testing in February (see the Migration task list), these changes should be done in the second half of January after CT Metabase libraries are revised.

1.1.1 Test::Reporter::Transport::Metabase

Test::Reporter::Transport::Metabase and Metabase::Client::Simple (upon which the Metabase transport depends) are “done” but may need tweaks based on changes to how user profiles and authentication credentials are handled (see below).

I will take responsibility for making any necessary changes to these components.

1.1.2 CPAN::Reporter-based clients

CPAN::Reporter based clients already have support for Test::Reporter::Transport::Metabase, though they may need to be altered depending on user profile changes. CPAN::Reporter clients include:

  • CPAN
  • CPAN::Reporter::Smoker

I will take the lead to make changes or coordinate volunteer efforts for these clients.

1.1.3 CPANPLUS-based clients

I don’t know whether CPANPLUS-based clients support transport plugins or not. Someone will need to look into this and patch them if they do not. These clients include:

  • CPANPLUS
  • CPANPLUS::YACSmoke
  • minismokebox

I think Chris Williams should probably lead or coordinate this effort.

1.2 CT Metabase

The CPAN::Testers Metabase will replace the email/NNTP archive. The design for the initial launch is to use Amazon Web Services to provide scalability and reliability with minimal administrative overhead.

The first draft of the Metabase framework is mostly complete, but there are several areas needing work to support the CT2.0 launch.

1.2.1 Libraries (1st half Jan)

Enhancements to libraries need to happen quickly to support other components.

  • Implement search of new entries — only partial search capabilities are needed initially. The CT Reporting servers must be able to query for new reports added to the CT Metabase. Almost no work has been done on search so this may be a major workblock.
  • Revise user profiles and credentials — the current approach to user profiles is flawed in that it merges public user profile information with user authentication credentials. The Metabase::User::Secret fact needs to be removed from Metabase::User::Profile, but needs to be submitted with all Metabase::Client::Simple requests. Matching changes need to be made in the Metabase::Web and ::Gateway classes. This is another major workblock.
  • Metabase backend for AWS — Metabase::Archive::S3 and Metabase::Index::SimpleDB need to be written. This should be relatively straightforward.
  • Utility to map NNTP IDs to/from GUIDs — I plan to write a very simple library to standardize and abstract how this conversion will happen.
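
As a purely illustrative sketch of what a mapping utility like the one in the last item might look like, here is a two-way lookup table in Python. The class and method names are hypothetical (they are not part of Metabase), and assigning a random GUID when none is supplied is an assumption made only for the example.

```python
import uuid


class ReportIdMap:
    """Hypothetical two-way lookup between NNTP article IDs and report GUIDs."""

    def __init__(self):
        self._guid_by_nntp = {}
        self._nntp_by_guid = {}

    def register(self, nntp_id, guid=None):
        """Record a pairing and return its GUID; repeat calls return the same GUID."""
        if nntp_id in self._guid_by_nntp:
            return self._guid_by_nntp[nntp_id]
        # Assumption: a random GUID is assigned when the caller supplies none.
        guid = str(guid or uuid.uuid4()).lower()
        if guid in self._nntp_by_guid:
            raise ValueError(f"GUID {guid} is already mapped")
        self._guid_by_nntp[nntp_id] = guid
        self._nntp_by_guid[guid] = nntp_id
        return guid

    def guid_for(self, nntp_id):
        """Look up the GUID recorded for an NNTP article ID."""
        return self._guid_by_nntp[nntp_id]

    def nntp_id_for(self, guid):
        """Look up the NNTP article ID recorded for a GUID (case-insensitive)."""
        return self._nntp_by_guid[str(guid).lower()]
```

Keeping both directions behind one small interface means the stats database, report viewer and other downstream code never need to know how the pairing is stored.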

Given the intimate knowledge of Metabase needed, I plan to take the lead on these tasks, but may recruit volunteers for portions of it.

1.2.2 Metabase Web Server (2nd half Jan)

The Metabase::Web server is a very primitive Catalyst server and has only been run in standalone mode. It needs to be updated to run on a recent Catalyst framework. Configuration and deployment decisions need to be made for it to run on an Amazon EC2 instance.

This needs to be done after Metabase components are revised but before February in order to be ready to start beta testing.

In addition to Catalyst deployment, we need to have sufficient logging to do performance analysis and reporting to see how well the system is scaling.

I would like to find some Catalyst experts to volunteer to lead this effort or at least coach other less-experienced volunteers.

1.3 CT Reporting

The CPAN Testers Reporting infrastructure consists primarily of the data and web applications on cpantesters.org that provide reports to users or otherwise feed downstream applications like the CPAN Dependencies site or CPAN Testers Matrix.

The goal for migration is to make the change appear relatively seamless. In the longer term, downstream apps may be able to use web services rather than relying on a large SQLite database.

Barbie has already started work on several of these and should recruit additional volunteers as necessary.

1.3.1 Interface to CT Metabase (2nd half Jan)

The current CPAN Testers reporting sites depend on a database fed from the NNTP archive. We need to replace this with a feed from the CT Metabase.

In the long run, this would be provided via Metabase::Web services, but for expediency I think the best approach is a direct connection to the Amazon S3/SimpleDB backend via a Metabase::Librarian.

This should be a relatively trivial matter of making sure the right libraries are installed on the cpantesters.org server and that the AWS access keys are in appropriately protected configuration files.

I will work with Barbie to ensure this is done before beta testing in February.

1.3.2 Statistics database (1st half Feb)

During the beta period, we need to test updating the CPAN Testers stats database from the CT Metabase. This probably needs to be done in parallel to the existing sites and databases.

One major change will be converting from NNTP IDs to CT Metabase GUIDs to identify reports.

If possible, we would like to shorten the time lag in processing updates, but this is ‘nice to have’, not ‘must have’ functionality.

1.3.3 Report viewer (1st half Feb)

When the NNTP archive shuts down and reports are sent to the CT Metabase, users will no longer be able to view reports on nntp.perl.org. A new report viewer web application needs to be deployed on cpantesters.org.

This also entails regularly updating a mammoth reports database with a copy of the text report from the CT Metabase (using the interface in 1.3.1). Reports will be indexed and made available for display based on their GUID, not NNTP ID, so existing sites will need to change their URLs to use GUIDs as well as pointing to a new URL endpoint.

1.3.4 Find a tester (2nd half Feb)

The “Find a Tester” service will need to be revised to get contact information via GUID rather than NNTP ID. This seems less critical to have immediately and could slip past February if necessary.

2. Migration Tasks

These tasks describe a sequence of activities to migrate legacy reports and launch CPAN Testers 2.0. Many of them depend on architecture components described above.

2.1 NNTP Migration (2nd half Jan)

Existing NNTP reports need to be migrated into the CT Metabase. This may actually stretch out over January, but doesn’t seem to be critical to be done before beta testing.

Because of the volume of reports and the possibility that we might need to make late changes that require re-migrating the reports, I’d like to design the conversion in a way that can be parallelized on Amazon EC2.

Steps in the conversion should include:

  • Upload NNTP archive tarballs to S3 (638 files) — these have already been generated by the Perl NOC and uploaded to S3
  • Generate submitter profiles and add them to the CT Metabase — Metabase requires “profiles”, not just email addresses, to identify users. For legacy reports, profiles need to be generated for all known testers (based on the existing address mappings used for stats.cpantesters.org).
  • Write NNTP article converter — this will need to extract articles from the archive tarball, filter out non-report articles or badly formatted reports, parse them into Metabase facts, link them to submitter profiles and inject them into the CT Metabase
  • Create EC2 converter instances — tarballs can and should be processed in parallel. This is just a matter of adding archive tarball IDs to an Amazon SQS queue that custom EC2 instances can use to dequeue tasks, run the converter program and shut down when the queue is empty.
  • Queue archive tarball IDs and run 1+ instances against the queue — once the converter instances have been created, we then just run enough in parallel against the 600+ tarballs to get a “fast enough” conversion rate
  • Repeat for new archive tarballs — between now and the end of February, additional tarballs should be provided by Perl NOC to migrate reports arriving since the first batch of tarballs was created.
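
The queue-and-workers pattern described in these steps can be sketched locally in Python. This is only an illustration under stated assumptions: an in-process `queue.Queue` stands in for the Amazon SQS queue, threads stand in for EC2 converter instances, and `convert_tarball` is a hypothetical placeholder for the real article converter.

```python
import queue
import threading


def convert_tarball(tarball_id):
    """Hypothetical placeholder for extracting, filtering and injecting one tarball."""
    return f"converted:{tarball_id}"


def worker(tasks, results, lock):
    """Dequeue tarball IDs until the queue is empty, then exit ("shut down")."""
    while True:
        try:
            tarball_id = tasks.get_nowait()
        except queue.Empty:
            return
        outcome = convert_tarball(tarball_id)
        with lock:
            results.append(outcome)


def run_conversion(tarball_ids, n_workers=4):
    """Queue every tarball ID, then run n_workers workers against the queue."""
    tasks = queue.Queue()
    for tarball_id in tarball_ids:
        tasks.put(tarball_id)
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because each worker only pulls an ID, processes it and moves on, raising `n_workers` is the only change needed to speed up the run, which is the property that makes re-migrating after a late change tolerable.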

As the conversion process is just an extension of what already happens in Test::Reporter::Transport::Metabase, I will take responsibility for finishing the conversion.

2.2 Deploy CT2.0 Server (2nd half Jan)

The Metabase framework has not been tested “at scale” — meaning processing on the order of 500,000+ reports per month. To ensure we can easily scale, the CT 2.0 Server will be deployed on EC2, so that we can deploy servers in parallel if necessary.

Deployment steps will include:

  • Choose base EC2 machine image — selecting an existing EC2 image to be customized as a CPAN Testers server
  • Install and configure CT Metabase components — the design for this should have been worked out as part of the architecture work, but the server needs to be deployed on the machine image and properly configured for automatic start when the VM boots
  • Launch instance — fire it up and start receiving reports!

I’m personally interested in getting my hands dirty with this, but may be overwhelmed with other tasks. If someone has prior EC2 experience and can volunteer as a lead or coach, that would be very helpful.

2.3 Beta test (1st half Feb)

Once we launch the server, we need a test period. This entails getting a small handful of testers to gradually ramp up the volume over a couple weeks to see how well the servers and Metabase perform.

Steps include:

  • Select beta test group — likely some of the “high volume” testers can gradually convert some of their smokers over; we probably need at least one CPAN::Reporter based smoker and one CPANPLUS based smoker
  • Email profiles and instructions to beta test group — we will have pre-generated profiles as part of the migration, so we can provide these to testers so their new reports have a consistent identity
  • Test updating statistics DB from CT Metabase — we want to see the stats database getting regular report updates from beta testers
  • Test new report viewer and tester-finders — new reports that were only submitted to Metabase (and not to NNTP) should be visible in the report viewer based on their GUIDs
  • Write NNTP tail daemon to convert new NNTP articles to CT Metabase — until the NNTP archive is shut down, reports will continue to be submitted the old way; with the CT2.0 beta running, we need a daemon to “tail” the NNTP archive and continuously migrate reports
  • Test throughput and deploy instances and load balancing if necessary — based on the results of the beta test, try deploying additional instances and a load balancer
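
One plausible shape for the “tail” daemon mentioned above is a loop that remembers the highest article ID already migrated and only handles newer ones. The Python sketch below shows a single pass of such a loop. Everything in it is an assumption for illustration: the function names are made up, a plain dict stands in for the NNTP archive, and `inject` stands in for whatever converts an article and submits it to the CT Metabase.

```python
def make_fetcher(archive):
    """Return a function listing (article_id, body) pairs newer than a given ID.

    `archive` is a plain dict standing in for the NNTP server.
    """
    def fetch_since(last_seen):
        return sorted((i, body) for i, body in archive.items() if i > last_seen)
    return fetch_since


def tail_once(fetch_since, inject, last_seen):
    """Migrate every article newer than last_seen; return the new high-water mark."""
    for article_id, body in fetch_since(last_seen):
        inject(article_id, body)
        # Articles arrive in ascending ID order, so this is always the maximum so far.
        last_seen = article_id
    return last_seen
```

A real daemon would call `tail_once` repeatedly with a pause between passes and store the returned mark somewhere durable, so that a restart does not re-inject reports.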

This is mostly process management (and a little coding of an NNTP tail daemon) and doesn’t require deep Metabase expertise. I’d like to find a volunteer to be the ‘beta test manager’ so I and others can be free to hack on the libraries or server for fixes if we need it.

2.4 Launch (2nd half Feb)

We want a buffer to start the launch before the March 1 deadline. If all goes well with the beta test, we should aim to “launch” in mid-February.

Steps include:

  • Email profiles and instructions to all testers — just like in beta testing, but to all ‘active’ testers (definition of ‘active’ to be determined)
  • Switch cpantesters.org to new databases — assuming the CT Metabase driven stats database and report viewer worked well in beta, the production *.cpantesters.org applications should switch over to the new databases; at this point, CT Reporting should be entirely independent of the NNTP archive at the Perl NOC
  • Write NNTP tail daemon to warn email submitters of the shut-down date — hopefully, most testers will switch over quickly, but there will be stragglers who may not get the news; we need another NNTP tailing daemon to gently hassle them. (This can probably be adapted easily from my CPAN Testers nagbot.)
  • Coordinate with Perl NOC for a graceful sunset period — ideally, we’d like a ‘kind warning’ to testers trying to submit via email as of or after the deadline

Either the beta-test manager or another volunteer should coordinate these efforts.

2.5 Post launch (March)

After the launch, we’ll want to start fixing things we find or didn’t get to before the deadline.

Some early thoughts:

  • Monitor performance — see how things scale with all tests going through the CT Metabase
  • Improve syndication (methods, lag time, etc.) — some people used to follow the NNTP archive. While some syndication will happen via the www.cpantesters.org site, it has a lag and we can look into alternate approaches for more real-time syndication of reports
  • Migrate IRC notifiers to use new syndication — existing IRC bots monitor the NNTP archive and will need to be switched over
  • Restrict get/search access to CT Metabase server — to manage costs, we don’t want just anyone to be able to get/search from the master server, so we’ll need to tighten up access controls

2.6 Longer term improvement

After the migration, there will be a number of additional opportunities to build on the CT2.0 infrastructure.

On the client side, I’d like to see a move towards more structured data consistently captured from all clients rather than having to do crude parsing of what would otherwise have been an email text.

On the server side, I think there are a number of ways to improve query and search to support more interesting analytics, more targeted syndication of reports, and better visualizations.

Posted in cpan-testers, metabase, perl-programming

Good, fast or cheap — pick again

I’m not sure when exactly I started thinking about new infrastructure for CPAN Testers, but it might have been a couple years ago around the time I released CPAN::Reporter 1.0. That was when I decided that Net::SMTP needed to be the default “transport” option, to avoid problems people were already having with local report submission via sendmail. It was a necessary change, but only a temporary fix for the bigger problem. At the Oslo QA hackathon in spring 2008, Ricardo, Jonas and I worked up the first draft of a framework for “CPAN Testers 2.0” (CT2.0).

What happened next was the inevitable result of the proverb: “good, fast, or cheap — pick two”. CT2.0 was designed to be a good replacement for CT1.0, and it was being done by (cheap) all volunteer labor. So, despite some progress towards a proof of concept a year later at the Birmingham QA hackathon, there has been no real end in sight for CT1.0.

That all changed last week. With a firm deadline to hit, it’s time to reconsider the good, fast or cheap tradeoff. Fast is now critical. I think the design is good enough. What can be done quickly won’t replace all of the CT1.0 ecosystem right away but just the core transport and report repository parts.

I think cheap is what is going to change. It’s still volunteer labor, but I think there’s a way to need less of it. My current hypothesis for a plan to hit the deadline is to implement the Metabase framework on top of Amazon Web Services (AWS). That offloads scalability and reliability concerns, changing those technical and administrative challenges into resource challenges.

I’ve already successfully demonstrated a proof of concept that the existing Test::Reporter based testing clients can feed a CT2.0 Metabase. How well the framework scales remains a big unknown, but by implementing on top of AWS, we can throw resources at the problem in the short run by deploying more EC2 instances to deal with bottlenecks. If the SQLite databases that drive the CPAN Testers statistics sites have to be regenerated from scratch each night, that’s just a MapReduce job. If the Metabase web app is too slow to deal with the test report volume, we just deploy more instances and stick a load balancer in front.

Having that flexibility simplifies the job of getting CT2.0 off the ground to a handful of to-do’s:

  • Implement a Metabase backend on top of AWS
  • Create an AWS virtual machine to accept reports and publish to the CT2.0 Metabase on AWS
  • Migrate existing NNTP reports to the CT2.0 Metabase
  • Implement a web app to serve up new and legacy reports from the CT2.0 Metabase
  • Design a process to update the CPAN Testers stats database with newly uploaded reports (from scratch if necessary)
  • Update CPAN Testers websites to link to the new CT2.0 reports archive site
  • Get testers to switch to Test::Reporter::Transport::Metabase

I think that gets most of what we need by March 1 and without a whole lot of new code to write and test and without a lot of sysadmin or DBA time and attention required. It’s a limited implementation, but will solve the need of the Perl NOC to get CPAN Testers reports off their email infrastructure.

I’ll be writing up more details and plans over the next week.

Posted in Uncategorized, cpan-testers, metabase, perl-programming

Module::Build version 0.36 released

After four months of development and 15 development releases along the way, I’m pleased to announce that Module::Build 0.36 is now on CPAN. Version 0.36 will also be included in the next release of the Perl 5.11.X development series. I would like to thank everyone who contributed patches, suggestions, testing or other support to enable this release.

Summary of major changes since 0.35

Enhancements

  • Added ‘Build installdeps’ action to install needed dependencies via a user-configurable command line program. (Defaults to ‘cpan’.)
  • Command line options may be set via the PERL_MB_OPT environment variable (similar to PERL_MM_OPT in ExtUtils::MakeMaker)
  • Generates MYMETA.yml during Build.PL (new standard protocol for communicating configuration results between toolchain components)
  • Reduced amount of console output under normal operation (use --verbose to see all output)
  • Added experimental inc/ bundling; see Module::Build::Bundling for details.

New or changed properties

  • Added ‘share_dir’ property to provide File::ShareDir support; File::ShareDir automatically added to ‘requires’ if ‘share_dir’ is set
  • Added ‘needs_compiler’ property. Defaults to true if XS or c_source exist. If true, ExtUtils::CBuilder is also added to build_requires.
  • ‘C_support’ is no longer an optional feature. Modern ExtUtils::CBuilder and ExtUtils::ParseXS added to the ‘requires’ list. This ensures that upgrading Module::Build will upgrade these critical modules.
  • Clarified that ‘apache’ in the license attribute indicates the Apache License 2.0 and added ‘apache_1_1’ for the older version of the license

Deprecations

  • Module::Build::Compat ‘passthrough’ style has been deprecated. Using ‘passthrough’ will issue warnings on Makefile.PL generation. See Module::Build::Compat documentation for rationale.

Internals

  • Replaced use of YAML.pm with YAML::Tiny; Module::Build::YAML is now based on YAML::Tiny as well
  • A new get_metadata() method has been added as a simpler wrapper around the old, kludgy prepare_metadata() API.
  • Replaced guts of new_from_context(). Build.PL is now executed in a separate process before resume() is called. (This is generally only of interest to Module::Build or toolchain developers)
  • Add support for ‘package NAME VERSION’ syntax added in Perl 5.11.1

Notable bug fixes

  • The “test” action now dies when using the ‘use_tap_harness’ option and tests fail, matching the behavior under Test::Harness.
  • Updated PPM generation to PPM v4
  • When module_name is not supplied, no packlist was being written; fixed by guessing module_name from dist_version_from or the directory name (just like ExtUtils::Manifest does without NAME)
  • Failure to detect a compiler will now warn during Build.PL and be a fatal error when trying to compile during Build
  • Auto-detection of abstract and author fixed for mixed-case POD headers
  • resume() was not restoring additions to @INC added in Build.PL
  • When tarball paths are less than 100 characters, disables ‘prefix’ mode of Archive::Tar for maximum compatibility
  • Merging ‘requires’ and ‘build_requires’ in Module::Build::Compat could lead to duplicate PREREQ_PM entries; now the highest version is used for PREREQ_PM.
  • Module::Build::Compat will now die with an error if advanced, non-numeric prerequisites are given, as these are not supported by ExtUtils::MakeMaker in PREREQ_PM
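
To make the last two fixes concrete, here is a small Python sketch of the merging rule they describe: combine two prerequisite lists, keep the highest version when a module appears in both, and refuse anything that is not a plain number. This illustrates the rule only; it is not Module::Build::Compat’s code, and the function names are made up for the example.

```python
import re

# A plain decimal number such as "0", "1.23" or "0.88".
_NUMERIC = re.compile(r"\d+(\.\d+)?")


def as_number(version):
    """Convert a plain numeric version string to a float, rejecting anything else."""
    text = str(version)
    if not _NUMERIC.fullmatch(text):
        raise ValueError(f"non-numeric prerequisite not supported: {text!r}")
    return float(text)


def merge_prereqs(requires, build_requires):
    """Merge two {module: version} dicts, keeping the highest version per module."""
    merged = {}
    for prereqs in (requires, build_requires):
        for module, version in prereqs.items():
            if module not in merged or as_number(version) > as_number(merged[module]):
                merged[module] = version
    return merged
```

For example, merging `{"Test::More": "0.47"}` with `{"Test::More": "0.88", "File::Temp": "0"}` yields a single `Test::More` entry at `0.88`, while a range such as `">= 1.0, < 2.0"` raises an error instead of being passed through.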

Posted in cpan, perl-programming, toolchain

CPAN Testers: Too much of a good thing right now

CPAN Testers has been growing by leaps and bounds lately and there have been some nice developments:

Unfortunately, this success has come at a cost. The Perl NOC, which provides the backend for CPAN Testers, is now deluged with reports. Reports are sent via email, each of which gets run through spam scanning and then archived on a filesystem and made available via NNTP.

This might have seemed like a practical design when there were only a few tens of thousands of reports and only a few hundred new ones coming in each month, but it no longer scales to what CPAN Testers needs. Worse, it puts a strain on the NOC to keep everything else running while managing the growing volume.

I’ve posted an alert to the CPAN Testers discussion list asking people to throttle their smoke testers for a while. The Perl NOC has given us a deadline of March 1 to launch CPAN Testers 2.0 and get off email/NNTP once and for all.

In the new year, I’ll be posting more about a plan to make this happen.

Posted in cpan-testers, perl-programming

Need more tuits

It’s been a frustrating week. I really want to finish up some of my Perl projects and, between family and work obligations this week, I’ve had no chance to work on them. So if anyone has some round tuits, could you please send them my way?

At least I’m keeping up my ironman status.

Posted in perl-programming

How to import Gravatars into Gmail in 121 lines of Perl

Of course, because it uses CPAN, it’s not really just 121 lines. And, actually, I didn’t even write it. But the point is that a short time after I wondered “gee, is there a way to import Gravatars to Gmail?”, I found a Perl program to do the job.

In this case, it turned out to be a pre-module experiment by prolific CPAN author Tatsuhiko Miyagawa. Through a Google search, I found his google-contacts-gravatar code on github.com, cloned it, installed its dependencies, and minutes later had imported all gravatars matching my contacts list.

Awesome! Thank you Miyagawa, and thank you Perl.

Posted in perl-programming

Version number sanity

I’ve written at length how I wish version numbers were boring and why they aren’t. What I haven’t done well is to express what I think that means. In some recent conversations on the perl5-porters mailing list and #p5p on IRC, I took a stab at a definition, which I’ll repost here.

In part, my “wishlist” is to harmonize how $VERSION is defined, how $VERSION is statically parsed by ExtUtils::MM->parse_version and Module::Build::ModuleInfo, how $VERSION is specified to use(), what gets returned by UNIVERSAL::VERSION and what version->new($version) gives.

I’d like to be able to “round-trip” a version any which way.

  • set it via package NAME VERSION
  • statically parse it and get the same thing back
  • eval “use Foo $version” and succeed
  • get the same thing back from Foo->VERSION
  • give it to version->new() and get back the same thing.

I’ve posted a test file that demonstrates these desired behaviors using the new ‘package NAME VERSION’ syntax in Perl 5.11.2.

Here are some of the things that don’t work:

  • ‘01.23’ — this is interpreted as an octal. Oops!
  • ‘1_000’ — statically parsed with the underscore, but interpreted without by ‘use NAME VERSION’, and version.pm doesn’t like it at all
  • ‘v1.1000.2345’ — dotted integer components over 999 bleed over into the next field when converted to decimal, so this is really equivalent to decimal version 2.002345
  • ‘v1.2_3’ — is this an “alpha” like v1.2.3 or is this v1.23.0? Depending how this is expressed it can be interpreted either way
  • ‘1.23_01’ — a good old fashioned “alpha” decimal version — but again, it parses statically one way but is interpreted another.
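
The “bleed over” in the v1.1000.2345 case is just arithmetic: each dotted component after the first occupies three decimal digits. The short Python sketch below reproduces the number quoted above. Python is used only for illustration, the function name is made up, and this is not how version.pm is implemented.

```python
def dotted_to_decimal(dotted):
    """Convert a dotted-integer version string to its decimal-version string.

    Each component after the first is worth three decimal places, so a
    component above 999 carries into the field to its left.
    """
    parts = [int(p) for p in dotted.lstrip("v").split(".")]
    places = 3 * (len(parts) - 1)
    total = 0
    for part in parts:
        # Shift left by three digits, then add the next component.
        total = total * 1000 + part
    if places == 0:
        return str(total)
    whole, fraction = divmod(total, 10 ** places)
    return f"{whole}.{fraction:0{places}d}"


print(dotted_to_decimal("v1.2.3"))        # 1.002003
print(dotted_to_decimal("v1.1000.2345"))  # 2.002345
```

The middle component, 1000, fills its own three digits and carries 1 into the leading field, which is why the result starts with 2 rather than 1.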

It’s very possible I’m being too draconian — my wishlist means alpha version numbers would go away entirely (and would have to be replaced with something else, certainly). But if I had to start over, these criteria would make version numbers much easier to deal with.

Posted in cpan, perl-programming, toolchain

Too many CPAN.pm config options?

Over the years, CPAN.pm has accumulated a staggering number of configuration options. Recently, an otherwise expert Perl programmer (whom I won’t embarrass by naming) asked on IRC for the name of an option rather than reading the (admittedly long) manual. This reminded me that many Perl users may not know about a handy feature that has been in the CPAN shell since 2006:

cpan> o conf /MATCH/
cpan> o conf init /MATCH/

The first form will show all options matching a pattern. The second will re-run interactive configuration with explanatory paragraphs for all options matching a pattern.

Thanks to these features, I rarely actually read the manual. As long as I have a rough idea what the option name might be, I just search for it. And I can always fall back on “o conf” to see all options and then “o conf init NAME” to read the paragraph about any particular one.

Posted in cpan, perl-programming, toolchain

Extending the timeline for CPAN Meta Spec revisions

When I set out the timeline for revisions to the CPAN Meta Spec, I was anticipating closing the public comments at the end of October and finalizing the spec at the end of November.

Fortunately for Perl, but unfortunately for me, our new pumpking, Jesse Vincent, announced on Halloween a Perl core feature freeze three weeks later (i.e. this weekend). Getting Module::Build and other bits of the toolchain ready for that has sucked up the time I was hoping to use to synthesize the Meta Spec patches for consistency and get the working group to discuss them. (I would expect that members of the working group are similarly distracted with a last-minute surge prior to the freeze.)

Therefore, I’m extending the timeline for the CPAN Meta Spec revisions by one month, and will be aiming to distribute a draft to the working group by the end of November and to finalize the new spec by Jan 1.

Posted in cpan, perl-programming, toolchain

© 2009-2010 David Golden All Rights Reserved