A while ago, I put together a test sample of 15,569 CPAN metadata files from a minicpan repository. I'm happy to report that with the latest release today, CPAN::Meta can handle almost 99% of metafiles found "in the wild": 15,405 of the 15,569 either validate as-is or can be fixed automatically. Here is the breakdown:
- no errors: 11411 (73.29%)
- fixable errors: 3994 (25.65%)
- parser errors: 124 (0.80%)
- missing 'name' or 'version' fields: 32 (0.21%)
- remaining errors (not fixable): 8 (0.05%)
The "unfixable" errors range from mandatory fields missing deep within the structure, where no value can be inferred from context, to invalid module names and other strange data that somehow slipped into a metafile.
The "torture" part of the test was partly a test of my own patience, since I was tweaking CPAN::Meta and re-running the test against the roughly 15,000 distributions over and over again. One quick win worth mentioning was Parallel::Iterator, which let me take advantage of all four cores of my CPU. As expected, it cut the runtime by about 70% (11.5 seconds vs 38.3 seconds, a speedup of roughly 3.3x). If anyone is interested in seeing how I used it, my quick-and-dirty CPAN::Meta torture program is available online.
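For readers who just want the general shape of the approach, here is a minimal sketch. It is not the actual torture program. The helper names `classify_metafile` and `tally_metafiles`, the three buckets, and the strict-then-lenient use of the `lazy_validation` option are assumptions made for illustration. The buckets are coarser than the breakdown above: parser errors and unfixable data both land in `failed`.

```perl
use strict;
use warnings;
use CPAN::Meta;
use Parallel::Iterator qw( iterate_as_array );

# Hypothetical helper: put one metafile into one of three buckets.
sub classify_metafile {
    my ($path) = @_;

    # Strict pass: the file must validate against its stated spec version.
    return 'ok'
        if eval { CPAN::Meta->load_file( $path, { lazy_validation => 0 } ); 1 };

    # Lenient pass: let CPAN::Meta try to repair the structure before validating.
    return 'fixable'
        if eval { CPAN::Meta->load_file( $path, { lazy_validation => 1 } ); 1 };

    # Unparseable, unreadable, or not repairable.
    return 'failed';
}

# Hypothetical helper: classify all files with $workers forked processes
# and return a hash of bucket => count.
sub tally_metafiles {
    my ( $workers, @files ) = @_;

    # The worker sub receives ( $index, $item ) and runs in a child process;
    # its return value is passed back to the parent.
    my @status = iterate_as_array(
        { workers => $workers },
        sub { my ( $index, $path ) = @_; classify_metafile($path) },
        \@files,
    );

    my %count;
    $count{$_}++ for @status;
    return %count;
}

# Assumed usage: metafile paths on the command line, one worker per core.
if (@ARGV) {
    my %count = tally_metafiles( 4, @ARGV );
    printf "%-8s %d\n", $_, $count{$_} for sort keys %count;
}
```

Because each file is validated independently, the work splits cleanly across processes. The only thing sent back to the parent is a short status string per file, so the overhead of passing results between processes stays small.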