Many people love Perl for how easily and quickly one can write useful little programs, but many of these one-off programs are never seen by anyone but the original author. Recently, I wrote ylastic-costagent to help process Amazon Web Services (AWS) data for analysis. Rather than leave it hidden on my hard drive, I decided to share my program on CPAN and write about some of the techniques and modules I used.
Why I wrote it
I've been using the Ylastic dashboard to help monitor AWS usage and costs. Ylastic provides some very nice features for managing AWS cloud services and some great visualizations. Unfortunately, their only option for gathering the data was a poorly documented, poorly structured Python program that scrapes Amazon's site data.
When I tried it, I had problems satisfying the dependencies. I had to install them manually, and even then it turned out that the libraries in Ubuntu 10.10 were too old to work. I realized that in the time it would take me (a Python non-user) to figure out how to get newer Python libraries installed, I could just whip up an equivalent program in Perl.
Two parts: a module and an executable
I followed a CPAN trend of putting most of the guts of the program into a module, App::Ylastic::CostAgent, and wrote a simpler executable that just managed the command line options, created an object, and dispatched to a run method. I like this approach because modules are a bit easier to test than executables and author tools for managing modules are a bit better developed.
In this case, I did the command line option processing in the executable with my own Getopt::Lucid. I've also seen option processing put into the module. I haven't decided which I like better, though I slightly favor keeping options code in the executable because I think the options should be documented there and I like keeping code and documentation close.
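As a rough illustration of that split (not the actual ylastic-costagent source), here is a minimal single-file sketch. The My::CostAgent package name, the option names, and the default config path are all hypothetical, and core Getopt::Long stands in for Getopt::Lucid so the example runs without CPAN dependencies.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# --- Module part: would normally live in lib/My/CostAgent.pm ---
# (My::CostAgent is a hypothetical name used only for this sketch.)
package My::CostAgent;

sub new {
    my ( $class, %args ) = @_;
    my $self = {
        config_file => $args{config_file},
        verbose     => $args{verbose} ? 1 : 0,
    };
    return bless $self, $class;
}

# All the real work would hang off run(); here it only reports its settings.
sub run {
    my ($self) = @_;
    print "Using config file: $self->{config_file}\n" if $self->{verbose};
    return 0;    # conventional "success" status
}

# --- Executable part: would normally live in bin/my-costagent ---
package main;

# Stand-in for Getopt::Lucid: Getopt::Long ships with Perl.
use Getopt::Long qw(GetOptionsFromArray);

# Parse options, create the object, dispatch to run().
sub main {
    my @argv    = @_;
    my $config  = 'costagent.ini';    # hypothetical default
    my $verbose = 0;

    GetOptionsFromArray(
        \@argv,
        'config|c=s' => \$config,
        'verbose|v'  => \$verbose,
    ) or die "Usage: $0 [--config FILE] [--verbose]\n";

    my $app = My::CostAgent->new(
        config_file => $config,
        verbose     => $verbose,
    );
    return $app->run;
}

my $status = main(@ARGV);
print "finished with status $status\n";
```

Because the logic lives in a package, a test script can construct the object and call run directly instead of spawning the executable.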
Five modules inside App::Ylastic::CostAgent
App::Ylastic::CostAgent contains less than 200 lines of code validating inputs and coordinating the interactions of five modules that do all the real work:
- Config::Tiny -- the original Python program required users to edit global variables to configure it. (Horrible!) Whenever I need a simple configuration file, I start with Config::Tiny. It even supports both top-level data and named sections, which was a good semantic fit for global Ylastic user data and named sections for each AWS account from which to collect data.
- Object::Tiny -- I didn't expect this program to manage a lot of state, but I didn't want to pass around a raw data structure either. Wrapping data as an object and calling accessors helps me avoid annoying typo errors accessing hash-keys directly. Since I didn't need a full-featured OO system like Moose, I opted for Object::Tiny instead to give me a simple constructor and some accessors.
- WWW::Mechanize -- this is the workhorse module that does the actual screen scraping and form posting. I spent more time fiddling with it than I would have liked, but I hadn't used it in years and had to relearn it.
- Archive::Zip -- Ylastic wants a zip file containing all the CSV files downloaded from AWS. Instead of saving them to disk and then running an external tool, I chose to use Archive::Zip to build the zip file in memory. I also wasted a bit of time here discovering that Archive::Zip wants you to set compression levels explicitly instead of doing it by default. (I should have RTFM more closely.) I don't really like the API and I wish someone would write an "Archive::Zip::DWIM" that makes it easier to use for a simple case like mine.
- Log::Dispatchouli -- the original Python program did some debug logging to a file. I didn't really need a powerful logging library, but I was looking for an excuse to learn Ricardo Signes' Log::Dispatchouli. With about half a dozen lines of code, I gave ylastic-costagent the ability to log either to a file or syslogd or both. Cool!
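To make the Config::Tiny and Object::Tiny points concrete, here is a minimal sketch of a config file with top-level data plus one named section per account, read into small accessor objects. The key names (ylastic_id, user, pass), the account numbers, and the My::Account class are assumptions made up for illustration; they are not the real ylastic-costagent configuration format.

```perl
use strict;
use warnings;
use Config::Tiny;

{
    # Hypothetical class: Object::Tiny supplies new() and read-only accessors.
    package My::Account;
    use Object::Tiny qw{ name user pass };
}

# Inline stand-in for a config file on disk.
my $ini = <<'END_INI';
ylastic_id = 1234567890

[111122223333]
user = alice@example.com
pass = secret-one

[444455556666]
user = bob@example.com
pass = secret-two
END_INI

my $config = Config::Tiny->read_string($ini)
    or die Config::Tiny->errstr;

# Config::Tiny stores top-level (section-less) keys under '_'.
my $ylastic_id = $config->{_}{ylastic_id};

# Every other top-level key is a named section: one object per account.
my @accounts =
    map  { My::Account->new( name => $_, %{ $config->{$_} } ) }
    grep { $_ ne '_' }
    sort keys %$config;

print "Ylastic ID: $ylastic_id\n";
printf "account %s logs in as %s\n", $_->name, $_->user for @accounts;
```

A misspelled accessor such as $account->usr dies with a "can't locate object method" error, whereas a misspelled hash key would silently return undef.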
What is the first day of the next month?
The original Python program requested custom data ranges from AWS that ended on the first day of the next month. I'm not sure why they didn't just set an end date one day in the future, but I wanted to replicate their logic in ylastic-costagent.
I could have copied the algorithm from the Python program (grab date components, increment month with a modulus, etc.) but I expected an answer on CPAN already. As sometimes happens when searching CPAN, I blundered around for longer than it would have taken to just copy the logic, but I came up with this gem using Time::Piece and Time::Piece::Month:
# returns a Time::Piece object
Time::Piece::Month->new( Time::Piece->new() )->next_month->start()
I'm sure there are better ways to do it, but I didn't really want to spend any more time on it.
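For comparison, the "grab date components, increment month with a modulus" approach can be sketched with nothing but core Time::Piece. This is an illustration only, not code taken from the Python program; first_of_next_month is a hypothetical helper name, and the sketch works in UTC.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: given a Time::Piece, return a Time::Piece for
# midnight (UTC) on the first day of the following month.
sub first_of_next_month {
    my ($t) = @_;
    my $mon       = $t->mon;                            # 1..12
    my $next_mon  = $mon % 12 + 1;                      # December wraps to 1
    my $next_year = $mon == 12 ? $t->year + 1 : $t->year;
    return Time::Piece->strptime(
        sprintf( '%04d-%02d-01', $next_year, $next_mon ),
        '%Y-%m-%d',
    );
}

# "scalar" forces gmtime to return a Time::Piece object rather than a list.
my $start = first_of_next_month( scalar gmtime );
print "Next month starts on ", $start->ymd, "\n";
```

The year rollover in December is the only special case the manual version has to handle.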
Since dependencies were what stopped me from using the original Python program, I decided to release ylastic-costagent to CPAN so the regular Perl module toolchain could automate dependency resolution for anyone wanting to use it. That wouldn't help a Perl novice, so I included a five-line recipe in the documentation for anyone who didn't know how to configure CPAN. (It includes how to install the OpenSSL development library -- for a Debian-based system in this example.)
$ sudo apt-get install libssl-dev
$ curl -L http://cpanmin.us | perl - -l ~/perl5 App::cpanminus local::lib
$ eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`
$ echo 'eval `perl -I ~/perl5/lib/perl5 -Mlocal::lib`' >> ~/.bashrc
$ cpanm App::Ylastic::CostAgent
In the end, this took me less than a day's work, spread out over a couple of actual days. Could I have sorted out my Python dependencies in that time? Possibly. But I did spend some extra time I didn't really need to on being fancy with logging and the date manipulation, as well as on writing decent documentation. The result is an easily installable program with automatic dependency resolution -- which is exactly what I didn't have before.
It's now working for me, and I hope it might save some time for future AWS/Ylastic customers as well.