Finally, a streaming Perl filehandle API for MongoDB GridFS

GridFS is MongoDB's filesystem abstraction, which lets you store files in a MongoDB database. If that database is a replica set or sharded cluster, the files are protected against the loss of a database server.

The recently released v1.3.x development branch of the MongoDB Perl driver introduces a new GridFS API implementation (MongoDB::GridFSBucket) and deprecates the old one (MongoDB::GridFS).

The new API makes working with GridFS much more Perlish. You open an "upload stream" for writing. You open a "download stream" for reading. In both cases, you can get a tied filehandle from the stream object, which lets GridFS operate seamlessly with Perl libraries that read/write handles.
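To make the "tied filehandle" idea concrete, here is a minimal sketch of how a stream-like object can hand out a handle that ordinary print and close calls work on. This is not the driver's implementation: the Demo::ChunkSink and Demo::ChunkSink::Handle classes, the write_data and finish methods, and the tiny chunk size are all hypothetical names invented for illustration, and it needs no MongoDB server.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Symbol ();

# Hypothetical stand-in for an upload stream (not the driver's class):
# it buffers written data and splits it into fixed-size chunks.
package Demo::ChunkSink;

sub new {
    my ( $class, %args ) = @_;
    my $self = { chunk_size => $args{chunk_size} || 4, buffer => '', chunks => [] };
    return bless $self, $class;
}

# Hand out a glob tied to this object, so plain print/close reach it.
sub fh {
    my ($self) = @_;
    my $fh = Symbol::gensym();
    tie *$fh, 'Demo::ChunkSink::Handle', $self;
    return $fh;
}

# Append data and move every complete chunk out of the buffer.
sub write_data {
    my ( $self, $data ) = @_;
    $self->{buffer} .= $data;
    while ( length( $self->{buffer} ) >= $self->{chunk_size} ) {
        push @{ $self->{chunks} }, substr( $self->{buffer}, 0, $self->{chunk_size}, '' );
    }
    return 1;
}

# Store whatever is left as a final, shorter chunk.
sub finish {
    my ($self) = @_;
    if ( length $self->{buffer} ) {
        push @{ $self->{chunks} }, $self->{buffer};
        $self->{buffer} = '';
    }
    return scalar @{ $self->{chunks} };
}

# The tie class only forwards handle operations to the sink object.
package Demo::ChunkSink::Handle;

sub TIEHANDLE { my ( $class, $sink ) = @_; return bless { sink => $sink }, $class }

sub PRINT {
    my $self = shift;
    my $sep = defined $, ? $, : '';
    return $self->{sink}->write_data( join( $sep, @_ ) );
}

sub CLOSE { my $self = shift; return $self->{sink}->finish }

package main;

my $sink = Demo::ChunkSink->new( chunk_size => 4 );
my $fh   = $sink->fh;

print {$fh} "hello ", "world";    # goes through PRINT
close $fh;                        # goes through CLOSE

print join( "|", @{ $sink->{chunks} } ), "\n";    # prints "hell|o wo|rld"
```

Code that only knows how to print to a handle never sees the object behind it, and that is the property that lets handle-based libraries work with a stream.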

Let's consider a practical example: compression. Imagine you'd like to store files in GridFS but with gzip compression. You could compress a file in memory or on disk and then upload that to GridFS. Or, you could compress it on the fly with IO::Compress::Gzip.

This demo requires at least v1.3.1-TRIAL of the MongoDB Perl driver. You can install that with your favorite CPAN client. E.g.:

$ cpanm --dev MongoDB
# or
$ cpan MONGODB/MongoDB-1.3.1-TRIAL.tar.gz

First, let's load the modules we need, connect to the database and get a new, empty MongoDB::GridFSBucket object.

#!/usr/bin/env perl
use v5.10;
use strict;
use warnings;
use Path::Tiny;
use MongoDB;
use IO::Compress::Gzip qw/gzip $GzipError/;
use IO::Uncompress::Gunzip qw/gunzip $GunzipError/;

# connect to MongoDB on localhost
my $mc  = MongoDB->connect();

# get the MongoDB::GridFSBucket object for the 'test' database
my $gfs = $mc->db("test")->gfs;

# drop the GridFS bucket for this demo
$gfs->drop;

Next, let's say we have a local file called big.txt. In my testing, I used one that was about 2 MB. The next part of the program prints the uncompressed size, opens a filehandle for uploading, then uses the gzip function to read from one handle and write to the other. Finally, we close the upload handle, which flushes the data to GridFS, and check the compressed size that was uploaded.

# print info on file to upload
my $file = path("big.txt");
say "local  $file is " . ( -s $file ) . " bytes";

# open a handle for uploading
my $up_fh = $gfs->open_upload_stream("$file")->fh;

# compress and upload file
gzip $file->openr_raw => $up_fh
  or die $GzipError;

# flush data to GridFS
my $doc = $up_fh->close;
say "gridfs $file is " . $doc->{length} . " bytes";

Downloading is pretty much the same process. We open a download handle and an output handle for a disk file, then use the gunzip function to stream from the download handle to disk. In this case, because we don't need the handles afterwards, we can do it all on one line instead of using temporary variables. Lastly, we report the size of the copy (to ensure it's the same) and the compression ratio.

# download and uncompress file
my $copy = path("big2.txt");
gunzip $gfs->open_download_stream( $doc->{_id} )->fh => $copy->openw_raw
  or die $GunzipError;
say "copied $copy is " . ( -s $copy ) . " bytes";

# report compression ratio
printf("compressed was %d%% of original\n", 100 * $doc->{length} / -s $file );

If you want to try this out, you can get the whole program from this gist: compressed-gridfs.pl.

When I run it on my sample file, this is the output I get:

$ perl compressed-gridfs.pl
local  big.txt is 2097410 bytes
gridfs big.txt is 777043 bytes
copied big2.txt is 2097410 bytes
compressed was 37% of original
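If you'd like to experiment with the same handle-to-handle gzip/gunzip pattern without a MongoDB server, the following sketch may help. It makes some assumptions: in-memory scalar filehandles stand in for the disk file and for the GridFS upload and download handles, and a repeated sample sentence stands in for big.txt. It uses only core modules.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Compress::Gzip qw/gzip $GzipError/;
use IO::Uncompress::Gunzip qw/gunzip $GunzipError/;

# Assumption: repetitive sample text stands in for big.txt.
my $original = "The quick brown fox jumps over the lazy dog.\n" x 1000;

# In-memory handles stand in for the local file and the upload handle.
my $stored = '';
open my $in_fh, '<', \$original or die "can't open input: $!";
open my $up_fh, '>', \$stored   or die "can't open upload stand-in: $!";

# compress while streaming from one handle to the other
gzip $in_fh => $up_fh
  or die $GzipError;
close $up_fh;

# Reverse direction: "download" handle in, plain output handle out.
my $copy = '';
open my $down_fh, '<', \$stored or die "can't open download stand-in: $!";
open my $out_fh,  '>', \$copy   or die "can't open output: $!";

gunzip $down_fh => $out_fh
  or die $GunzipError;
close $out_fh;

printf "original %d bytes, stored %d bytes\n", length($original), length($stored);
print $copy eq $original ? "round trip ok\n" : "round trip FAILED\n";
```

Swapping the stand-in handles for the ones returned by the stream objects' fh method gives the GridFS version shown earlier.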

Having GridFS uploads and downloads represented as Perl filehandles makes interoperation with many Perl libraries super easy.

Of course, the new library still works nicely with handles you provide to it. If you want to upload from a handle, you use the upload_from_stream method:

$gfs->upload_from_stream("$file", $file->openr_raw);

Or, if you want to download to a handle, you use the download_to_stream method:

$gfs->download_to_stream($file_id, path("output")->openw_raw);

While the new GridFS API is currently only in the development version of the driver, I encourage you to try it out if you're curious.

If you have feedback, please email me (DAGOLDEN at cpan.org), tweet at @xdg or open a Jira ticket.

Thanks!


Getting ready for MongoDB 3.2: new features, new Perl driver beta

After several release candidates, MongoDB version 3.2 is nearing completion. It brings a number of new features that users have been demanding:

  • No more dirty reads!

    With the new "readConcern" query option, users can trade higher latency for reads that won't roll back during a partition.

  • Document validation!

    While MongoDB doesn't use schemas, users can define query criteria that will be checked to validate new and updated documents. In addition to field existence and type checks, these can include logical checks as well (e.g. does field "foo" match this regex).

Other, less developer-facing features include:

  • Encryption at rest
  • Partial indexes
  • Faster replica-set failover
  • Simpler sharded cluster configuration

If you want to try out a MongoDB 3.2 release candidate, see the MongoDB development downloads page.

Of course, to take advantage of developer-facing features, you'll need an updated driver library. All the MongoDB supported drivers have beta/RC versions with 3.2 support.

The current Perl driver beta is MongoDB-v1.1.0-TRIAL, which you can download and install with your favorite CPAN client:

$ cpanm --dev MongoDB
# or
$ cpan MONGODB/MongoDB-v1.1.0-TRIAL.tar.gz

Some of the changes in the beta driver include:

  • Support for readConcern (MongoDB 3.2)
  • Support for bypassDocumentValidation (MongoDB 3.2; for when you need to work with legacy documents before validation)
  • Support for writeConcern on find-and-modify-style writes (MongoDB 3.2; can be used to emulate a quorum read)
  • A new 'batch' method for query result objects for efficient processing
  • A new 'find_id' sugar method on collection objects for fetching a document by its _id field

Whether you're ready for MongoDB 3.2 or not, I encourage you to try out the Perl driver beta.

If you find any bugs or have any comments, please open a MongoDB JIRA ticket about it, or email me (dagolden) at my CPAN.org address, or tweet to @xdg.

Thank you!


If you use MongoDB and Perl, I want your feedback!

With the release of the v1.0 MongoDB Perl driver, I'm starting to sketch out a roadmap for future development and I'd like to hear from anyone using MongoDB and Perl together.

How can you help?

  1. Please take a few minutes to fill out the MongoDB Developer Experience Survey.

    This survey covers all languages that MongoDB supports in-house. In the past, there have been few responses from Perl developers and I'd like to change that.

  2. Please click the "++" button on the MetaCPAN MongoDB Release page.

    Because there are no download statistics for CPAN, gauging community interest is challenging and this is one – albeit crude – metric I (and my bosses!) can look at over time.

  3. Please file a JIRA ticket for any bugs or feature requests you have.

    If you absolutely can't stand to deal with JIRA, you can email me directly and I'll put it in JIRA for you.

  4. Please promote your work! Blog about it, tweet about it, give talks, etc.

    Don't let the "hip" languages be the only ones people think of for big data and NoSQL development.

Why is giving feedback so important?

MongoDB is one of the few next-gen databases to support Perl in-house.

Your feedback helps demonstrate community interest to keep it that way.

Thank you!


MongoDB Perl Driver v1.0.2 released

The MongoDB Perl driver v1.0.2 has been released on CPAN.

This is a stable bugfix release. Changes include:

  • PERL-198 Validate user-constructed MongoDB::OID objects.
  • PERL-495 Preserve fractional seconds when using dt_type 'raw'.
  • PERL-571 Include limits.h explicitly.
  • PERL-526 Detect stale primaries by election_id (only supported by MongoDB 3.0 or later).
  • PERL-575 Copy inflated booleans instead of aliasing them.

Please see the Changes file for additional detail.

We always appreciate feedback from the user community. Please submit comments, bug reports and feature requests via JIRA.

NOTE: If you use the MongoDB Perl driver, please click the "++" button on the MongoDB release page.


My GitHub name has changed from dagolden to xdg

For a long time, my GitHub identity matched my CPAN ID and I was 'dagolden' on GitHub. Today, I'm switching to 'xdg' on GitHub, to match my Twitter and IRC handles.

The 'dagolden' GitHub account has become an organization, which avoids breaking links and spares me the pain of manually migrating hundreds of repositories.

So, henceforth, if you want to highlight me on GitHub, use "@xdg", not "@dagolden".


© 2009-2016 David Golden All Rights Reserved