GridFS is MongoDB’s filesystem abstraction, which lets you store files in a MongoDB database. If that database is a replica set or sharded cluster, the files are protected against the loss of a single database server.
The recently released v1.3.x development branch of the MongoDB Perl driver introduces a new GridFS API implementation (MongoDB::GridFSBucket) and deprecates the old one (MongoDB::GridFS).
The new API makes working with GridFS much more Perlish. You open an “upload stream” for writing. You open a “download stream” for reading. In both cases, you can get a tied filehandle from the stream object, which lets GridFS operate seamlessly with Perl libraries that read/write handles.
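In outline, the stream-and-filehandle pattern looks like this (a minimal sketch, assuming `$gfs` is a MongoDB::GridFSBucket object like the one constructed in the full program below; the filename "hello.txt" is just for illustration):

```perl
# write via an upload stream's tied filehandle
my $up_fh = $gfs->open_upload_stream("hello.txt")->fh;
print {$up_fh} "Hello, GridFS!\n";
my $doc = $up_fh->close;    # flushes and returns the file document

# read it back via a download stream's tied filehandle
my $down_fh = $gfs->open_download_stream( $doc->{_id} )->fh;
print while <$down_fh>;
```

Because these are ordinary (tied) filehandles, anything that reads from or writes to a handle can work with GridFS unchanged.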
Let’s consider a practical example: compression. Imagine you’d like to store files in GridFS but with gzip compression. You could compress a file in memory or on disk and then upload that to GridFS. Or, you could compress it on the fly with IO::Compress::Gzip.
This demo requires at least v1.3.1-TRIAL of the MongoDB Perl driver. You can install that with your favorite CPAN client. E.g.:
```shell
$ cpanm --dev MongoDB
# or
$ cpan MONGODB/MongoDB-1.3.1-TRIAL.tar.gz
```
First, let’s load the modules we need, connect to the database and get a new, empty MongoDB::GridFSBucket object.
```perl
#!/usr/bin/env perl
use v5.10;
use strict;
use warnings;

use Path::Tiny;
use MongoDB;
use IO::Compress::Gzip qw/gzip $GzipError/;
use IO::Uncompress::Gunzip qw/gunzip $GunzipError/;

# connect to MongoDB on localhost
my $mc = MongoDB->connect();

# get the MongoDB::GridFSBucket object for the 'test' database
my $gfs = $mc->db("test")->gfs;

# drop the GridFS bucket for this demo
$gfs->drop;
```
Next, let’s say we have a local file called big.txt. In my testing, I used one that was about 2 MB. The next part of the program prints the uncompressed size, opens a filehandle for uploading, then uses the gzip function to read from one handle and write to the other. Finally, we close the upload handle, which flushes the data to GridFS, and check the compressed size that was uploaded.
```perl
# print info on file to upload
my $file = path("big.txt");
say "local $file is " . ( -s $file ) . " bytes";

# open a handle for uploading
my $up_fh = $gfs->open_upload_stream("$file")->fh;

# compress and upload file
gzip $file->openr_raw => $up_fh
  or die $GzipError;

# flush data to GridFS
my $doc = $up_fh->close;
say "gridfs $file is " . $doc->{length} . " bytes";
```
Downloading is pretty much the same process. We open a download handle and an output handle for a disk file, then use the gunzip function to stream from the download handle to disk. In this case, because we don’t need the handles afterwards, we can do it all in one statement instead of using temporary variables. Last, we check the size of the copy (to ensure it matches the original) and report the compression ratio.
```perl
# download and uncompress file
my $copy = path("big2.txt");
gunzip $gfs->open_download_stream( $doc->{_id} )->fh => $copy->openw_raw
  or die $GunzipError;
say "copied $copy is " . ( -s $copy ) . " bytes";

# report compression ratio
printf( "compressed was %d%% of original\n", 100 * $doc->{length} / -s $file );
```
If you want to try this out, you can get the whole program from this gist: compressed-gridfs.pl.
When I run it on my sample file, this is the output I get:
```
$ perl compressed-gridfs.pl
local big.txt is 2097410 bytes
gridfs big.txt is 777043 bytes
copied big2.txt is 2097410 bytes
compressed was 37% of original
```
Having GridFS uploads and downloads represented as Perl filehandles makes interoperation with many Perl libraries super easy.
Of course, the new library still works nicely with handles you provide to it. If you want to upload from a handle, you use the upload_from_stream method:

```perl
$gfs->upload_from_stream( "$file", $file->openr_raw );
```
Or, if you want to download to a handle, you use the download_to_stream method:

```perl
$gfs->download_to_stream( $file_id, path("output")->openw_raw );
```
While the new GridFS API is currently only in the development version of the driver, I encourage you to try it out if you’re curious.
If you have feedback, please email me (DAGOLDEN at cpan.org), tweet at @xdg or open a Jira ticket.
Thanks!