Cloud Storage Options for HDF5

by John Readey, The HDF Group

If you are looking to store HDF5 data in the cloud, there are several technologies you can use, and choosing between them can be confusing. In this post, I’ll cover some of the options in the hope of helping HDF users make the best decision for their deployment. Each project will have its own requirements and special considerations, so please take this as just a starting point.

To keep things (relatively) simple, this post considers only Amazon’s AWS services. Each of the major cloud vendors offers a roughly equivalent set of services (though the details differ), and the cost considerations are similar as well, so much of what we discuss here also applies to Google Cloud and Microsoft Azure.

Coming back to AWS, the first consideration is choosing a storage platform. We’ll consider EBS (Elastic Block Store), S3 (Simple Storage Service), and FlexFS from Paradigm4. Beyond these, there are a couple of relatively new offerings from Amazon: EFS (you can think of it as NFS on AWS) and FSx for Lustre (an HPC-style filesystem for AWS), but we’ll defer consideration of those two to a future blog post.

EBS Storage

EBS will seem familiar to those new to cloud computing: you can think of it as a sort of external hard drive in the cloud. As with an external drive, you choose a specific storage capacity, attach the volume to a compute (EC2) instance, format it, and mount it as a filesystem. Once mounted, it works just like any POSIX filesystem, and the HDF5 library, tools, and applications will work just as they would on a non-cloud system.

As a means of sharing data, EBS volumes are limited: they can’t be accessed from outside AWS or from a region other than the one in which they were created (in fact, a volume can only be attached to instances in its own Availability Zone).

From a cost perspective, Amazon charges you each month based on the size of the EBS volume, as opposed to the amount of data you actually store. If you know ahead of time how much storage you’ll need, you can size the volume to match. If not, keep in mind that you’ll end up paying for what you’ve allocated rather than what you’ve actually used. Depending on the volume type, an EBS volume can be as large as 64 TiB.
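As a rough illustration of paying for allocation rather than usage, the sketch below computes a monthly EBS bill and the effective cost per gigabyte actually stored. The price constant is a placeholder assumption, not current AWS pricing; check AWS’s pricing page for your region and volume type.

```python
# Placeholder price in USD per provisioned GB-month (an assumption, not real AWS pricing).
ASSUMED_PRICE_PER_GB_MONTH = 0.05


def ebs_monthly_cost(provisioned_gb: float,
                     price_per_gb_month: float = ASSUMED_PRICE_PER_GB_MONTH) -> float:
    """EBS bills on the provisioned size of the volume, not on the bytes written to it."""
    return provisioned_gb * price_per_gb_month


def cost_per_used_gb(provisioned_gb: float, used_gb: float,
                     price_per_gb_month: float = ASSUMED_PRICE_PER_GB_MONTH) -> float:
    """Effective monthly cost of each gigabyte you actually store on the volume."""
    if used_gb <= 0:
        raise ValueError("used_gb must be positive")
    return ebs_monthly_cost(provisioned_gb, price_per_gb_month) / used_gb


# A 1,000 GB volume holding 250 GB of HDF5 files: the bill covers all 1,000 GB,
# so each stored gigabyte effectively costs four times the per-GB price.
print(ebs_monthly_cost(1000))
print(cost_per_used_gb(1000, 250))
```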

AWS offers both magnetic (“HDD”) and solid-state (“SSD”) volumes. HDD volumes cost less and, as you might guess, perform worse, while SSD volumes cost more but perform better. Relative performance is likely to vary depending on the application and on how the HDF5 file is organized, so test to determine whether the added cost of SSD is worthwhile.
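One low-effort way to run that test is a small random-read benchmark pointed at a copy of your file on each volume type. This sketch times raw byte reads at random offsets; it is only a proxy, since the HDF5 library’s access pattern depends on chunking and on your selections, so timing your real read code is better still. The mount paths in the comments are hypothetical.

```python
import os
import random
import time


def random_read_throughput(path: str, read_size: int = 1 << 20,
                           n_reads: int = 100, seed: int = 0) -> float:
    """Read n_reads blocks of read_size bytes at random offsets and return MiB/s."""
    file_size = os.path.getsize(path)
    if file_size < read_size:
        raise ValueError("file is smaller than read_size")
    rng = random.Random(seed)  # fixed seed so HDD and SSD runs use the same offsets
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered: each read goes to the OS
        for _ in range(n_reads):
            f.seek(rng.randrange(0, file_size - read_size + 1))
            total_bytes += len(f.read(read_size))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total_bytes / (1 << 20) / elapsed


# Hypothetical mount points; use a file larger than RAM (or drop the OS page
# cache between runs) so cached pages don't hide the volume's real speed.
# print(random_read_throughput("/mnt/ebs-hdd/data.h5"))
# print(random_read_throughput("/mnt/ebs-ssd/data.h5"))
```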

One last thing to consider is that, just as with a physical drive, there’s no guarantee that an EBS volume won’t fail. Therefore, you’ll need a strategy for backing up any critical data stored on an EBS volume.
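Taking periodic EBS snapshots is one common backup strategy. The sketch below uses the boto3 EC2 client’s create_snapshot call and its snapshot_completed waiter; the client is passed in so the function can be exercised without AWS, and the volume ID in the comment is a placeholder.

```python
def snapshot_volume(ec2_client, volume_id: str, description: str,
                    wait: bool = False) -> str:
    """Start an EBS snapshot of volume_id and return the new snapshot's ID.

    Snapshots complete asynchronously; set wait=True to block until AWS
    reports the snapshot as completed.
    """
    response = ec2_client.create_snapshot(VolumeId=volume_id, Description=description)
    snapshot_id = response["SnapshotId"]
    if wait:
        ec2_client.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    return snapshot_id


# Usage with a real client (requires boto3 and AWS credentials; the ID is a placeholder):
# import boto3
# ec2 = boto3.client("ec2", region_name="us-west-2")
# snapshot_volume(ec2, "vol-0123456789abcdef0", "nightly HDF5 backup", wait=True)
```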

S3 Storage

Next, let’s consider the Simple Storage Service, more commonly known as “S3.” S3 is an object-based storage system, meaning you can think of it as a large key-value store. S3 organizes data into “buckets,” with each bucket containing a set of keys. Using S3, you can store any file up to 5 TB as an object. For example, a file “myhdfdata.h5” could be stored in the bucket “mybucket” using the key “myhdfdata.h5”. It could then be referenced as s3://mybucket/myhdfdata.h5 from anywhere in or outside of AWS. The bucket exists in a specific region, and data access will be fastest from that region, but in a pinch you can also read and write to the bucket from a different region or from outside AWS altogether.
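In code, an s3:// reference is just a bucket name plus a key. This small helper (hypothetical, not part of any library) splits one apart; the commented boto3 upload_file call shows how the example file could be stored under that key, assuming boto3 and credentials are available.

```python
def split_s3_uri(uri: str) -> tuple:
    """Split 's3://bucket/key' into (bucket, key); the key itself may contain '/'."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"URI needs both a bucket and a key: {uri!r}")
    return bucket, key


# import boto3
# s3 = boto3.client("s3")
# bucket, key = split_s3_uri("s3://mybucket/myhdfdata.h5")
# s3.upload_file("myhdfdata.h5", bucket, key)   # arguments: Filename, Bucket, Key
```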

S3 doesn’t really have the concept of directories, but you can use the slash (“/”) character to create a simulacrum, e.g., s3://mybucket/myfolder/myhdfdata.h5.
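S3’s list API is what makes the folder illusion work: given a prefix and a delimiter, it returns the keys directly “inside” the prefix plus “common prefixes” that look like subfolders. The pure-Python function below mimics that grouping for a list of keys so the behavior can be seen without a bucket; with boto3, the real call is list_objects_v2 with Prefix and Delimiter, whose response has Contents and CommonPrefixes entries.

```python
def list_with_delimiter(keys, prefix: str = "", delimiter: str = "/"):
    """Group keys the way an S3 listing with Prefix/Delimiter does.

    Returns (objects, common_prefixes): keys with no delimiter after the prefix,
    and the distinct "subfolder" prefixes for everything else.
    """
    objects = []
    common_prefixes = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(key)
        else:
            common_prefixes.add(prefix + rest[:cut + len(delimiter)])
    return objects, sorted(common_prefixes)


keys = ["myhdfdata.h5", "myfolder/myhdfdata.h5", "myfolder/sub/a.h5", "other/b.h5"]
print(list_with_delimiter(keys))               # top level of the bucket
print(list_with_delimiter(keys, "myfolder/"))  # "inside" myfolder/
```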

In contrast to EBS, S3 has many advantages:

data is globally accessible
you pay for only the data you store
lower cost than even HDD EBS volumes
data is replicated (so no danger of data being lost to a disk crash)
no limit to the number of objects or total size of the bucket

Despite these features, using S3 for HDF5 data can be a challenge because S3 is not a POSIX-compliant filesystem. To use S3 with HDF5 files, you’ll need to either copy the file from S3 to an EBS volume (a hassle) or use a bit of software magic to make S3 look (more or less) like a POSIX filesystem.
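The copy-to-EBS route is at least easy to script. This sketch assumes boto3’s S3 client (its head_object and download_file calls) and skips the download when a local copy of the same size already exists, a simple heuristic rather than a true consistency check. The client is passed in; the bucket, key, and mount path in the comment are hypothetical.

```python
import os


def fetch_to_local(s3_client, bucket: str, key: str, local_dir: str) -> str:
    """Copy s3://bucket/key into local_dir (e.g. an EBS mount) and return the local path."""
    local_path = os.path.join(local_dir, os.path.basename(key))
    remote_size = s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    # A size match is only a heuristic that the local copy is current.
    if os.path.exists(local_path) and os.path.getsize(local_path) == remote_size:
        return local_path
    s3_client.download_file(bucket, key, local_path)  # arguments: Bucket, Key, Filename
    return local_path


# import boto3, h5py
# path = fetch_to_local(boto3.client("s3"), "mybucket", "myfolder/myhdfdata.h5", "/mnt/ebs")
# with h5py.File(path, "r") as f:
#     print(list(f))
```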

To summarize, EBS will likely offer the easiest migration path for moving HDF5-based applications to the cloud. S3 offers a number of attractive features, not least cost savings, but will require some changes to the application to use. Next, let’s look at some of the different ways we can use S3 with HDF5.

We’ll consider:

s3backer
ros3 VFD
s3fs
HSDS

s3backer

s3backer is a FUSE (Filesystem in Userspace) filesystem that emulates a filesystem at the block level. To set up s3backer, you create an S3 bucket and then “format” it using your desired block size and filesystem type (e.g., “ext4”). The filesystem is then mounted, and from that point on it works just the same (as far as HDF5 is concerned) as any POSIX filesystem. s3backer uses caching and lazy-write techniques to minimize the latency of reading from and writing to S3, and as a result can achieve fairly good performance.

If you peek at the contents of the S3 bucket after you’ve set up s3backer, you won’t see your files per se, but a set of keys corresponding to the block numbers in the filesystem (similar to what you’d see if you examined a physical drive with a low-level disk utility). S3 doesn’t support partial writes to an object, but by keeping the block size reasonably small, s3backer can perform well with HDF5 applications that both read and write data. It’s worthwhile to experiment with different block sizes to see what works best for your particular application.
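Because an object can’t be partially rewritten, a small write that lands inside a block forces the whole block to be uploaded again. The arithmetic below is a simplified model (it ignores s3backer’s caching and does not reproduce its key naming) showing how block size drives that write amplification. Larger blocks mean fewer objects and requests for big sequential reads, so the best setting is a tradeoff.

```python
def blocks_touched(offset: int, length: int, block_size: int) -> range:
    """Block numbers covered by the byte range [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


def bytes_uploaded(offset: int, length: int, block_size: int) -> int:
    """Bytes sent to S3 if every touched block must be rewritten whole."""
    return len(blocks_touched(offset, length, block_size)) * block_size


# A 4 KiB write at byte offset 10,000 with two block sizes:
for bs in (4 << 10, 1 << 20):
    print(bs, list(blocks_touched(10_000, 4096, bs)), bytes_uploaded(10_000, 4096, bs))
# 4096 [2, 3] 8192            -> straddles two small blocks
# 1048576 [0] 1048576         -> a whole 1 MiB block is re-uploaded
```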

You can use s3backer from multiple machines, but it’s not advisable to have more than one writer. There’s no coordination between writers, and the likely result is that your file will be corrupted.

ros3 VFD

ros3 is a VFD (“virtual file driver”) extension to the HDF5 library that enables HDF5 files stored in S3 to be opened in read-only mode. Unlike with s3backer, no setup is needed: you just copy your HDF5 file to S3 and then open it with the HDF5 library by giving an HTTP(S) URL for the S3 object. Again, it’s read-only, so you’ll need to create the file on a regular filesystem and then copy it to S3 to use the ros3 VFD.
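With h5py, opening a file through ros3 could look like the sketch below. It assumes h5py was built against an HDF5 library that includes the ros3 driver (not all builds do) and uses a virtual-hosted-style S3 URL; the bucket, key, and region are placeholders. The ros3 keyword arguments (aws_region, secret_id, secret_key) take bytes, and leaving the credentials empty should request anonymous access, which suits publicly readable objects.

```python
from urllib.parse import quote


def s3_https_url(bucket: str, key: str, region: str) -> str:
    """Virtual-hosted-style HTTPS URL for an S3 object ('/' in the key is kept)."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def open_ros3(bucket: str, key: str, region: str,
              secret_id: bytes = b"", secret_key: bytes = b""):
    """Open an S3-hosted HDF5 file read-only through the ros3 driver."""
    import h5py  # imported here so the URL helper works without h5py installed

    url = s3_https_url(bucket, key, region)
    return h5py.File(url, "r", driver="ros3", aws_region=region.encode(),
                     secret_id=secret_id, secret_key=secret_key)


# with open_ros3("mybucket", "myhdfdata.h5", "us-west-2") as f:
#     print(list(f))
```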

At a low level, the ros3 VFD works by replacing each POSIX read operation with an S3 range GET request (a range GET reads a specific byte range from the S3 object). Since S3 latency is quite a bit higher than that of EBS volumes, performance is not as good, though it can vary a lot depending on your specific application and how the file is organized.
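The translation from a POSIX-style read to a range GET is mechanical: HTTP byte ranges are inclusive at both ends. The second function is a back-of-the-envelope model of why latency dominates when requests are issued one at a time; the latency and bandwidth figures in the example are assumptions, not measurements.

```python
def range_header(offset: int, size: int) -> str:
    """HTTP Range header value for reading `size` bytes starting at `offset`."""
    if offset < 0 or size <= 0:
        raise ValueError("offset must be >= 0 and size > 0")
    return f"bytes={offset}-{offset + size - 1}"


def serial_read_seconds(n_requests: int, latency_s: float,
                        total_bytes: int, bandwidth_bytes_s: float) -> float:
    """Estimated time when each request waits for the previous one to finish."""
    return n_requests * latency_s + total_bytes / bandwidth_bytes_s


# With boto3, one read becomes one GET (bucket/key are placeholders):
# body = s3.get_object(Bucket="mybucket", Key="myhdfdata.h5",
#                      Range=range_header(0, 1024))["Body"].read()

# Assumed numbers: 5,000 requests at 20 ms each already cost 100 s
# before any data transfer time is added.
print(serial_read_seconds(5000, 0.020, 0, 1.0))
```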

s3fs

s3fs is similar to ros3, but works only with Python applications. s3fs works with h5py by passing a “file-like object” to the h5py.File class, and (as with the ros3 VFD) each read operation is converted to an S3 range GET request. Performance (in my testing, at least) is similar to that of the ros3 VFD. Also like the ros3 VFD, s3fs works only with read-only applications.
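What h5py needs from a “file-like object” is essentially read, seek, and tell. The class below is a minimal stand-in for what s3fs provides, assuming a hypothetical fetch(start, end) callable that performs one range GET (inclusive end) per call; it does no caching, which is where real libraries add value. With s3fs itself, the equivalent is roughly s3fs.S3FileSystem(anon=True).open("mybucket/myhdfdata.h5", "rb") passed to h5py.File.

```python
import io


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object that turns each read into one ranged fetch."""

    def __init__(self, fetch, size: int):
        self._fetch = fetch   # hypothetical: fetch(start, end_inclusive) -> bytes
        self._size = size     # object size, e.g. from an S3 HEAD request
        self._pos = 0
        self.requests = 0     # number of fetches issued, for illustration

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return new_pos

    def readinto(self, buffer) -> int:
        n = min(len(buffer), self._size - self._pos)
        if n <= 0:
            return 0  # end of file
        data = self._fetch(self._pos, self._pos + n - 1)  # one "range GET"
        self.requests += 1
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)


# import h5py
# with h5py.File(RangeReader(my_fetch, object_size), "r") as f:  # my_fetch is hypothetical
#     print(list(f))
```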

HSDS

HSDS (Highly Scalable Data Service) can be used to provide access to HDF5 data in S3. If the data is stored using the HSDS schema, read-write access is supported; if HSDS is used to read HDF5 files stored in S3, only read access is supported. To work around S3 latency limitations, HSDS uses multiple processes (potentially on multiple machines) and asynchronous processing, with the goal of supporting multiple in-flight S3 requests. In contrast, ros3 or s3fs will have at most one S3 request in flight at a time, so the application will often be waiting on S3.
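The difference between one request at a time and many in flight is easy to see with simulated latency. The asyncio sketch below is not HSDS code: asyncio.sleep stands in for an S3 round trip, and a semaphore caps the number of concurrent requests (max_in_flight is an arbitrary illustrative value).

```python
import asyncio
import time


async def fake_range_get(chunk_id: int, latency_s: float) -> int:
    await asyncio.sleep(latency_s)  # stand-in for one S3 request's round trip
    return chunk_id


async def read_serial(chunk_ids, latency_s: float):
    # Like a single reader: each request starts only after the previous one returns.
    return [await fake_range_get(c, latency_s) for c in chunk_ids]


async def read_concurrent(chunk_ids, latency_s: float, max_in_flight: int = 8):
    # Up to max_in_flight requests outstanding at once; results keep input order.
    limit = asyncio.Semaphore(max_in_flight)

    async def one(c):
        async with limit:
            return await fake_range_get(c, latency_s)

    return await asyncio.gather(*(one(c) for c in chunk_ids))


def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


chunks = list(range(32))
_, t_serial = timed(read_serial(chunks, 0.01))
_, t_concurrent = timed(read_concurrent(chunks, 0.01))
print(f"serial: {t_serial:.2f} s, 8 in flight: {t_concurrent:.2f} s")
```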

HSDS supports multiple applications writing to the same file (MWMR, multiple writers/multiple readers). The HSDS service serves as a coordination point so that data is kept consistent. As the number of applications increases, you can scale HSDS to provide more capacity by increasing the number of HSDS “nodes.”

FlexFS

Finally, you may want to take a look at the flexFS filesystem offered by Paradigm4. FlexFS is a service, but you work with it in a way similar to s3backer: by mounting a filesystem on your compute instance. Unlike with s3backer, you can have multiple instances connected to the same flexFS allocation. Just as when accessing HDF5 files over NFS, you’ll want to avoid updating the same file from multiple machines without using advisory file locking (e.g., flock) to coordinate.
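A minimal sketch of that coordination uses Python’s fcntl.flock on a separate lock file (Unix only). The lock is advisory, so every writer must take it, and it only helps if the shared filesystem honors flock across machines, which the advice above implies for flexFS but which you should verify for your setup. The paths in the comment are hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path: str):
    """Hold an exclusive advisory flock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other holder remains
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


# import h5py
# with exclusive_lock("/mnt/flexfs/results.h5.lock"):
#     with h5py.File("/mnt/flexfs/results.h5", "a") as f:
#         f.attrs["last_writer"] = os.uname().nodename
```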

FlexFS offers latency-optimized or throughput-optimized allocations. I tested the latency-optimized version, and performance was quite good: much faster than with EBS HDD volumes (see the benchmark results below). If you are interested in FlexFS, contact Paradigm4 for pricing details.

Here’s a summary of the different options:

Option	Read/Write	External Access	Durable	Fixed Size	Cost	Performance
HDD	Y	N	N	Y	$$	**
SSD	Y	N	N	Y	$$$	****
FlexFS	Y	Y	Y	N	?	***
s3backer	Y	Y	Y	Y	$	**
s3fs	N	Y	Y	Y	$	*
ros3	N	Y	Y	N	$	*
HSDS	Y	Y	Y	N	$	**

Benchmarking

To illustrate the comparative performance of these technologies, let’s look at a simple Python-based benchmark that does a hyperslab selection from a large two-dimensional dataset (17,568 columns by 2,018,392 rows): https://github.com/HDFGroup/hsds/blob/master/tests/perf/nrel/nsrdb/nsrdb_test.py. The test randomly chooses the column (to avoid caching effects) and selects data from the entire row. This selection touches approximately 5,000 chunks, so the effects of increased latency with S3 (or even HDD vs. SSD) are relatively magnified.
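To see why a selection can touch thousands of chunks, it helps to count them. The function below counts the chunks a hyperslab (a start and count per dimension) intersects; the shapes in the example are small hypothetical ones, not the benchmark file’s actual layout (h5py reports a dataset’s chunk shape as Dataset.chunks). If each chunk costs at least one storage request, this count sets a floor on the number of round trips.

```python
def chunks_touched(start, count, chunk_shape) -> int:
    """Number of chunks intersected by the hyperslab start[i] : start[i] + count[i]."""
    total = 1
    for s, c, k in zip(start, count, chunk_shape):
        if c <= 0:
            return 0
        first = s // k
        last = (s + c - 1) // k
        total *= last - first + 1
    return total


# Hypothetical 1000 x 1000 dataset stored in 100 x 100 chunks:
print(chunks_touched((0, 5), (1000, 1), (100, 100)))     # one full column: 10 chunks
print(chunks_touched((5, 0), (1, 1000), (100, 100)))     # one full row: 10 chunks
print(chunks_touched((50, 50), (150, 150), (100, 100)))  # 150 x 150 box: 4 chunks
```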

All the tests were run from an m5.2xlarge instance (4 cores / 8 vCPUs, 32 GiB RAM) accessing data in an S3 bucket (or EBS volume) in the same region.

For the HSDS tests, HSDS was run in Docker on the same machine as the test script, with 1, 2, 4, and 8 nodes. Each test run accessed 11,000 MiB of data, so the run time can be converted to a throughput in MiB/s by dividing 11,000 by the run time in seconds.

Results are:

Type	Time (s)	Throughput (MiB/s)
HDF5 HDD	59	186
HDF5 SSD	4	2750
HDF5 flexFS	17	647
HDF5 s3backer	113	97
HDF5 ros3	400	28
HDF5 s3fs	520	21
HSDS S3 1 node	109	101
HSDS S3 2 nodes	67	164
HSDS S3 4 nodes	42	262
HSDS S3 8 nodes	34	324
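The throughput column can be reproduced from the run times with the conversion described above, and the same numbers show how HSDS scaled: going from 1 to 8 nodes cut the run time by a factor of about 3.2, so in this test the scaling was well short of linear. The script uses only the values in the table.

```python
TOTAL_MIB = 11_000  # data read per test run, as stated above

run_times_s = {
    "HDF5 HDD": 59, "HDF5 SSD": 4, "HDF5 flexFS": 17, "HDF5 s3backer": 113,
    "HDF5 ros3": 400, "HDF5 s3fs": 520,
}
hsds_times_s = {1: 109, 2: 67, 4: 42, 8: 34}  # HSDS node count -> run time (s)


def throughput_mib_s(run_time_s: float, total_mib: float = TOTAL_MIB) -> float:
    return total_mib / run_time_s


for name, t in run_times_s.items():
    print(f"{name:14s} {throughput_mib_s(t):7.0f} MiB/s")

for nodes, t in hsds_times_s.items():
    speedup = hsds_times_s[1] / t
    print(f"HSDS {nodes} node(s): {speedup:.1f}x speedup, "
          f"{speedup / nodes:.0%} scaling efficiency")
```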

Conclusions

EBS volumes with SSD provide the best absolute performance, but as we’ve discussed, many other factors go into choosing a storage technology. The ros3 VFD and s3fs were the slowest options in this test case, but on the positive side, they are cost-effective and require no special setup to use.

With 4 or 8 nodes, HSDS offered performance comparable to (in this test, somewhat better than) an HDD volume, along with all the benefits associated with S3 storage. Here, HSDS’s multiprocessing abilities roughly compensated for S3’s increased latency.

s3backer and FlexFS can also provide performance in the range of an EBS volume and are worth investigating (in this test, FlexFS was faster than the HDD volume, while s3backer took about twice as long).

In the end, given all the variables involved, nothing will replace spending some time trying out different options for your particular application. Time spent doing this upfront (rather than running into problems later on) will be well worthwhile.

Too many options? The HDF Group can help with supporting your application in the cloud. Feel free to contact help@hdfgroup.org to set up a consultation.