HDF® Cloud

HDF5® is now ready for the cloud

The HDF Cloud platform addresses the challenges of adapting large-scale, array-based computing to the cloud and object storage while intelligently handling the full data management life cycle. Organizations with existing HDF5-based applications running on POSIX file storage can migrate their HDF5 data to a cloud provider's object storage without rewriting their applications or sacrificing performance or scalability. In addition, organizations can publish HDF5-based data to a cloud-based system, enabling broad access for users both inside and outside the organization, including the general public.

Advantages

  • Leverage your existing investments in HDF5-based applications
  • Retain the key features and benefits of HDF5 in a cloud environment, with HDF5 optimized for the cloud
  • Gain the additional processing and storage benefits of Amazon Web Services, including built-in redundancy and the scalability of the cloud
  • Take advantage of the cost-effective, pay-as-you-go pricing structure
  • Elastically scale compute with usage by running as a cluster of Docker containers
  • Leverage Python with asyncio to implement task-oriented parallelism (see the sketch after this list)
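
As a rough illustration of the asyncio point above, the sketch below issues several storage requests concurrently rather than one at a time. The fetch_chunk coroutine and the chunk names are hypothetical stand-ins for real object-storage I/O:

    # Illustrative sketch of task-oriented parallelism with asyncio.
    # fetch_chunk is a hypothetical stand-in for a real S3/HTTP request.
    import asyncio

    async def fetch_chunk(chunk_id: str) -> bytes:
        await asyncio.sleep(0.1)          # stand-in for network latency
        return f"data for {chunk_id}".encode()

    async def main():
        # Launch four fetches as tasks; they run concurrently, not sequentially
        tasks = [fetch_chunk(f"chunk_{i}") for i in range(4)]
        results = await asyncio.gather(*tasks)
        print([len(r) for r in results])

    asyncio.run(main())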

Key Technical Features

  • Clients can interact with the service through a REST API
  • SDKs provide language-specific interfaces (e.g. h5pyd for Python; see the sketch after this list)
  • Ability to read/write only the data that is needed (as opposed to transferring entire files)
  • No limit to the amount of data that can be stored by the service
  • Multiple clients can read/write to the same data source
  • Scalable performance: recently accessed data is cached in RAM, requests are parallelized across multiple nodes, and adding nodes improves performance
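
To make the SDK point concrete, here is a minimal sketch using h5pyd, whose API mirrors h5py. The domain path, endpoint, and dataset name are hypothetical placeholders; because the slice touches only a few chunks, only those objects are fetched rather than the entire file:

    # Minimal h5pyd sketch; domain, endpoint, and dataset name are hypothetical.
    import h5pyd

    f = h5pyd.File("/home/myuser/mydata.h5", "r",
                   endpoint="http://hsds.example.org:5101")
    dset = f["temperature"]         # behaves like an h5py dataset
    subset = dset[100:200, 0:50]    # only the chunks covering this slice are fetched
    print(subset.shape)
    f.close()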

HDF Cloud S3 Schema

To store HDF5 content in S3, individual HDF5 objects (datasets, groups, chunks) are mapped to objects in object storage, as sketched after the list below:

  • Limit maximum storage object size
  • Support parallelism for read/write
  • Only data that is modified needs to be updated
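
A rough sketch of this mapping, assuming boto3 and a hypothetical bucket and key layout (the actual HDF Cloud schema defines its own key scheme): dataset metadata goes into one small JSON object, and each chunk goes into its own object, so an update touches only the affected keys:

    # Illustrative only: bucket name and key layout are hypothetical.
    import json
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-hdf-bucket"

    # Dataset metadata (type, shape, chunk layout) stored as a small JSON object
    meta = {"type": "H5T_IEEE_F32LE", "shape": [8, 8], "chunks": [4, 4]}
    s3.put_object(Bucket=BUCKET, Key="mydata/dset/meta.json", Body=json.dumps(meta))

    # Each chunk persisted as its own object; writing one chunk leaves the rest untouched
    chunk = b"\x00" * (4 * 4 * 4)  # one 4x4 float32 chunk
    s3.put_object(Bucket=BUCKET, Key="mydata/dset/chunk_0_0", Body=chunk)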

Each chunk is persisted as a separate object:

  • Dataset is partitioned into chunks
  • Each chunk stored as an S3 object
  • Dataset metadata (type, shape, attributes, etc.) stored in a separate object (as JSON text)
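
Given that partitioning, locating the object that holds a given element is simple arithmetic, as in this sketch (the chunk_key helper and key naming are hypothetical):

    # Hypothetical helper: map an element coordinate to its chunk object key,
    # assuming a simple "chunk_<i>_<j>" naming scheme.
    def chunk_key(coord, chunk_shape, prefix="mydata/dset"):
        indices = [c // s for c, s in zip(coord, chunk_shape)]
        return prefix + "/chunk_" + "_".join(str(i) for i in indices)

    print(chunk_key((5, 2), (4, 4)))  # -> mydata/dset/chunk_1_0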

Architecture

The HDF Group is excited to launch HDF Cloud in 2018. We are currently beta testing HDF Cloud and are looking for additional beta testers. For further information, senior developer John Readey presented this recorded webinar on HDF Cloud. If you have any questions, or just want to stay updated on this new product, let us know.

This material is based upon work supported by NASA under Award Number NNX16AL91A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.