HDF5® in the Cloud

The Kita platform addresses the challenges of adapting large-scale, array-based computing to the cloud and object storage while intelligently handling the full data management life cycle. Organizations with existing HDF5-based applications running on POSIX file storage can migrate their HDF5 data to a cloud provider's object storage without rewriting their applications or sacrificing performance or scalability. In addition, organizations can publish HDF5-based data to a cloud-based system, enabling broad access by users within their organization as well as external users, including the general public.

Kita Lab

JupyterLab-enabled data exploration, fully hosted by the HDF Group. Learn more with a 30-day free trial.

Kita Server for AWS Marketplace

Access from anywhere using an Amazon Machine Image (AMI) on AWS.

Kita Server for On-Premises Deployment

Install and access Kita on your existing infrastructure.

Interested in learning more about Kita? Contact the HDF Group today to see which solution is right for you.

Advantages

  • Leverage your existing investments in HDF5-based applications
  • Retain the key features and benefits of HDF5, optimized for a cloud environment
  • Enjoy the additional processing and storage benefits of Amazon Web Services, including built-in redundancy and cloud scalability
  • Take advantage of the cost-effective, pay-as-you-go pricing structure
  • Elastically scale compute with usage by running the service as a cluster of Docker containers
  • Leverage Python with asyncio to implement task-oriented parallelism (see the sketch after this list)
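As an illustration of the task-oriented parallelism mentioned above, the sketch below uses Python's asyncio to issue several I/O-bound requests concurrently. The fetch_chunk coroutine and its timing are hypothetical placeholders, not Kita's actual implementation.

```python
import asyncio

async def fetch_chunk(chunk_id: int) -> bytes:
    # Hypothetical stand-in for an I/O-bound request (e.g. fetching one
    # chunk of a dataset from object storage); not Kita's actual code.
    await asyncio.sleep(0.1)
    return b"chunk-%d" % chunk_id

async def main() -> None:
    # Issue the eight requests concurrently instead of one after another;
    # asyncio.gather resumes each task as its awaited I/O completes.
    chunks = await asyncio.gather(*(fetch_chunk(i) for i in range(8)))
    print(f"fetched {len(chunks)} chunks")

asyncio.run(main())
```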

Technical Details

  • Clients can interact with the service through a REST API
  • SDKs provide a language-specific interface (e.g., h5pyd for Python; see the example after this list)
  • Ability to read/write only the data that is needed (as opposed to transferring entire files)
  • No limit to the amount of data that can be stored by the service
  • Multiple clients can read from and write to the same data source
  • Scalable performance: recently accessed data is cached in RAM, requests are parallelized across multiple nodes, and adding nodes improves performance
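To make the SDK and partial-read points above concrete, here is a minimal h5pyd sketch. The domain path, endpoint URL, and dataset name are placeholders to replace with your own values, and it assumes credentials for the service are already configured (for example with h5pyd's hsconfigure tool).

```python
import h5pyd  # h5py-style client that talks to the service's REST API

# Placeholder domain path and endpoint; substitute your own values.
with h5pyd.File("/shared/example/mydata.h5", "r",
                endpoint="https://hsds.example.org") as f:
    dset = f["temperature"]          # placeholder dataset name
    print(dset.shape, dset.dtype)    # metadata returned by the service
    block = dset[0:100, 0:100]       # only this slice is transferred,
                                     # not the entire file
```

Because h5pyd mirrors the h5py API, existing scripts typically need only the import and the file path/endpoint changed to read from the service instead of a local file.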