John Readey Archives - The HDF Group - ensuring long-term access and usability of HDF data and supporting users of HDF technologies

Aggregation for Cloud Storage

December 12, 2022

If you’ve spent much time working with public repositories of HDF5 data, you’ve probably seen data stored as a large collection of files organized by time, geographic location, or both. If you are using HSDS, the good news is that you can use these collections as is and also get an aggregated view of them through HSDS.

HDF Cloud News – 11-28-22

November 28, 2022

News on the h5pyd v0.12.0 release and an install guide for running HSDS on Tencent Cloud.

Sunset for h5serv

October 31, 2022

John Readey, The HDF Group

Before there was HSDS, there was h5serv. Released in 2015, h5serv was the first implementation of the HDF REST API. Designed mainly as a way to demonstrate the RESTful interface for HDF, h5serv had a fairly simple implementation: a single-threaded application that, on receiving an HTTP request, made the

Improve HDF5 performance using caching

October 17, 2022

When using HDF5 or HSDS, you’ve likely benefited (even if you weren’t aware of it) from caching features built into the software that can drastically improve performance. HSDS and h5pyd also use caching to improve performance for service-based applications. In this post, we’ll do a quick review of how HDF5 library caching works and then dive into HSDS and h5pyd caching (with a brief discussion of web caching).
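As a rough illustration of the general idea behind chunk caching, the sketch below keeps the most recently used chunks in memory so repeated reads skip the slow path. It is not the HDF5 library's, HSDS's, or h5pyd's actual cache; the names `ChunkCache` and `slow_fetch` are hypothetical and exist only for this example.

```python
from collections import OrderedDict


class ChunkCache:
    """Tiny least-recently-used (LRU) cache keyed by chunk id.

    Illustrative only: a real cache would typically be bounded by bytes
    rather than by a count of chunks.
    """

    def __init__(self, fetch, max_chunks=4):
        self._fetch = fetch            # callable: chunk_id -> bytes (slow path)
        self._max_chunks = max_chunks
        self._items = OrderedDict()    # ordered oldest -> most recently used
        self.hits = 0
        self.misses = 0

    def get(self, chunk_id):
        if chunk_id in self._items:
            # Cache hit: mark the chunk as most recently used.
            self._items.move_to_end(chunk_id)
            self.hits += 1
            return self._items[chunk_id]
        # Cache miss: take the slow path, then remember the result.
        self.misses += 1
        data = self._fetch(chunk_id)
        self._items[chunk_id] = data
        if len(self._items) > self._max_chunks:
            # Evict the least recently used chunk.
            self._items.popitem(last=False)
        return data


def slow_fetch(chunk_id):
    """Hypothetical stand-in for reading a chunk from disk or cloud storage."""
    return bytes([chunk_id % 256]) * 8
```

Wrapping a reader as `ChunkCache(slow_fetch, max_chunks=2)` and calling `get` with the same chunk id twice serves the second call from memory; the `hits` and `misses` counters show how often that happens.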

HSDS Streaming

August 22, 2022

John Readey, principal architect of the Highly Scalable Data Service (HSDS), covers an update to the service. With the latest HSDS update, the maximum size limit per HTTP request no longer applies. In the new version, large requests are streamed back to the client as the bytes are fetched from storage. Regardless of the size of the read request, the amount of memory used by the service is limited, and clients will start to see bytes coming back while the server is still processing the tail chunks in the selection. The same applies to write operations: the service will fetch some bytes from the connection, update the storage, and fetch more bytes until the entire request is complete. Learn more about this update, plus check out John’s benchmark results using a couple of different MacBook Pros and his new DevOne laptop.
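The streaming behavior described above can be pictured with a Python generator that hands back one chunk at a time instead of assembling the whole response in memory. This is only a sketch of the pattern, not HSDS code; `stream_chunks`, `fake_storage_read`, and `receive` are hypothetical names, and the 1 KiB chunk size is an arbitrary assumption.

```python
from typing import Callable, Iterable, Iterator, Tuple


def stream_chunks(chunk_ids: Iterable[int],
                  read_chunk: Callable[[int], bytes]) -> Iterator[bytes]:
    """Yield each chunk as soon as it is read.

    Only one chunk is held by the producer at a time, so its memory use
    does not grow with the size of the overall request.
    """
    for chunk_id in chunk_ids:
        yield read_chunk(chunk_id)


def fake_storage_read(chunk_id: int) -> bytes:
    """Hypothetical stand-in for a storage read; returns 1 KiB per chunk."""
    return chunk_id.to_bytes(4, "big") * 256


def receive(stream: Iterable[bytes]) -> Tuple[int, int]:
    """Consume a stream; return (total bytes seen, largest single piece)."""
    total = 0
    largest = 0
    for piece in stream:
        total += len(piece)
        largest = max(largest, len(piece))
    return total, largest
```

Calling `receive(stream_chunks(range(100), fake_storage_read))` processes 100 KiB in total while never holding more than one 1 KiB piece at a time, and the first piece is available to the consumer before the later chunks have been read.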

Cloud Storage Options for HDF5

August 8, 2022

If you are looking to store HDF5 data in the cloud, there are several different technologies that can be used, and choosing between them can be somewhat confusing. In this post, I thought it would be helpful to cover some of the options with the hope of helping HDF users make the best decision for their deployment. Each project will have its own requirements and special considerations, so please take this as just a starting point.

HSDS Docker Images

July 25, 2022

The Highly Scalable Data Service (HSDS) runs as a set of containers in Docker (or pods in Kubernetes), and, like all things Docker, each container instance is created from a container image file. Unlike, say, a library binary, the container image includes all the dependent libraries needed for the container to run. In this blog post, HSDS senior architect John Readey explains how to get HSDS running in a Docker container or Kubernetes pod, and gives some tips and tricks to ensure everything runs smoothly for you.

Deep Dive: HSDS Container Types

July 11, 2022

HSDS (Highly Scalable Data Service) is described as a “containerized” service, but how are these containers organized to create the service?

Speed up cloud access using multiprocessing!

June 27, 2022

Accessing large data stores over the internet can be rather slow, but you can often speed things up using multiprocessing, i.e., running multiple processes that divvy up the work. Because each process spends much of its time waiting on data, you’ll find in many cases that things speed up nicely even if you run more processes than your computer has cores.
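A minimal sketch of that idea, using only the Python standard library, might look like the following. The remote read is simulated with `time.sleep`, so no real data store is involved; `split_range`, `fetch_block`, and `fetch_all` are hypothetical helper names, and the 50 ms latency is an assumption chosen for illustration.

```python
import multiprocessing as mp
import time


def split_range(n_items, n_parts):
    """Divide range(n_items) into n_parts contiguous (start, stop) blocks."""
    base, extra = divmod(n_items, n_parts)
    blocks = []
    start = 0
    for i in range(n_parts):
        # The first `extra` blocks get one additional item each.
        stop = start + base + (1 if i < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks


def fetch_block(block):
    """Simulated remote read: wait as if on the network, then reduce the block."""
    start, stop = block
    time.sleep(0.05)  # assumed latency; the process is idle while it waits
    return sum(range(start, stop))


def fetch_all(n_items, n_procs, n_blocks):
    """Fetch every block using a pool of worker processes and combine results."""
    blocks = split_range(n_items, n_blocks)
    with mp.Pool(processes=n_procs) as pool:
        return sum(pool.map(fetch_block, blocks))
```

When run from a script, `fetch_all` should be called under an `if __name__ == "__main__":` guard, for example `fetch_all(1_000_000, n_procs=8, n_blocks=16)`. Because the workers in this sketch spend almost all of their time sleeping rather than computing, asking for more processes than there are cores can still shorten the total wall-clock time.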