John Readey Tag

If you’ve spent much time working with public repositories of HDF5 data, you’ll often see data organized as a large collection of files, arranged by time, geographic location, or both. If you are using HSDS, there’s good news: you can use these collections as is and also get an aggregated view with HSDS....

John Readey, The HDF Group

Before there was HSDS, there was h5serv. Released in 2015, h5serv was the first implementation of the HDF REST API. Designed mainly as a way to demonstrate the RESTful interface for HDF, h5serv had a fairly simple implementation: a single-threaded application that, on receiving an HTTP request, made the equivalent HDF5 library call, converted the result to a JSON response, and returned it to the client. Though useful for some applications, this approach had limitations when it came to building a scalable web service. Since h5serv ran as a single process, each HTTP request had to be completely processed before the next one could be handled. This made it quite easy to...

When using HDF5 or HSDS, you’ve likely benefited (even if you weren’t aware of it) from caching features built into the software that can drastically improve performance. HSDS and h5pyd also use caching to improve performance for service-based applications. In this post, we’ll do a quick review of how HDF5 library caching works and then dive into HSDS and h5pyd caching (with a brief discussion of web caching)....
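
To make the idea of chunk caching concrete, here is a minimal sketch of a generic least-recently-used (LRU) chunk cache. It is a simplification for illustration only, not the actual cache implementation in HDF5, HSDS, or h5pyd; the `ChunkCache` class and its `fetch` callback are hypothetical names. Repeated reads of the same chunk are served from memory, and the least recently used chunk is evicted once the cache is full.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ChunkCache:
    """Hypothetical LRU cache: keep up to max_items chunks, evict the least recently used."""

    def __init__(self, fetch: Callable[[Hashable], bytes], max_items: int):
        self._fetch = fetch          # slow path, e.g. a read from disk or object storage
        self._max = max_items
        self._items = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> bytes:
        if key in self._items:
            # Cache hit: mark this chunk as most recently used and skip the fetch
            self._items.move_to_end(key)
            self.hits += 1
            return self._items[key]
        # Cache miss: fetch the chunk, store it, and evict the oldest entry if over capacity
        self.misses += 1
        value = self._fetch(key)
        self._items[key] = value
        if len(self._items) > self._max:
            self._items.popitem(last=False)
        return value
```

With `max_items=2`, the access sequence `0, 1, 0, 2, 1` produces one hit (the second read of chunk 0) and four misses, because reading chunk 2 evicts chunk 1.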

Highly Scalable Data Service (HSDS) principal architect John Readey covers an update to HSDS. With the latest update, the maximum size limit per HTTP request no longer applies: large requests are streamed back to the client as the bytes are fetched from storage. Regardless of the size of the read request, the amount of memory used by the service is bounded, and clients start to receive bytes while the server is still processing the tail chunks of the selection. The same applies to write operations: the service reads some bytes from the connection, updates storage, and then reads more bytes until the entire request is complete. Learn more about...
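
The bounded-memory behavior described above can be illustrated with Python generators. This is a minimal sketch of the general streaming pattern, not HSDS code; `stream_selection`, `write_stream`, `read_chunk`, and `write_chunk` are hypothetical names, and an in-memory byte string stands in for object storage. Reads yield one piece at a time so the caller can start consuming data early, and writes apply each incoming piece before accepting the next.

```python
from typing import Callable, Iterable, Iterator


def stream_selection(read_chunk: Callable[[int, int], bytes],
                     total_bytes: int, piece_size: int) -> Iterator[bytes]:
    """Yield a read selection piece by piece instead of building it all in memory."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    offset = 0
    while offset < total_bytes:
        n = min(piece_size, total_bytes - offset)
        # Only this piece is held at once, so memory stays bounded by piece_size
        yield read_chunk(offset, n)
        offset += n


def write_stream(pieces: Iterable[bytes],
                 write_chunk: Callable[[int, bytes], None]) -> int:
    """Apply incoming pieces to storage as they arrive; return total bytes written."""
    offset = 0
    for piece in pieces:
        write_chunk(offset, piece)
        offset += len(piece)
    return offset


if __name__ == "__main__":
    storage = bytes(range(10))  # stand-in for chunked data in object storage
    for piece in stream_selection(lambda o, n: storage[o:o + n], len(storage), 4):
        print(list(piece))  # the client can use early pieces before the tail arrives
```

Because `stream_selection` is a generator, the caller (for example, code writing to an HTTP response) pulls pieces on demand, which is what keeps memory use independent of the total request size.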

If you are looking to store HDF5 data in the cloud, there are several different technologies you can use, and choosing between them can be somewhat confusing. In this post, I’ll cover some of the options in the hope of helping HDF users make the best decision for their deployment. Each project will have its own requirements and special considerations, so please take this as just a starting point....

The Highly Scalable Data Service (HSDS) runs as a set of containers in Docker (or pods in Kubernetes) and, like all things Docker, each container instance is created from a container image file. Unlike, say, a library binary, the container image includes all the dependent libraries needed for the container to run. In this blog post, HSDS senior architect John Readey explains how to get HSDS running in a Docker container or Kubernetes pod, and gives some tips and tricks to ensure everything runs smoothly for you. ...

Accessing large data stores over the internet can be rather slow, but you can often speed things up using multiprocessing, i.e., running multiple processes that divide up the work. Even if you run more processes than your computer has cores, each process spends much of its time waiting on data, so in many cases you’ll find things speed up nicely....
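
The reason extra processes help is that I/O-bound work spends most of its time waiting, not computing. The sketch below uses Python's standard `multiprocessing.Pool` and simulates a remote read with `time.sleep`; the `fetch` and `run` functions and the 0.1 second latency are assumptions for illustration, not code from the post. A real task would read a slice of a remote dataset instead of sleeping.

```python
import time
from multiprocessing import Pool


def fetch(item: int) -> int:
    """Stand-in for a remote read: the process mostly waits rather than computing."""
    time.sleep(0.1)  # assumed network latency; a real task would read from a data store
    return item * item


def run(items, processes):
    """Process all items with a pool of worker processes; return (results, seconds)."""
    start = time.perf_counter()
    with Pool(processes=processes) as pool:
        results = pool.map(fetch, items)  # items are divided among the workers
    return results, time.perf_counter() - start


if __name__ == "__main__":
    items = list(range(16))
    for n in (1, 4, 16):  # 16 may exceed the number of cores; waiting tasks still overlap
        _, elapsed = run(items, n)
        print(f"{n:2d} processes: {elapsed:.2f} s")
```

Since each worker is idle while it waits, wall-clock time drops roughly in proportion to the number of processes until other limits (bandwidth, server load, process start-up cost) take over.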

HSDS (Highly Scalable Data Service) is a REST-based service for HDF data and part of HDF Cloud, our set of solutions for cloud deployments. In the recent blog post about the latest release of HSDS, we discussed many of the new features in the 0.6 release, including support for Azure....
