HSDS

If you’ve spent much time working with public repositories of HDF5 data, you’ve often seen data organized as a large collection of files arranged by time, geographic location, or both. If you are using HSDS, the good news is that you can use these collections as is and also have an aggregated view with HSDS....

When using HDF5 or HSDS, you’ve likely benefited (even if you weren’t aware of it) from caching features built into the software that can drastically improve performance. HSDS and h5pyd use caching to improve performance for service-based applications. In this post, we’ll do a quick review of how HDF5 library caching works and then dive into HSDS and h5pyd caching (with a brief discussion of web caching)....

Highly Scalable Data Service (HSDS) principal architect John Readey covers an update to the service. With the latest HSDS update, the maximum request size limit per HTTP request no longer applies. In the new version, large requests are streamed back to the client as the bytes are fetched from storage. Regardless of the size of the read request, the amount of memory used by the service is bounded, and clients start to see bytes coming back while the server is still processing the tail chunks in the selection. The same applies to write operations: the service fetches some bytes from the connection, updates the storage, and fetches more bytes until the entire request is complete. Learn more about...
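The streaming idea described above can be illustrated with a minimal Python sketch. This is not the HSDS implementation: `stream_selection` and `copy_streamed` are hypothetical names, and an in-memory `io.BytesIO` stands in for the storage backend. It only shows how handing data on in fixed-size pieces keeps memory use bounded no matter how large the request is.

```python
import io


def stream_selection(source, chunk_size):
    """Yield the contents of `source` in pieces of at most `chunk_size` bytes.

    Only one piece is held in memory at a time, so the memory needed does
    not grow with the total size of the read.
    """
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


def copy_streamed(source, sink, chunk_size=4):
    """Forward each piece to `sink` as soon as it is read; return total bytes."""
    total = 0
    for chunk in stream_selection(source, chunk_size):
        sink.write(chunk)  # the consumer sees early bytes before the read ends
        total += len(chunk)
    return total


if __name__ == "__main__":
    storage = io.BytesIO(b"0123456789")  # stand-in for data held in storage
    client = io.BytesIO()                # stand-in for the client connection
    print(copy_streamed(storage, client), client.getvalue())
```

The same loop shape works in the write direction: read a piece from the connection, update storage, and repeat until the input is exhausted.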

The Highly Scalable Data Service (HSDS) runs as a set of containers in Docker (or pods in Kubernetes) and, like all things Docker, each container instance is created from a container image. Unlike, say, a library binary, the container image includes all the dependent libraries needed for the container to run. In this blog post, HSDS senior architect John Readey explains how to get HSDS running in a Docker container or Kubernetes pod, and gives some tips and tricks to ensure everything runs smoothly for you. ...

Accessing large data stores over the internet can be rather slow, but often you can speed things up using multiprocessing, i.e., running multiple processes that divvy up the work. Because each process spends much of its time waiting on data, you'll often find things speed up nicely even if you run more processes than you have cores on your computer....
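As a rough sketch of that pattern, the example below splits a row range into contiguous slices and hands each slice to a worker process from Python's standard `multiprocessing.Pool`. The names `split_range`, `fetch_rows`, and `read_in_parallel` are hypothetical, and the remote read is simulated with a short sleep; a real worker would open the remote dataset and read its slice instead.

```python
import time
from multiprocessing import Pool


def split_range(total_rows, num_parts):
    """Split [0, total_rows) into num_parts contiguous (start, stop) slices."""
    base, extra = divmod(total_rows, num_parts)
    bounds = []
    start = 0
    for i in range(num_parts):
        # The first `extra` slices take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def fetch_rows(bounds):
    """Stand-in for a remote read: wait briefly, then report rows 'read'."""
    start, stop = bounds
    time.sleep(0.05)  # simulated network latency (assumption, not a real read)
    return stop - start


def read_in_parallel(total_rows, num_procs):
    """Fan the slices out to num_procs worker processes and sum the results."""
    with Pool(processes=num_procs) as pool:
        return sum(pool.map(fetch_rows, split_range(total_rows, num_procs)))


if __name__ == "__main__":
    # More processes than cores can still help when workers mostly wait on I/O.
    print(read_in_parallel(total_rows=1000, num_procs=8))
```

Because the workers here spend nearly all their time waiting, the wall-clock time stays close to that of a single simulated read rather than growing with the number of slices.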