News

The HDF5 Subfiling Virtual File Driver (VFD) is an MPI-based file driver that was introduced in the HDF5 1.14.0 release. This file driver allows an HDF5 application to create HDF5 files which are distributed across a collection of "subfiles" in equal-sized data segment "stripes." I/O to the logical HDF5 file is directed to the appropriate "subfile" according to the Subfiling VFD configuration and a system of "I/O Concentrators." For more information on the general design of this VFD, refer to the Subfiling RFC. A user guide for the Subfiling Virtual File Driver was created in conjunction with the ExaIO team. The document gives a brief overview of the Subfiling VFD and details how to build and use the VFD, how to get the...
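As a rough illustration of how equal-sized stripes can map a logical file offset to a subfile, here is a minimal sketch. It assumes a simple round-robin placement of stripes across subfiles; the function name `locate` and its parameters are hypothetical and are not part of the HDF5 API, and the real driver's mapping is governed by its own configuration.

```python
def locate(offset, stripe_size, subfile_count):
    """Map a logical byte offset to (subfile index, offset inside that subfile).

    Assumption for illustration only: stripes are assigned to subfiles
    round-robin (stripe 0 -> subfile 0, stripe 1 -> subfile 1, ...).
    """
    stripe_index = offset // stripe_size          # which stripe holds the byte
    subfile = stripe_index % subfile_count        # round-robin placement
    # Number of earlier stripes already stored in the same subfile.
    stripes_before = stripe_index // subfile_count
    offset_in_subfile = stripes_before * stripe_size + offset % stripe_size
    return subfile, offset_in_subfile


if __name__ == "__main__":
    # With 4-byte stripes over 3 subfiles, byte 13 lies in stripe 3,
    # which wraps around to subfile 0.
    print(locate(13, stripe_size=4, subfile_count=3))
```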

We are happy to announce the release of HDF5 1.14.0, which can now be obtained from the HDF5 Download page. More information about this release can be found on the HDF5 1.14.0 release page. For scheduled future releases, please refer to the release schedule....

Dana Robinson has been appointed as the new Director of Engineering at The HDF Group. Dana joined The HDF Group in 2009 as a software engineer and stepped into the role of interim Director of Engineering in April 2022. As the Director of Engineering, Dana will lead the team of software engineers and shape the engineering culture, including developing the roadmap for HDF5 releases. Externally, he will engage at all levels in research, development, consulting, opportunity analysis, and community outreach. A skilled software engineer and scientific data architect with over twenty-five years of experience across a wide variety of disciplines, Dana started his career at The HDF Group working on a bioinformatics project, then moved to library development. He’s worked on projects including...

We are excited to announce the appointment of Neil Fortner as the new Chief HDF5 Software Architect. Neil has worked for The HDF Group as a software engineer since 2008. While at The HDF Group, he focused his talents in storage and HPC on improving performance, expanding the features, and improving the maintainability of the HDF5 library. Neil has a proven history of quickly diving into and improving large, complex code bases. As the Chief HDF5 Software Architect, Neil will develop detailed architectural designs for software solutions internally and for our consulting clients. He will be a mentor within the engineering team as we continue to grow. Neil received his bachelor's in Aerospace Engineering from the University of Maryland. In his spare...

Highly Scalable Data Service principal architect John Readey covers an update to the Highly Scalable Data Service. With the latest HSDS update, the maximum size limit per HTTP request no longer applies. In the new version, large requests are streamed back to the client as the bytes are fetched from storage. Regardless of the size of the read request, the amount of memory used by the service is limited, and clients will start to see bytes coming back while the server is still processing the tail chunks in the selection. The same applies to write operations: the service will fetch some bytes from the connection, update the storage, and fetch more bytes until the entire request is complete. Learn more about...
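The bounded-memory streaming idea described above can be sketched generically with a Python generator. This is not HSDS code: `stream_read` is a hypothetical helper, and an in-memory `io.BytesIO` object stands in for the storage backend. It only shows how a large read can be handed back piece by piece so that no more than one chunk is held at a time.

```python
import io


def stream_read(storage, total_bytes, chunk_size):
    """Yield up to total_bytes from storage in pieces of at most chunk_size.

    Only one chunk is held in memory at a time, so memory use does not
    grow with the size of the request.
    """
    remaining = total_bytes
    while remaining > 0:
        chunk = storage.read(min(chunk_size, remaining))
        if not chunk:          # storage ran out of data early
            break
        remaining -= len(chunk)
        yield chunk            # the caller can forward this immediately


if __name__ == "__main__":
    storage = io.BytesIO(b"abcdefghij")   # stand-in for a storage backend
    for piece in stream_read(storage, total_bytes=10, chunk_size=4):
        print(piece)
```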

The Highly Scalable Data Service (HSDS) runs as a set of containers in Docker (or pods in Kubernetes) and, like all things Docker, each container instance is created based on a container image file. Unlike, say, a library binary, the container image includes all the dependent libraries needed for the container to run. In this blog post, HSDS senior architect John Readey explains how to get HSDS running in a Docker container or Kubernetes pod, and gives some tips and tricks to ensure everything runs smoothly for you. ...

M. Scot Breitenfeld, HDF application support specialist and software engineer at The HDF Group, will present a session, "Introduction to HDF5 for HPC Data Models, Analysis, and Performance," on July 27, 2022. Scot's talk offers a comprehensive overview of HDF5 for anyone who works with big data in an HPC environment. The talk consists of two parts. Part I introduces the HDF5 data model and APIs for organizing data and performing I/O. Part II focuses on HDF5 advanced features such as parallel I/O and will give an overview of various parallel HDF5 tuning techniques such as collective metadata I/O, data aggregation, async, parallel compression, and other new HDF5 features that help to utilize HPC storage to its fullest potential. Please register to attend Scot's...

Accessing large data stores over the internet can be rather slow, but often you can speed things up using multiprocessing—i.e. running multiple processes that divvy up the work needed. Even if you run more processes than you have cores on your computer, since much of the time each process will be waiting on data, in many cases you'll find things speed up nicely....
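A minimal sketch of that idea, using Python's standard `multiprocessing.Pool`, might look like the following. The names `split_ranges`, `fetch_range`, and `fetch_all` are hypothetical, and `fetch_range` only simulates network latency with `time.sleep`; in practice it would be replaced by a real remote read.

```python
import time
from multiprocessing import Pool


def split_ranges(total, parts):
    """Divide the index range [0, total) into at most `parts` contiguous pieces."""
    if total <= 0:
        return []
    step = -(-total // parts)  # ceiling division
    return [(start, min(start + step, total)) for start in range(0, total, step)]


def fetch_range(bounds):
    """Hypothetical stand-in for a remote read: waits, then reports the item count."""
    start, stop = bounds
    time.sleep(0.1)            # simulated time spent waiting on the network
    return stop - start


def fetch_all(total, workers):
    """Fetch all pieces in parallel, one task per piece."""
    with Pool(processes=workers) as pool:
        return pool.map(fetch_range, split_ranges(total, workers))


if __name__ == "__main__":
    # More workers than cores can still help because each worker mostly waits.
    counts = fetch_all(total=1000, workers=16)
    print(sum(counts))
```

Because each worker spends most of its time waiting rather than computing, the pool size here is chosen independently of the core count; the best value depends on the network and the service, so it is worth measuring.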