HDF5: Building on 25 Years of Success – SC23 BoF Session

Wednesday, 15 November 2023, 5:15pm-6:45pm MST
Location: 401-402

HDF5 is a critical I/O library for scientific applications. It has been 25 years since its first release in November 1998. HDF5's sustainability and its adaptation to today's computational and storage environments would not have been possible without feedback and contributions from the HDF5 community. We will begin with a panel of speakers who will present case studies on how they use, or would like to use, HDF5 in current and emerging computational environments. We will then invite community members to discuss the roadmap, how to contribute to HDF5, and what is required to sustain HDF5 for another 25 years.


Time slot (MST) | Presenter | Topic
17:15 – 17:25 | Dana Robinson (The HDF Group) | Introduction and HDF5 Roadmap
17:25 – 17:35 | Jay Lofstead (Sandia National Lab) | Fast, Searchable Data Annotations for Accelerating Time to Insight
17:35 – 17:45 | Ravi Madduri (Argonne National Lab) | Advanced Privacy preserving Federated Learning as a Service: Challenges and Opportunities
17:45 – 17:55 | William Godoy (Oak Ridge National Lab) | HDF5 as a critical component in the Julia HPC ecosystem
17:55 – 18:05 | Johannes Blaschke (Lawrence Berkeley Lab) | Perspectives from Data-Intensive HPC at NERSC
18:05 – 18:15 | Donpaul Stevens | AirMettle Data Lake
18:15 – 18:25 | Glenn Lockwood | I/O middleware for artificial intelligence: real intelligence required
18:25 – 18:45 | Panel | The next 25 years of HDF5

Long Description

HDF5 is a unique, open-source, high-performance technology suite that consists of an abstract data model, a library, and a file format for storing and managing extremely large and/or complex data collections. It is used worldwide by government, industry, and academia in a wide range of science, engineering, and business disciplines.

The HDF5 community is both deep and broad. Every major HPC system vendor includes HDF5 in its core software because science applications have adopted it widely and because it improves I/O performance and data organization in HPC environments. In addition, more than 1,000 projects on GitHub use HDF5, drawn by its versatile, self-describing data model, which can represent very complex data objects, the relationships between them, and their metadata; its portable binary file format, which places no limits on the number or size of data objects; its software library, which is optimized for efficient I/O; and its tools for managing, manipulating, viewing, and analyzing HDF5 data.

The HDF5 community has continued adding features for accessing data in object and cloud storage and for exploiting the storage systems being deployed on today's exascale systems. These features take advantage of the new storage paradigms while requiring minimal changes to existing HDF5 applications. Over the past decade, the volume of simulation, modeling, experimental, and observational data stored in HDF5, and the rate at which it is collected, have created new challenges for scientists and prompted requests to use these new storage paradigms. Moreover, AI applications that use HDF5 need to read the same data many times and to shuffle it.

The HDF Group, The Ohio State University, Lawrence Berkeley Lab, Lifeboat, LLC, and the Amazon AWS HPC teams have been working on enhancing HDF5 to address these challenges. We will present the latest HDF5 enhancements that help applications run on exascale systems, exploit object storage, migrate to the cloud, and collect and store new types of data. We will demonstrate how the HDF5 Virtual Object Layer (VOL) and Virtual File Driver (VFD) architectures now let users tackle scalable I/O on parallel file systems, data access on object stores, asynchronous I/O, multi-threaded data access, and more.

The target audience of this BoF includes both existing HDF5 users, such as Exascale Computing Project (ECP) application developers and accelerator scientists, and new users, such as members of the high-energy physics community who are exploring HDF5 as an alternative file format.

The session is designed to encourage HDF5 community members to discuss the challenges they face when using HDF5 and to provide feedback to HDF5 developers. We will present a brief HDF5 roadmap, invite current HDF5 users to share their experiences applying HDF5's many features to real-world problems, solicit feedback on HDF5 improvements, and gather requirements from new users.


Quincey Koziol, Amazon AWS
Dana Robinson, The HDF Group
Suren Byna, The Ohio State University and Lawrence Berkeley National Laboratory
Elena Pourmal, Lifeboat, LLC