Suren Byna, Elena Pourmal, Lori Cooper
Overview
HDF5 is a widely used tool that simplifies the management of and access to scientific and engineering data. With data growing rapidly across all domains of science and industry, HDF5 developers have been building technologies that provide rapid, easy, and permanent access to complex data. The HDF Group, a non-profit organization, has been the driving force behind developing and maintaining the HDF5 software library for more than two decades.
HDF5 has been extremely successful, with a wide range of users and an ecosystem built around the HDF5 library. To facilitate a broader discussion among HDF5 users, The HDF Group and Lawrence Berkeley National Laboratory (LBNL) teamed up to host a virtual HDF5 User Group (HUG) Meeting from October 13 to October 16, 2020. Although the meeting was virtual and the schedule was challenging for attendees from around the world, attendance was consistently around 60 throughout the 4 days of the meeting. Plans are underway for the 2021 HDF5 User Group meeting on October 12-15, 2021; you’ll find a link to the Call for Papers at the bottom of this post.
HUG 2020 Program
Tutorial
The first day of HUG 2020 featured a tutorial that introduced a diverse set of tools and techniques for working with HDF5 data and applications. Gerd Heber, Aleksandar Jelenak, and John Readey from The HDF Group (THG) gave demonstrations of technologies (rhdf5, HDFql, using Python locally and in the cloud). Steven Varga, an independent researcher from Varga Consulting, presented H5CPP, a header-only library for using HDF5 with C++. Scot Breitenfeld from THG and Quincey Koziol from LBNL presented various tuning knobs for achieving superior performance with parallel HDF5. Gerd Heber then talked about “what could go wrong” with HDF5 performance, using 24 combinations of writing the same data in different modes, and how to troubleshoot performance problems.
The remaining three days of HUG 2020 focused on various features of HDF5, applications, and tools in the HDF5 ecosystem. The topics covered included:
- Updates on THG and HDF5
- HDF5 for Exascale and HPC
- HDF5 VOL connectors and VFDs
- HDF5 use cases in sciences
- HDF5 language bindings
- HDF5 Cloud solutions
- HDF5 Industry use cases
Business model and HDF5 roadmap
Mike Folk, the Interim Executive Director of The HDF Group, presented the business model of HDF5 and Elena Pourmal, the Director of Technical Services and Operations, presented the HDF5 roadmap. Elena’s talk included information on upcoming releases, new features under development, and the steps The HDF Group has taken to revamp HDF5 as a community-driven Open Source project.
HDF5 Features
HDF5 software has recently been enhanced with several new features. Quincey Koziol from LBNL and Chris Hogan from THG discussed ongoing efforts to allow fully concurrent execution of all HDF5 API routines from multiple threads. Quincey and Suren presented various features for managing experimental and observational data. John Readey presented “HDF for Cloud – HSDS server”, an effort to support HDF5 in the cloud, followed by talks on applications of HSDS: FIREFLY, SlideRule, and the Open Energy Data Initiative (OEDI). Loic Huder presented h5web, a web-based viewer of HDF5 files. The Virtual File Driver (VFD) layer is an abstraction through which the HDF5 library performs I/O on a file. The Exascale Computing Project (ECP) is supporting development of a GPU I/O VFD that uses NVIDIA’s GPU Direct Storage (GDS). Two further VFDs were presented: the Mirror VFD, which mirrors an HDF5 file on a remote system as it is being created on a local file system, and the Onion VFD, which provides access to previous versions of an HDF5 file on a per open/close cycle basis. They were presented along with a new implementation of Single Writer Multiple Reader (SWMR), which has also been implemented as a VFD (New VFDs and SWMR Redesign). The day finished with a round table discussion with active participation of attendees.
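To make the idea of a driver layer more concrete, here is a minimal, purely conceptual sketch in plain Python. It does not use the HDF5 API: the names `Driver`, `MemoryDriver`, `MirrorDriver`, and `store_record` are hypothetical and chosen only for illustration. It shows how calling code can stay unchanged while the driver underneath decides where the bytes go, including a toy "mirror" that forwards each write to a second destination.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical stand-in for a file driver: raw byte I/O at an offset."""

    @abstractmethod
    def write(self, offset: int, data: bytes) -> None: ...

    @abstractmethod
    def read(self, offset: int, size: int) -> bytes: ...


class MemoryDriver(Driver):
    """Keeps the whole 'file' in a growable in-memory buffer."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def write(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if end > len(self.buf):
            # Zero-fill any gap between the current end and the write.
            self.buf.extend(b"\x00" * (end - len(self.buf)))
        self.buf[offset:end] = data

    def read(self, offset: int, size: int) -> bytes:
        return bytes(self.buf[offset:offset + size])


class MirrorDriver(Driver):
    """Forwards every write to two drivers; reads come from the primary."""

    def __init__(self, primary: Driver, mirror: Driver) -> None:
        self.primary = primary
        self.mirror = mirror

    def write(self, offset: int, data: bytes) -> None:
        self.primary.write(offset, data)
        self.mirror.write(offset, data)

    def read(self, offset: int, size: int) -> bytes:
        return self.primary.read(offset, size)


def store_record(driver: Driver, offset: int, payload: bytes) -> None:
    """Caller-side code: it does not know which driver it is using."""
    driver.write(offset, payload)


# Two in-memory buffers stand in for a local file and a remote copy.
local, remote = MemoryDriver(), MemoryDriver()
store_record(MirrorDriver(local, remote), 4, b"HDF5")
```

In this toy model, swapping `MirrorDriver` for a plain `MemoryDriver` changes where data ends up without touching `store_record`, which is the kind of separation a driver layer is meant to provide. The real HDF5 VFD interface is considerably richer than this sketch.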
Applications
On Thursday, HUG 2020 focused on research and industry applications that use HDF5, including an X-ray research laser, tornadic thunderstorm simulations, pharmaceuticals, environmental science, machine learning, real-time analysis, and deep learning. Quincey Koziol provided an introduction to the Virtual Object Layer (VOL), which redirects I/O operations into a VOL “connector” that can be used to extend HDF5 with new capabilities or storage mechanisms. The day’s proceedings ended with talks about some new VOL connectors and the new features they bring to HDF5.
Ecosystem
On the final day, the meeting took a deep dive into the HDF ecosystem. The talks covered software packages and tools such as h5py, HDF5-UDF, FasTensor, DREAM.3D, MACSio, Neurodata Without Borders, and Darshan, to name a few, and looked at use cases at Sandia National Laboratories and ITER.
Concluding remarks
The organizing committee of HUG 2020 included Lori Cooper, Elena Pourmal, and Dax Rodriguez from THG, and Suren Byna and Quincey Koziol from LBNL. The presentation slides and the YouTube videos of HUG 2020 are available in one place at https://www.hdfgroup.org/hug/2020-hug/hdf5-users-group-2020-agenda/.
Building on the enthusiasm in the HDF5 community, THG and LBNL are teaming up again to host HUG 2021. This edition of HUG invites papers describing R&D on HDF5 topics, applications, and related areas, as well as abstracts for presentations. The full Call for Papers and Presentations is online, and you can submit your proposal at https://easychair.org/conferences/?conf=hug21.