Technical Insights

Interim Engineering Director Dana Robinson talks about The HDF Group's upcoming release schedule. As some of you may already have noticed, we now post the current HDF5 release schedule in the README.md document in the project's root on GitHub. I'll update this as circumstances change so it always reflects our current thinking....

The HDF Group just released HDF5 1.13.1. All releases in the 1.13 series are experimental, which allows us to test new features with our users and gather feedback while we develop the next major maintenance release. Learn more about this new release of HDF5....

The purpose of this introduction is to highlight and celebrate a community contribution whose impact we are just beginning to understand. Its principal author, Mr. Lucas C. Villa Real, calls it HDF5-UDF and describes it as "a mechanism to generate HDF5 dataset values on-the-fly using user-defined functions (UDFs)." This matter-of-fact characterization is quite accurate, but I would like to provide some context for what this means for us users of HDF5....

Hermes is a distributed I/O buffering system for deep distributed storage hierarchies, which are commonly found on modern HPC systems. On December 1st, 2021, members of the Hermes team gave a presentation to show Hermes in action and to talk about the new release and plans for future development. Here are the materials from that session:
Slide Deck - Part I - Intro, Overview, and FAQ
Slide Deck - Part II - Demonstration
https://youtu.be/zDmUdynklJs ...

On February 12, 2021, we were pleased to host Lucas Villa Real of IBM Research to discuss his project HDF5-UDF, a data virtualization tool for HDF5. The tool enables users to associate logic in source code form (i.e., user-defined functions written in Python, C/C++, or Lua) with HDF5 datasets. Such UDFs are compiled into a binary form (which often takes no more than a few KB) and embedded into HDF5; once an application reads such a dataset, HDF5-UDF executes that binary code and generates the data on-the-fly. Lucas has just released HDF5-UDF 1.2, which offers several new features: among other benefits, it makes it possible to easily virtualize CSV files so they look like regular HDF5 datasets. Attached you'll find the slide deck...

The HDF Group’s technical mission is to provide rapid, easy and permanent access to complex data. FishEye's vision is "Synthesizing the world’s real-time data". This white paper is intended for embedded system users, software engineers, integrators, and testers who use or want to use HDF5 to access, collect, use and analyze machine data. FishEye has developed an innovative process that provides the most efficient method to expose data from embedded systems, simplifying and liberating that data for real-time analysis, machine learning, and cloud-enabled services....

HSDS (Highly Scalable Data Service) is a REST-based service for reading and writing HDF data. HSDS was initially developed as a NASA Access 2015 project, and The HDF Group has continued to invest in it since. As we'll see, the latest version has a bevy of new and interesting features....

We are pleased to post this white paper from The HDF Group's intern, Chen Wang. The paper walks through the steps of analyzing and tuning the HACC-IO benchmarks and examines the impact of different access patterns, stripe settings and HDF5 metadata. It also compares the five benchmarks on two different parallel file systems, Lustre and GPFS, and shows that HDF5 with proper optimizations can catch up with the pure MPI-IO implementations. An I/O Study of ECP Applications...