In this session of “Call the Doctor,” The HDF Group’s Scot Breitenfeld discusses a major modernization effort for HDF5: moving from opaque integer-based filter parameters to human-readable TOML strings. This update, targeted for the HDF5 2.2 release, simplifies the developer experience for plugin authors and users of high-level languages like Python, Java, and Fortran.
Configuring HDF5 filters has historically required “type-punning” complex parameters (like floating-point rates for ZFP) into an array of unsigned integers known as cd_values. This method is opaque, error-prone, and difficult to use from the command line. Scot Breitenfeld presents a new approach utilizing TOML (Tom’s Obvious, Minimal Language) as the standard for filter configuration strings.
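To make the "type-punning" concrete, here is a minimal Python sketch of how a floating-point rate can end up as unsigned integers. The word order and the helper names (`double_to_cd_values`, `cd_values_to_double`) are assumptions chosen for illustration; they are not the layout used by ZFP or any specific HDF5 filter plugin.

```python
import struct


def double_to_cd_values(value: float) -> list[int]:
    """Split a 64-bit float into two 32-bit unsigned ints (low word first).

    Illustrative layout only; real filters define their own packing.
    """
    low, high = struct.unpack("<2I", struct.pack("<d", value))
    return [low, high]


def cd_values_to_double(cd_values: list[int]) -> float:
    """Reassemble the float from the two 32-bit words."""
    low, high = cd_values
    return struct.unpack("<d", struct.pack("<2I", low, high))[0]


rate = 3.5
cd_values = double_to_cd_values(rate)
print(cd_values)                       # [0, 1074528256]
print(cd_values_to_double(cd_values))  # 3.5
```

The integers `[0, 1074528256]` carry no visible hint that they encode a rate of 3.5, which is the opacity the session describes.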
By leveraging TOML, HDF5 gains native support for integers, floats, booleans, and nested tables without the need for The HDF Group to maintain a custom grammar. While the underlying on-disk storage remains identical for backward compatibility, the user-facing API becomes significantly more intuitive. The proposal includes a new H5Pappend_filter function and updates to the plugin class to support human-readable introspection.
Relevant Links
- RFC-HDFG-2026-001: String-Based Filter Configuration API
- GitHub Discussion: Modernizing Filter Parameters Issue
Topics Covered
- The Problem with Custom Parsers: Why building a bespoke grammar for filter strings is a maintenance burden.
- Why TOML? The advantages of using a standardized configuration language (native types, nesting, and MIT-licensed parsers).
- API Updates: Introducing H5Pappend_filter and new getter functions for human-readable parameters.
- Backward Compatibility: How the library maintains on-disk compatibility while enabling modern features.
- Language Bindings: Implementation details for h5py, Fortran, and Java.
Chapter List
0:00 Introduction & Context
0:55 The Problem with Custom Parsers
3:57 Why TOML? Native Types & Nesting
5:38 Configuration Examples (ZFP)
7:23 Support for Python, Fortran, and Java
7:58 Plugin Author Updates (H5Z_class3_t)
9:00 Backward Compatibility & On-Disk Format
10:45 Q&A: Precision & Introspection
15:04 RFC Synopsis: Solving Opaque Parameters
22:20 Updates to h5repack and h5dump
27:16 Release Timeline (HDF5 2.2)
29:29 Wrap-up
Transcript
- [00:00] Thank you for joining us. This is a follow-up to our working group meeting last Thursday. I’ll start by going over the discussion about the parser—specifically, why we are switching from what was presented at that meeting. Secondly, I was asked to give a synopsis of the original RFC for the broader community.
- [00:55] When we first looked into a parser for filter strings, we considered a custom key-value parser. However, that meant we would have to develop our own grammar and handle every edge case—quoting semantics, escaping, and boolean flags. Supporting a custom spec forever is a significant burden.
- [02:14] One problem with the original approach was that everything was a string; you didn’t get native types. You had to manually convert a string into a double or a boolean.
- [04:13] We found that TOML is the closest to what we were developing but is already a widely used configuration language. It gives us native types, defined escaping rules, and support for nesting for free.
- [05:56] For example, it handles comma separations and nested tables. We also maintain a “filter title” keyword, allowing you to store a readable name (like “ZFP”) directly in the string.
- [07:58] For plugin authors, we’ve added new functions like H5Pget_config_int or double. Instead of manual bit-casting, you just request the parameter you need by name.
- [09:00] Importantly, the on-disk format has not changed. The “CD values” are still what gets stored. TOML is simply the input format to make the library easier to use.
- [10:57] Community Member: What do filters need to do to return additional info, like a ZFP rate? Scot: If you use the new class, h5dump can return that information in a readable format using callback functions to interpret those values back into strings.
- [15:56] Currently, packing values for ZFP doesn’t mean much to users. We’re introducing this plain-text method to make introspection easier.
- [22:20] Our command-line tools like h5repack and h5dump will be updated to accept these TOML strings, so you don’t have to use “magic numbers” on the command line.
- [27:22] We are aiming for this to be in the HDF5 2.2 release at the end of July 2026.