String-Based Filter Configuration with TOML – Scot Breitenfeld – Call the Doctor (5/12/26)

In this session of “Call the Doctor,” The HDF Group’s Scot Breitenfeld discusses a major modernization effort for HDF5: moving from opaque integer-based filter parameters to human-readable TOML strings. This update, targeted for the HDF5 2.2 release, simplifies the developer experience for plugin authors and users of high-level languages like Python, Java, and Fortran.

Configuring HDF5 filters has historically required “type-punning” complex parameters (like floating-point rates for ZFP) into an array of unsigned integers known as cd_values. This method is opaque, error-prone, and difficult to use from the command line. Scot Breitenfeld presents a new approach utilizing TOML (Tom’s Obvious, Minimal Language) as the standard for filter configuration strings.
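To make the opacity concrete, here is a minimal Python sketch of what type-punning a floating-point parameter into `cd_values` involves. It assumes a 64-bit double is split into two 32-bit unsigned words in little-endian order. The real ZFP plugin's layout (including any mode or header words) is not described in this session, so the helper names and word order are hypothetical.

```python
import struct

def pack_double_to_cd_values(value: float) -> list[int]:
    """Hypothetical helper: reinterpret a double's 8 bytes as two uint32 words."""
    # "<d" is a little-endian IEEE 754 double; "<2I" is two little-endian uint32s.
    low, high = struct.unpack("<2I", struct.pack("<d", value))
    return [low, high]

def unpack_cd_values_to_double(words: list[int]) -> float:
    """Reverse the punning: rebuild the double from its two uint32 words."""
    return struct.unpack("<d", struct.pack("<2I", *words))[0]

cd_values = pack_double_to_cd_values(8.5)
print(cd_values)  # [0, 1075904512]: nothing here reads as "rate = 8.5"
print(unpack_cd_values_to_double(cd_values))  # 8.5
```

Anyone reading the stored array, or typing it on a command line, has to know this layout to recover the value. That is the opacity the TOML proposal targets.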

By leveraging TOML, HDF5 gains native support for integers, floats, booleans, and nested tables without the need for the HDF Group to maintain a custom grammar. While the underlying on-disk storage remains identical for backward compatibility, the user-facing API becomes significantly more intuitive. The proposal includes a new H5Pappend_filter function and updates to the plugin class to support human-readable introspection.


Topics Covered

  • The Problem with Custom Parsers: Why building a bespoke grammar for filter strings is a maintenance burden.
  • Why TOML? The advantages of using a standardized configuration language (native types, nesting, and MIT-licensed parsers).
  • API Updates: Introducing H5Pappend_filter and new getter functions for human-readable parameters.
  • Backward Compatibility: How the library maintains on-disk compatibility while enabling modern features.
  • Language Bindings: Implementation details for h5py, Fortran, and Java.

Chapter List

0:00 Introduction & Context
0:55 The Problem with Custom Parsers
3:57 Why TOML? Native Types & Nesting
5:38 Configuration Examples (ZFP)
7:23 Support for Python, Fortran, and Java
7:58 Plugin Author Updates (H5Z_class3_t)
9:00 Backward Compatibility & On-Disk Format
10:45 Q&A: Precision & Introspection
15:04 RFC Synopsis: Solving Opaque Parameters
22:20 Updates to h5repack and h5dump
27:16 Release Timeline (HDF5 2.2)
29:29 Wrap-up

Transcript

  • [00:00] Thank you for joining us. This is a follow-up to our working group meeting last Thursday. I’ll start by going over the discussion about the parser—specifically, why we are switching from what was presented at that meeting. Secondly, I was asked to give a synopsis of the original RFC for the broader community.
  • [00:55] When we first looked into a parser for filter strings, we considered a custom key-value parser. However, that meant we would have to develop our own grammar and handle every edge case—quoting semantics, escaping, and boolean flags. Supporting a custom spec forever is a significant burden.
  • [02:14] One problem with the original approach was that everything was a string; you didn’t get native types. You had to manually convert a string into a double or a boolean.
  • [04:13] We found that TOML is the closest to what we were developing, and it is already a widely used configuration language. It gives us native types, defined escaping rules, and support for nesting for free.
  • [05:56] For example, it handles comma-separated values and nested tables. We also maintain a “filter title” keyword, allowing you to store a readable name (like “ZFP”) directly in the string.
  • [07:58] For plugin authors, we’ve added new functions such as H5Pget_config_int and its double-precision counterpart. Instead of manual bit-casting, you just request the parameter you need by name.
  • [09:00] Importantly, the on-disk format has not changed. The cd_values are still what gets stored. TOML is simply the input format that makes the library easier to use.
  • [10:57] Community Member: What do filters need to do to return additional info, like a ZFP rate? Scot: If you use the new class, h5dump can return that information in a readable format using callback functions to interpret those values back into strings.
  • [15:56] Currently, the packed cd_values for ZFP don’t mean much to users. We’re introducing this plain-text method to make introspection easier.
  • [22:20] Our command-line tools like h5repack and h5dump will be updated to accept these TOML strings, so you don’t have to use “magic numbers” on the command line.
  • [27:22] We are aiming for this to be in the HDF5 2.2 release at the end of July 2026.
