In this session of “Call the Doctor,” Matt Larson provides a technical update on Agentic HDF5, a project designed to assist domain scientists in leveraging the power of HDF5 without needing to become experts in the library’s internal mechanics. The discussion focuses on a new telemetry system that proactively identifies performance bottlenecks and suggests optimizations to the user.
GitHub Repo: https://github.com/mattjala/agentic-hdf5
Discuss this session: https://forum.hdfgroup.org/t/13756
Previous Session on Agentic HDF5: https://youtu.be/_zkD-in_Pkg
Topics Covered
- Agentic HDF5 Goals: Bridging the gap between domain science and HDF5 library expertise.
- MCP Tools: Interface tools for reading, writing, re-chunking, and applying filters.
- The Proactivity Gap: Why users often miss performance gains and how telemetry solves it.
- Telemetry Architecture: How session logs are stored in HDF5 files to track access patterns.
- Anti-Pattern Detection: Identifying redundant reads, unbatched writes, and suboptimal chunk sizes.
- Future Directions: Integration with specialized sub-agents and provenance considerations.
Chapter List for this Video
0:00 Introduction
0:29 Overview of Agentic HDF5
4:36 The Proactivity Gap and Telemetry Solution
5:42 Architecture: Interaction between Agent and Telemetry File
6:55 Specific Performance Checks: Chunk Sizes, Repeated Reads/Writes
8:50 Current Development Status and Future Directions
10:45 Q&A: Agent Access to End-User Code
12:35 Wrap-up
Summary
Bridging the Expertise Gap with Agents
Agentic HDF5 leverages the capabilities of modern AI models (currently focusing on Claude and Claude Code) to act as an intelligent intermediary between the scientist and their data. By packaging HDF5 expertise—such as virtual datasets, SWMR, and access pattern best practices—into specific “skills,” the agent can intelligently ingest or create files while the user focuses on their scientific research.
Proactive Optimization via Telemetry
A primary challenge in data management is that users are often unaware when their access patterns are suboptimal. To address this, Matthew introduced a new telemetry and advisory system that is currently in active development:
- Session Logging: All tool uses (reading, writing, byte counts, and timing) are logged to a dedicated HDF5 telemetry file corresponding to the data file.
- Anti-Pattern Detection: The system scans these logs to identify repeated reads from the same chunks or inefficient access patterns that could be addressed by HDF5’s built-in knowledge.
- Intelligent Advisories: The agent proactively brings these potential improvements to the user’s attention, suggesting solutions like batching writes or re-chunking files without requiring explicit user intervention.
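The session-logging step described above can be sketched in a few lines. This is a minimal illustration with hypothetical names; the actual system records entries in a companion HDF5 telemetry file, whereas this sketch uses a JSON-lines file for brevity.

```python
import json, os, tempfile, time

# Hypothetical session logger: each tool use (read/write, byte count,
# timing, chunks touched) is appended as one entry. The real system
# stores these in an HDF5 telemetry file alongside the data file.
class TelemetryLog:
    def __init__(self, path):
        self.path = path

    def record(self, tool, dataset, nbytes, seconds, chunks=()):
        """Append one entry describing a single tool use."""
        entry = {"ts": time.time(), "tool": tool, "dataset": dataset,
                 "bytes": nbytes, "seconds": seconds,
                 "chunks": [list(c) for c in chunks]}
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")

    def entries(self):
        """Read back every logged entry for a later advisory pass."""
        with open(self.path) as f:
            return [json.loads(line) for line in f]

log = TelemetryLog(os.path.join(tempfile.mkdtemp(), "session.jsonl"))
log.record("read", "/data/temperature", nbytes=4096, seconds=0.01,
           chunks=[(0, 0)])
log.record("write", "/data/results", nbytes=128, seconds=0.002)
print(len(log.entries()))  # -> 2
```

Keeping the log append-only means each tool call pays only a small fixed cost; the heavier scanning work happens later, when the advisory pass runs.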
Future Directions: Sub-Agents and Code Integration
During the session, the discussion touched on the future of the advisory system, including the potential for:
- Expert Sub-Agents: Spawning specialized agents to address specific library regions or bespoke data practices.
- Provenance: Evaluating whether telemetry should be bundled within the data file itself for better portability.
- Code Analysis: Expanding the agent’s scope to analyze the user’s scripts alongside telemetry to suggest more intelligent code rewrites.
Transcript
[0:00] Matthew Larson: All right everyone, welcome to Call the Doctor. Let me share my screen here. I have a few short slides. Today I’m going to talk about Agentic HDF5, a product I’ve been working on for a few months now, and particularly some new features relating to telemetry—how the agent tries to advise the user and bring up potential improvements to how they’re using HDF5.
[0:29] Matthew Larson: To start with, the high-level goal of the project is to support domain scientists, and generally anyone who wants to make use of data in HDF5 files but isn’t interested in spending the time to become an HDF5 expert. They might not be interested in all the details of file formats and nitty-gritty library stuff; they just want to use the data and do science.
Agentic HDF5’s goal is to leverage agents—the modern state of models and their abilities—to bridge that gap between the user and the data on disk or in the cloud. Right now it’s limited to Claude and Claude Code, but eventually, we intend to expand this to make it more model and provider agnostic. Within that framework, we have a set of skills containing expertise on various advanced HDF5 features, things like virtual datasets, SWMR, and access pattern best practices.
[1:54] Matthew Larson: Agentic HDF5 also includes a set of MCP tools for interfacing with files. You have basic reading and writing, but also more involved things like tools to re-chunk the file or apply filters to a dataset to improve access speeds or reduce file size. These are the kinds of things that laypersons might not know how to do unless they are informed via something like this agent.
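A re-chunking operation like the one these tools perform can be sketched with h5py (assuming h5py and NumPy are available; this is an illustration, not the project's actual tool implementation). Note that `del` only unlinks the old dataset; reclaiming the space on disk requires a repack, e.g. with the h5repack utility.

```python
import os
import tempfile

import h5py
import numpy as np

# Create a small contiguous dataset, then re-create it with an
# explicit chunk shape and a gzip filter to improve access speed
# on partial reads and reduce file size.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("data", data=np.arange(10000).reshape(100, 100))

with h5py.File(path, "a") as f:
    data = f["data"][...]                    # read into memory
    del f["data"]                            # unlink the old layout
    f.create_dataset("data", data=data,
                     chunks=(50, 50),        # new chunk shape
                     compression="gzip")     # apply a filter as well
    new_chunks = f["data"].chunks

print(new_chunks)  # -> (50, 50)
```

This read-delete-recreate pattern is the simplest way to change a chunk layout; for large datasets, a streaming copy chunk-by-chunk avoids holding the whole array in memory.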
[3:10] Matthew Larson: One problem you might notice is that it’s not really very proactive as it stands. If the user just tells the agent to read a file, it’s not going to go out of its way to say, “Hold on, these chunks could be better.” Users often think “if it isn’t broken, don’t fix it,” or they don’t know it could be better.
[4:11] Matthew Larson: The solution I’ve been working on is to go through a telemetry file. Over the course of a session, all tool uses—reading and writing to HDF5 files—are logged to telemetry files. Later tool uses will scan this telemetry and, according to pre-programmed logic, identify repeated reads or bad access patterns.

[5:42] Matthew Larson: Here is a rough sketch of the architecture. We have the end-user interfacing with the agent through the chat interface, and the agent is interacting with the HDF5 file. When the agent does a read or a write, it adds a new entry to this telemetry file recording the dataset targeted, the chunks accessed, and how long it took.
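Recording "the chunks accessed" for a read or write amounts to mapping the selection onto the chunk grid. This illustrative helper (not from the project) shows the arithmetic for a rectangular selection: each axis maps the selection's start and extent onto a range of chunk indices.

```python
from itertools import product

def chunks_touched(start, count, chunk_shape):
    """Return the sorted chunk-grid coordinates a hyperslab covers."""
    ranges = []
    for s, c, ch in zip(start, count, chunk_shape):
        # First and last chunk index intersected along this axis.
        ranges.append(range(s // ch, (s + c - 1) // ch + 1))
    return sorted(product(*ranges))

# A 100x100 read starting at (90, 0) on 64x64 chunks spans 2x2 chunks:
print(chunks_touched((90, 0), (100, 100), (64, 64)))
# -> [(1, 0), (1, 1), (2, 0), (2, 1)]
```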
[6:55] Matthew Larson: We can notice certain specific things programmatically. For example, are we reading from the same set of chunks a lot? If so, we should tell the user to change their workflow to read once and do all relevant operations. Similarly, if they are repeating writes, we can prod them to batch the writes. We can also check chunk sizes; defaults are not necessarily helpful. If chunks are too small, metadata becomes problematic. The agent can suggest resizing these chunks fairly intelligently.
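The checks just described can be sketched as a simple pass over the telemetry entries. Names and thresholds here are illustrative, not from the project: the pass flags chunks that are re-read often and datasets written in many small operations.

```python
from collections import Counter

# Hypothetical advisory pass over telemetry entries shaped like
# {"tool": ..., "dataset": ..., "chunks": [...]}, mirroring the
# repeated-read and unbatched-write checks described above.
def advisories(entries, read_threshold=3, write_threshold=5):
    notes = []
    read_counts, write_counts = Counter(), Counter()
    for e in entries:
        if e["tool"] == "read":
            read_counts.update((e["dataset"], tuple(c)) for c in e["chunks"])
        elif e["tool"] == "write":
            write_counts[e["dataset"]] += 1
    for (dset, chunk), n in read_counts.items():
        if n >= read_threshold:
            notes.append(f"chunk {chunk} of {dset} was read {n} times; "
                         "consider reading once and reusing the data")
    for dset, n in write_counts.items():
        if n >= write_threshold:
            notes.append(f"{dset} was written {n} times; consider batching")
    return notes

log = ([{"tool": "read", "dataset": "/data", "chunks": [(0, 0)]}] * 3
       + [{"tool": "write", "dataset": "/log", "chunks": []}] * 5)
notes = advisories(log)
print(notes)
```

A chunk-size check would follow the same shape: compare each dataset's chunk byte size against a sensible band and flag chunks that are far too small (metadata overhead) or too large (wasted I/O on partial reads).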
[8:50] Matthew Larson: Right now, this is still in active development. I hope to get it committed to the master branch this week. One thing to consider is whether it makes sense to have this telemetry be external or to take a provenance approach and bundle it in the file itself. There are pros and cons—portability versus increased file size.
[10:21] Robert Seip: Matt, I have a question. If you go back to your high-level diagram, at any point do you see the agent as having access to the end-user’s code? In other words, would the agent, in conjunction with the telemetry file, be able to read the user’s code and suggest rewrites?
[10:45] Matthew Larson: Usually, it would have access to the code. The archetypical workflow is that you are in the directory for your project where you have your scripts. You can choose to give the agent access to those files. I usually imagine you would want to, because then it could think more intelligently. Right now, the telemetry is limited to direct tool calls, but I definitely think it’s a good idea to eventually expand this to cover all modifications of the file.
[12:35] Matthew Larson: If nobody else has any questions or comments, maybe I’ll wrap up. Thank you for your time.