H5Glance: Explore HDF5 files in a terminal or a notebook

Thomas Kluyver, European XFEL

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 823852.

Exploring with h5py

I'll be introducing h5py properly in tomorrow's session.

In [1]:
import h5py
In [2]:
f = h5py.File('sample.h5', 'r')
f.keys()
Out[2]:
<KeysViewHDF5 ['group', 'links']>
In [3]:
f['group']
Out[3]:
<HDF5 group "/group" (4 members)>
In [4]:
f['group'].keys()
Out[4]:
<KeysViewHDF5 ['complex', 'custom_float', 'integers', 'strings']>

This works, but stepping through one group at a time is not a very convenient way to inspect a file.
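
Going further with plain h5py means writing the traversal yourself. As a rough sketch, `visititems` can walk every group and dataset; the file and its layout below are made up for illustration and are not the sample.h5 used in this talk.

```python
import os
import tempfile

import h5py
import numpy as np


def describe(name, obj):
    """Return a one-line summary of an h5py group or dataset."""
    if isinstance(obj, h5py.Dataset):
        return f"{name}: dataset, shape={obj.shape}, dtype={obj.dtype}"
    return f"{name}: group, {len(obj)} members"


# Build a small throwaway file (hypothetical layout)
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("group/integers", data=np.arange(300).reshape(10, 6, 5))
    f.create_dataset("group/floats", data=np.zeros((4, 7)))

lines = []
with h5py.File(path, "r") as f:
    # visititems calls the function once for each group and dataset below f;
    # list.append returns None, so the walk is never cut short
    f.visititems(lambda name, obj: lines.append(describe(name, obj)))

print("\n".join(lines))
```

This gets a listing, but you have to carry a helper like this around and it only produces flat text.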

There are applications like HDFView, ViTables & h5web, but using them means leaving your notebook and opening the same file in a separate application. H5Glance is designed to fit into the notebook (or terminal) context.


Exploring with h5glance in a notebook

https://github.com/European-XFEL/h5glance

In [5]:
from h5glance import H5Glance
In [6]:
H5Glance(f)
Out[6]:
    • group
      • complex [📋]: 1024 × 1024 entries, dtype: (r: float64, i: float64)
      • custom_float [📋]: 4 × 7 entries, dtype: custom 2-byte float
      • integers [📋]: 10 × 6 × 5 entries, dtype: int64
      • strings [📋]: 4 entries, dtype: UTF-8 string
    • links
      • external → fxe_control_example.h5//INDEX/trainId
      • hard [📋]: 10 × 6 × 5 entries, dtype: int64
      • soft → /group/integers

The whole structure is created up front. This may be slower, but it means the tree just works in a static HTML export of the notebook, including on nbviewer.

H5Glance doesn't show your data: inspecting data with other libraries is easy & flexible, so we're not trying to cover that. But the clipboard icons next to dataset names copy the path to the dataset, so you can paste it into code.

In [7]:
import matplotlib.pyplot as plt
In [8]:
plt.imshow(f['/group/integers'][0])
plt.colorbar()
Out[8]:
<matplotlib.colorbar.Colorbar at 0x7fcea6b5cd00>
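
Indexing a dataset, as in the cell above, reads only the selected part of it into a NumPy array. A minimal sketch of that on a throwaway file; the dataset has the same shape as /group/integers, but the values are made up.

```python
import os
import tempfile

import h5py
import numpy as np

# Throwaway file; the contents are invented for this example
path = os.path.join(tempfile.mkdtemp(), "slice_demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("group/integers", data=np.arange(300).reshape(10, 6, 5))

with h5py.File(path, "r") as f:
    dset = f["/group/integers"]   # a path as pasted from the clipboard icon
    full_shape = dset.shape       # metadata only, no data is read here
    first_plane = dset[0]         # reads one 6 x 5 plane
    corner = dset[0, :2, :3]      # reads just a 2 x 3 corner of that plane

print(full_shape, first_plane.shape, corner.shape)
```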

Realistic example

This is the structure of our HDF5 files at European XFEL. We're often working through many layers of nested groups.
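
With h5py, several layers of groups can be crossed with one path string rather than one lookup per layer. A small sketch; the group names here are hypothetical and are not the actual European XFEL layout.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "nested_demo.h5")
with h5py.File(path, "w") as f:
    # Intermediate groups are created automatically (names are hypothetical)
    f.create_dataset("run/detector/module0/data", data=np.ones((3, 4)))

with h5py.File(path, "r") as f:
    step_by_step = f["run"]["detector"]["module0"]["data"]
    direct = f["run/detector/module0/data"]
    # Both routes reach the dataset with the same absolute name
    names = (step_by_step.name, direct.name)
    # Membership tests also accept a nested path
    has_module1 = "run/detector/module1" in f

print(names, has_module1)
```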

In [9]:
H5Glance('fxe_control_example.h5')
Out[9]:

Use h5glance automatically

You can register h5glance with IPython so that h5py Group and File objects are displayed this way automatically.

In [10]:
import h5glance
h5glance.install_ipython_h5py_display()
In [11]:
f['group']
Out[11]:
    • complex [📋]: 1024 × 1024 entries, dtype: (r: float64, i: float64)
    • custom_float [📋]: 4 × 7 entries, dtype: custom 2-byte float
    • integers [📋]: 10 × 6 × 5 entries, dtype: int64
    • strings [📋]: 4 entries, dtype: UTF-8 string

h5glance in the terminal

This was demoed separately. Features include:

  • Clear tree view
  • Formatting to quickly distinguish groups, datasets & links
  • Automatically uses a pager for long listings
  • Tab completion of paths inside your HDF5 file
  • Dataset detail view
  • Listing attributes with --attrs
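
The attributes mentioned in the last item are the key/value metadata that HDF5 attaches to groups and datasets. As a rough sketch, the same information is available in h5py through `.attrs`; the attribute names and values below are invented for the example.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "attrs_demo.h5")
with h5py.File(path, "w") as f:
    grp = f.create_group("group")
    # Hypothetical attributes, just to have something to list
    grp.attrs["description"] = "example group"
    grp.attrs["version"] = 2

with h5py.File(path, "r") as f:
    # .attrs behaves like a mapping, so it can be copied into a dict
    attrs = dict(f["group"].attrs)

for key, value in attrs.items():
    print(f"{key}: {value}")
```
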
In [12]:
!h5glance sample.h5
sample.h5
├group (2 attributes)
│ ├complex	[(r: float64, i: float64): 1024 × 1024]
│ ├custom_float	[custom 2-byte float: 4 × 7]
│ ├integers	[int64: 10 × 6 × 5]
│ └strings	[UTF-8 string: 4]
└links
  ├external	-> fxe_control_example.h5//INDEX/trainId
  ├hard	= /group/integers
  └soft	-> /group/integers
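
The listing above marks the soft and external links with -> and the hard link with =. For reference, here is a sketch of how those three kinds of link can be created and told apart in h5py. The file names and the external target path are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "links_demo.h5")

# Hypothetical second file for the external link to point at
with h5py.File(os.path.join(tmpdir, "other.h5"), "w") as other:
    other.create_dataset("some/dataset", data=np.arange(4))

with h5py.File(path, "w") as f:
    f.create_dataset("group/integers", data=np.arange(300).reshape(10, 6, 5))
    links = f.create_group("links")
    links["hard"] = f["group/integers"]               # second name for the same object
    links["soft"] = h5py.SoftLink("/group/integers")  # stores a path within this file
    links["external"] = h5py.ExternalLink("other.h5", "/some/dataset")  # file name + path

kinds = {}
with h5py.File(path, "r") as f:
    for name in f["links"]:
        # getlink=True returns the link object itself instead of following it
        link = f["links"].get(name, getlink=True)
        kinds[name] = type(link).__name__
    soft_target = f["links"].get("soft", getlink=True).path
    # Following the soft and hard links reaches the same data
    same_data = np.array_equal(f["links/soft"][0], f["links/hard"][0])

print(kinds, soft_target, same_data)
```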