This post is an update of one originally published in September 2024. We’ve updated it with some of the features available in the upcoming HDF5 library release 2.0: an introduction to CMake presets and how they can be used when building the library, the new ROS3 backend, and building the library with a faster DEFLATE compression filter.
by Aleksandar Jelenak, The HDF Group
Virtual environments are a very popular method for ordinary computer users to install software into an isolated space without assistance from their system administrators. The conda package manager is one of the most widely used tools for managing virtual environments. There is also a newer implementation with the same functionality, called mamba, but we will refer only to conda here for simplicity. If you prefer mamba, simply replace conda with mamba in the commands below.
With the release of HDF5 version 2.0 approaching, HDF5 users may be interested in how to build the HDF5 library and h5py in a conda virtual environment. Topics also covered:
- The zlib-ng library as the DEFLATE compression filter.
- The library’s Read-Only S3 (ROS3) driver and its new backend, the Amazon C S3 library (aws-c-s3).
- CMake presets for the library’s build configuration.
Let’s explain each one a little.
zlib-ng as HDF5 DEFLATE Compression Filter
DEFLATE is one of the two data compression methods that have been available in the HDF5 library for a very long time. The DEFLATE implementation used in the HDF5 library comes from the zlib library. Because of its open source license and widespread availability, the vast majority of current HDF5 data are compressed with this method.
Another open-source implementation, called zlib-ng, has benchmarks indicating about a 2x performance improvement over the zlib library. It achieves better performance by leveraging advanced CPU instructions while remaining a modern C drop-in replacement for zlib’s API. HDF5 users should immediately benefit from having zlib-ng handle DEFLATE compression and decompression operations.
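A key point is that zlib and zlib-ng emit interchangeable DEFLATE streams, so files compressed with one decompress with the other. The small sketch below uses Python’s standard-library zlib module (which wraps classic zlib, not zlib-ng) just to illustrate the lossless DEFLATE round trip that HDF5’s filter relies on:

```python
import zlib

# Repetitive sample data, the kind that compresses well with DEFLATE.
data = b"HDF5 chunk payload " * 1000

# Compress at level 6; HDF5's DEFLATE filter likewise takes a
# per-dataset compression level in the 0-9 range.
compressed = zlib.compress(data, level=6)

# DEFLATE is lossless: decompression restores the exact original bytes,
# regardless of which implementation produced the stream.
restored = zlib.decompress(compressed)

assert restored == data
print(f"original: {len(data)} bytes, compressed: {len(compressed)} bytes")
```

The same guarantee is what lets an HDF5 library built with zlib-ng read existing DEFLATE-compressed files, and vice versa.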
ROS3 Driver for HDF5 Files in S3
The HDF5 library has a Read-Only S3 (ROS3) driver for reading data from HDF5 files in S3-compatible object stores. For the library’s next version, 2.0, the driver’s custom cURL-based code for S3 communication was replaced with Amazon’s C S3 library (aws-c-s3). This library brings a lot of functionality that was missing from the previous ROS3 driver, namely, advanced handling of AWS configuration and credential settings, and smart retry behavior for failed but non-fatal S3 requests.
CMake Presets
CMake presets are collections of CMake and environment variable settings that specify CMake build, test, and packaging configurations. These presets must be stored in JSON files following a specific schema. Two such files are special when placed in the CMake project root folder: CMakePresets.json and CMakeUserPresets.json. The first of these files is for project-wide presets, while the latter is for a developer’s personal presets and is optional.
These preset JSON files are a very convenient and efficient way to organize and share CMake configurations. If CMakePresets.json and CMakeUserPresets.json are both present, then CMakeUserPresets.json implicitly includes all the presets from CMakePresets.json. Below is the simple CMakeUserPresets.json used to build the library for this blog:
{
  "version": 6,
  "configurePresets": [
    {
      "name": "my-macOS",
      "inherits": [
        "ci-Clang",
        "ci-Release",
        "ci-StdCompression",
        "ci-S3"
      ],
      "cacheVariables": {
        "HDF5_BUILD_DOC": {"type": "STRING", "value": "OFF"},
        "HDF5_BUILD_EXAMPLES": {"type": "STRING", "value": "OFF"},
        "HDF5_USE_ZLIB_NG": {"type": "STRING", "value": "ON"}
      }
    }
  ],
  "buildPresets": [],
  "testPresets": [],
  "packagePresets": [],
  "workflowPresets": []
}
The content defines a new configuration preset named my-macOS. Presets are referred to in cmake commands by their names. The inherits key above lists the presets from the CMakePresets.json file in the HDF5 library source code whose settings are combined into this user-specific preset. The inherited presets configure specific build options: the clang compiler, release binary artifacts, standard compression filters (we mentioned DEFLATE), and the ROS3 driver. Don’t mind the ci- prefix in their names; they are perfectly useful presets to inherit from.
The inherited presets also have some settings not needed here. Their values are changed under the cacheVariables key: don’t build documentation and examples, and use the zlib-ng library for DEFLATE compression. Above we used the extended form when setting the values; the simple form, e.g., "HDF5_USE_ZLIB_NG": "ON", works, too.
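For illustration, the same cacheVariables section written in the simple form would look like this (both forms are accepted by the presets schema):

```json
"cacheVariables": {
  "HDF5_BUILD_DOC": "OFF",
  "HDF5_BUILD_EXAMPLES": "OFF",
  "HDF5_USE_ZLIB_NG": "ON"
}
```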
Build and Install HDF5 Library
First create a conda virtual environment with the software packages required for building the HDF5 library. The environment’s software is installed only from the conda-forge package repository, a community GitHub organization that maintains a large number of software packages. Every command below explicitly states the conda-forge repository as the source, but it is easier and better to do this once in the conda/mamba configuration settings.
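For example, the channel preference can be stored once in your conda configuration (this writes to your ~/.condarc, after which the `-c conda-forge` option in the commands below becomes unnecessary):

```shell
# Add conda-forge as a package channel and prefer it strictly,
# so later "conda create"/"conda install" commands pull from it by default.
conda config --add channels conda-forge
conda config --set channel_priority strict
```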
The conda-forge repository provides the HDF5 library as its hdf5 package, and we certainly recommend installing it if you just need the library at the version and build configuration provided. But if you need a version or feature otherwise not available from a conda package repository, this blog is for you.
We are not going to cover all the different ways to get the HDF5 library’s source code here. For this example, we will download the latest HDF5 library code from its develop branch on GitHub as a ZIP archive. This is how the folders look after unzipping it:
.
└── hdf5-develop
├── HDF5Examples
├── bin
├── c++
├── config
├── doc
├── doxygen
├── fortran
├── hl
├── java
├── m4
├── release_docs
├── src
├── test
├── testpar
├── tools
└── utils
Create a virtual environment named build-example with the required packages and activate it:
~ $ conda create -n build-example -c conda-forge \
c-compiler \
ninja \
cmake \
aws-c-s3 \
python=3.13
~ $ conda activate build-example
Somewhere in the terminal prompt there should be (build-example) as an indication that the correct virtual environment is active.
Move to the hdf5-develop folder and create a CMakeUserPresets.json file with your favorite text editor. Paste the custom CMake preset (an example is shown above) and save the file. Verify that cmake finds it (called my-macOS here):
(build-example) hdf5-develop $ cmake --list-presets
Available configure presets:
"my-macOS"
"ci-StdShar-Clang"
"ci-StdShar-macos-Clang"
"ci-StdShar-macos-GNUC"
"ci-StdShar-Intel"
Execute the following command to configure the build phase:
(build-example) hdf5-develop $ cmake \
--preset my-macOS \
-S . \
-B ../build \
-G Ninja \
-DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX}
Note how much simpler the cmake command is now that a preset supplies almost all of the build settings. If the command finishes successfully there should be a new folder named build (at the same level as hdf5-develop) with all the build artifacts. Passing $CONDA_PREFIX as the install prefix makes the library installation “aware” of the active virtual environment by pointing to its system-like folders:
$CONDA_PREFIX
├── bin
├── conda-meta
├── doc
├── etc
├── include
├── lib
├── libexec
├── man
├── sbin
├── share
└── ssl
Build and then run the tests to ensure the library is working as expected:
(build-example) hdf5-develop $ cmake --build ../build -j 4
(build-example) hdf5-develop $ ctest --test-dir ../build -j 4
Depending on where the library source came from, there could be some failed tests. For example, the development branch code may not always pass all of its tests. However, all tests should pass for official release code. Use the --stop-on-failure option to stop running tests on the first failure.
As the final step, install the library and its command-line tools:
(build-example) hdf5-develop $ cmake --install ../build
If interested, check where the install artifacts went by running the commands below (output not included):
(build-example) ~ $ ls $CONDA_PREFIX/bin/h5*
(build-example) ~ $ ls $CONDA_PREFIX/lib/*hdf5*
(build-example) ~ $ ls $CONDA_PREFIX/include/H5*
Deactivate and activate the virtual environment so the newly installed bin/h5* commands are picked up:
(build-example) ~ $ conda deactivate && conda activate build-example
and then check the version of the h5dump command:
(build-example) ~ $ h5dump --version
h5dump: Version 2.0.0
Final check: if the library has a functioning ROS3 driver, the command below should show h5dump output for this simple HDF5 file:
(build-example) ~ $ AWS_REGION=us-east-1 h5dump --filedriver=ros3 s3://hdfgroup/data/hdf5demo/tall.h5 | head -20
HDF5 "s3://hdfgroup/data/hdf5demo/tall.h5" {
GROUP "/" {
ATTRIBUTE "attr1" {
DATATYPE H5T_STD_I8BE
DATASPACE SIMPLE { ( 10 ) / ( 10 ) }
DATA {
(0): 97, 98, 99, 100, 101, 102, 103, 104, 105, 0
}
}
ATTRIBUTE "attr2" {
DATATYPE H5T_STD_I32BE
DATASPACE SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
DATA {
(0,0): 0, 1,
(1,0): 2, 3
}
}
GROUP "g1" {
GROUP "g1.1" {
DATASET "dset1.1.1" {
Everything looks good so we can proceed with building h5py.
Install h5py with Custom HDF5 Library
The virtual environment where the custom HDF5 library was installed must be active. Let’s check that the pip tool is available and note its version:
(build-example) ~ $ python -m pip --version
pip 25.1.1 (python 3.13)
This is a safer way to run pip because it ensures the virtual environment’s pip will be used. h5py will be built with the latest versions of NumPy, Cython, and setuptools, so add them to the virtual environment:
(build-example) ~ $ conda install numpy cython setuptools -c conda-forge
Build and install h5py:
(build-example) ~ $ HDF5_DIR=$CONDA_PREFIX python -m pip install \
--no-binary=h5py \
--no-build-isolation \
h5py
Using cached h5py-3.14.0.tar.gz (424 kB)
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy>=1.19.3 in ./conda/envs/build-example/lib/python3.13/site-packages (from h5py) (2.3.1)
Building wheels for collected packages: h5py
Building wheel for h5py (pyproject.toml) ... done
Created wheel for h5py: filename=h5py-3.14.0-cp313-cp313-macosx_11_0_arm64.whl size=1538456 sha256=f6740a6fadc8267efef417648fa373ad449e6e8deb875321bb128a895619cc80
Successfully built h5py
Installing collected packages: h5py
Successfully installed h5py-3.14.0
The options passed to pip ensure that h5py is built from source code using the Python packages available in the virtual environment. The latter avoids pip building h5py with different versions of the required Python packages. The HDF5_DIR environment variable points the h5py build process to the desired HDF5 library installation. Note this only applies during the build process. If there is another HDF5 library in a system-wide location, it may be found before this one at runtime and cause an error when using h5py. How to avoid this is beyond the scope here; after all, this is one of the reasons to use virtual environments.
The first and most important test is whether h5py can be successfully imported. We are also going to compare the HDF5 library version h5py was built with against the one it is using at runtime:
(build-example) ~ $ python -c "import h5py; print(h5py.version.hdf5_built_version_tuple == h5py.h5.get_libversion())"
True
If the above command reported True – congratulations! You have successfully built and installed both the HDF5 library and h5py in a conda virtual environment. Remove the build folder, and don’t forget to deactivate this virtual environment when done, or just close the terminal session.
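As one more optional smoke test (a sketch assuming the h5py installed above is active), the whole stack can be exercised end to end by writing and reading back a DEFLATE-compressed dataset in a temporary file:

```python
import os
import tempfile

import h5py
import numpy as np

data = np.arange(10000).reshape(100, 100)

# Write a chunked dataset compressed with the DEFLATE ("gzip") filter,
# then read it back and verify the round trip.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "smoke.h5")
    with h5py.File(path, "w") as f:
        f.create_dataset("x", data=data, chunks=(10, 100), compression="gzip")
    with h5py.File(path, "r") as f:
        assert (f["x"][...] == data).all()
        assert f["x"].compression == "gzip"

print("DEFLATE round trip OK")
```

If the library was built with HDF5_USE_ZLIB_NG as shown earlier, the compression in this round trip is handled by zlib-ng under the hood.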
Extra: Is zlib-ng really used as the DEFLATE filter?
Since this blog highlighted how to select the zlib-ng library as HDF5’s DEFLATE filter, a careful reader may ask why zlib-ng was not included explicitly in the virtual environment as a build and runtime dependency of the HDF5 library. That is because the HDF5 library’s build process downloaded the source code of a specific zlib-ng release, built it, and then included all the zlib-ng functions required for DEFLATE filter functionality in the library’s own shared binary artifact. Below is the proof that those functions are indeed present:
(build-example) ~ $ nm $CONDA_PREFIX/lib/libhdf5.dylib | grep zng | head -20
000000000035e5f0 t _zng_bi_reverse
0000000000354e00 T _zng_compress
0000000000354ce8 T _zng_compress2
0000000000354f18 T _zng_compressBound
0000000000354f5c T _zng_crc32
000000000035441c t _zng_crc32_braid
0000000000354f44 T _zng_crc32_z
0000000000355f40 T _zng_deflate
0000000000356c48 T _zng_deflateBound
0000000000356e2c T _zng_deflateCopy
00000000003552d0 T _zng_deflateEnd
00000000003559f8 T _zng_deflateGetDictionary
0000000000357364 T _zng_deflateGetParams
00000000003554e0 T _zng_deflateInit
0000000000355078 T _zng_deflateInit2
0000000000355528 T _zng_deflateInit2_
00000000003554f4 T _zng_deflateInit_
0000000000355d68 T _zng_deflateParams
0000000000355bf8 T _zng_deflatePending
0000000000355c60 T _zng_deflatePrime