HDF5 Tutorial at the 2020 ECP Annual Meeting - The HDF Group

On February 6, 2020, members of the ECP ExaIO project, Elena Pourmal and Scot Breitenfeld (The HDF Group), Quincey Koziol (NERSC), and Suren Byna (LBNL), presented an HDF5 Tutorial at the ECP Annual Meeting.

The goal of this Tutorial was to introduce HDF5 to new users, discuss best practices for using HDF5, and update current HDF5 users on the recent achievements and new additions to the HDF5 software contributed by the ExaIO project.

The Tutorial consisted of three parts. Part I introduced the HDF5 data model, the APIs for organizing data and performing I/O, and best practices for using HDF5. Part II gave an overview of parallel file systems and showed how to use HDF5 in a parallel environment to perform I/O on a shared file or on multiple HDF5 files. Part III used examples from well-known codes and use cases from experimental sciences to demonstrate tuning techniques such as collective metadata I/O, data aggregation, and parallel compression, as well as new HDF5 features that help utilize HPC storage beyond current file systems (Data Elevator and UnifyFS).

We had about 25 attendees, and many good questions were asked during the three hours of the Tutorial. We tried to capture them during the session, and we are happy to share our Tutorial (PDF of slide deck) and the Q&A materials (below) with the wider community. Please don’t hesitate to contact help -at- hdfgroup.org if you have any questions about HDF5. We are always happy to help!

Elena Pourmal (epourmal -at- hdfgroup.org)
Scot Breitenfeld (brtnfld -at- hdfgroup.org)
Quincey Koziol (koziol -at- lbl.gov)
Suren Byna (sbyna -at- lbl.gov)

Q&A Session

https://tinyurl.com/uoxkwaq (Google document of live Q&A)

Hi, will the slides be available?
- They are available on the ECP Confluence space: https://confluence.exascaleproject.org/display/2020ECPAM/Sessions
- If you do not have access to the Confluence space, send us an email, or try this Dropbox link (which might disappear eventually): https://www.dropbox.com/s/ea5fcfs2wukkuvt/20200206_ECPTutorial-final.pptx?dl=0
  - Thank you. I got the slides!
Do you support any lossy compression methods?
- Not currently, although several 3rd-party I/O filters have been written for lossy compression methods: https://support.hdfgroup.org/services/filters.html Expressing lossy compression in HDF5 is a little tricky, since the API allows dataset elements to be overwritten, and the loss of information compounds when multiple overwrites occur. This is certainly not a blocker, and many applications don’t overwrite data, but it is something to be aware of.
- Thanks. That makes sense.
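
The compounding effect mentioned in the answer can be illustrated with a small, self-contained sketch. It does not use HDF5 or any real filter: `lossy_store` is a hypothetical stand-in that simply rounds values to a fixed step, and `accumulate` models an application that repeatedly reads a stored value, updates it, and writes it back. The step size and the increments are assumptions chosen for illustration.

```python
def lossy_store(value, step=0.01):
    """Hypothetical stand-in for a lossy filter: keep only multiples of `step`."""
    return round(value / step) * step


def accumulate(initial, delta, updates, store):
    """Read the stored value, add `delta`, and write it back `updates` times."""
    stored = store(initial)
    for _ in range(updates):
        stored = store(stored + delta)
    return stored


# Ten small updates, once with exact storage and once with lossy storage.
exact = accumulate(1.0, 0.004, 10, store=lambda v: v)
lossy = accumulate(1.0, 0.004, 10, store=lossy_store)

# Error from writing the final value once versus overwriting ten times.
single_write_error = abs(lossy_store(1.04) - 1.04)
compounded_error = abs(lossy - exact)

print(f"exact={exact:.3f} lossy={lossy:.3f}")
print(f"error after one lossy write: {single_write_error:.4f}")
print(f"error after ten lossy overwrites: {compounded_error:.4f}")
```

In this toy model each update is smaller than half the quantization step, so every overwrite discards it and the total error grows with the number of overwrites, while a single write of the final value stays within half a step. Real lossy filters behave differently in detail, but the read-modify-write pattern is the part to watch.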
Does HDF5 support automatic conversion of language class types to storage types?
- No, a C++ class must be described as a “compound” datatype in the HDF5 API, with the fields in the class being added as fields in the compound datatype. But, the H5CPP package has spiffy features that can automate this quite a bit: http://h5cpp.org Once the mapping of the class information is done, HDF5 will automatically access the fields. Here’s an example in C: https://bitbucket.hdfgroup.org/projects/HDFFV/repos/hdf5-examples/browse/1_10/C/H5T/h5ex_t_cmpd.c
- For Python, I would suggest h5py (https://www.h5py.org), which isn’t quite as automated as h5cpp, but fits into the Python ecosystem nicely, with numpy, etc. support.
- For modern Fortran: No, at this point we cannot do it in Fortran, but we can definitely facilitate the process.
- The “text to datatype” high-level API routine may also be helpful: https://portal.hdfgroup.org/display/HDF5/H5LT_TEXT_TO_DTYPE
- Thank you
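
As a rough sketch of what "describing a class as a compound datatype" involves, the snippet below collects the name, byte offset, and byte size of each field in a record type. In the C API these are the kind of values passed to H5Tinsert() for each member after creating the compound type with H5Tcreate(). The sketch uses only Python's standard `ctypes` module and does not call HDF5; the `Particle` type and the `describe_members` helper are hypothetical names introduced for illustration.

```python
import ctypes


class Particle(ctypes.Structure):
    """Hypothetical record type; the field names are illustrative only."""
    _fields_ = [
        ("id", ctypes.c_int32),
        ("mass", ctypes.c_double),
        ("charge", ctypes.c_float),
    ]


def describe_members(struct_type):
    """Return (name, byte offset, byte size) for every field of a ctypes Structure."""
    members = []
    for name, _ctype in struct_type._fields_:
        descriptor = getattr(struct_type, name)
        members.append((name, descriptor.offset, descriptor.size))
    return members


# The offsets depend on the platform's alignment rules, so they are printed
# rather than hard-coded.
for name, offset, size in describe_members(Particle):
    print(f"{name:8s} offset={offset:2d} size={size}")
print("total size:", ctypes.sizeof(Particle))
```

Packages such as H5CPP and h5py automate much of this bookkeeping; the point here is only that the per-field name, offset, and type are the information a compound datatype needs.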
Will there be any slides discussing debugging of parallel I/O, for example how to debug a parallel hang, or tracing?
- Yes, Scot will talk about it
HDF5 VOL connectors repo
- https://bitbucket.hdfgroup.org/projects/HDF5VOL
Tracing with Darshan
- Darshan eXtended Trace (DXT)
  - https://www.mcs.anl.gov/research/projects/darshan/docs/darshan3-runtime.html#_using_the_darshan_extended_tracing_dxt_module
- darshan-dxt-parser output
  - https://www.mcs.anl.gov/research/projects/darshan/docs/darshan3-util.html#_darshan_dxt_parser
How to figure out which HDF5 metadata call is not being called by all ranks:
- Set the “H5_COLL_API_SANITY_CHECK” environment variable to “1” (“setenv H5_COLL_API_SANITY_CHECK 1” in csh/tcsh, or “export H5_COLL_API_SANITY_CHECK=1” in bash) and the HDF5 library will perform an MPI_Barrier() call inside each metadata operation that modifies the HDF5 namespace. It will be slow, but it makes it much easier to debug and to see which rank is hanging in the MPI barrier.
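
When jobs are started from a launcher script rather than an interactive shell, the variable can be set in the environment handed to the child process. The following is a minimal sketch using only the Python standard library; `env_with_sanity_check` is a hypothetical helper, and the child process here is a stand-in that just echoes the variable. A real run would start the MPI application instead, and whether the environment reaches every rank depends on the MPI launcher in use.

```python
import os
import subprocess
import sys


def env_with_sanity_check(base_env=None):
    """Return a copy of the environment with the collective sanity check enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["H5_COLL_API_SANITY_CHECK"] = "1"
    return env


# Stand-in child process: a real run would launch the MPI application instead.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('H5_COLL_API_SANITY_CHECK'))",
]
result = subprocess.run(
    child,
    env=env_with_sanity_check(),
    capture_output=True,
    text=True,
    check=True,
)
print("child sees H5_COLL_API_SANITY_CHECK =", result.stdout.strip())
```

Copying the environment instead of modifying `os.environ` in place keeps the slow sanity-check mode confined to the debugging run.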
Independent I/O for attributes: yay! We write ~3-10 attributes per dataset and group
Are there any initiatives to unify the description of bool and complex types in the community?
- h5py’s conventions are very popular (for labels on enums & compounds) but e.g. pytables and some math libs do not necessarily follow them. I think if you just recommend the h5py way with examples on the homepage, a lot of projects would adopt and unify such trivial yet portability-hindering details for good.

