HDF5 European Workshop 2019

Learn more about HDF User Group Meetings.

Co-organized with ESRF, this free event was held on September 17-18, 2019. It covered the latest HDF5 developments presented by The HDF Group, then delved into HDF5 use cases from science and industry, and wrapped up with presentations on HDF5 applications and tools. The event was graciously sponsored by OpenIO and Omnibond.

The following is an archive of the recorded presentations of this event. In a few cases, the videos linked below are actually only audio of the presentations. We are working with the presenters to get their slides and will update the links here with new videos. We are also still working on the recording of the HDF Round Table and will post that here and announce it when completed.

Tuesday, September 17, 2019

(Day 1 Video Playlist)

HDF5 Latest Developments

HDF Group Past, Present + Future – Dr. Michael Folk (The HDF Group)
Video • Presentation Deck
The HDF Group has been developing and supporting HDF technologies since 1988. Our mission has always been to support the HDF user community and to ensure that the HDF ecosystem can evolve to meet the ever-changing world of data. In this talk we will give a brief history of HDF and the HDF Group; explain our mission and commitment to the data community; describe our culture and business model and challenges in sustaining our technology and services; talk about future directions for the HDF Group and HDF R&D; and speak about the critical role that partnerships and community play in ensuring the long-term success of HDF.

HDF5 Roadmap for 2019-2020 and beyond – Elena Pourmal (The HDF Group)
Video • Presentation Deck
For more than two decades The HDF Group developers have been working with researchers all over the globe helping to capture, store and analyze experimental data in HDF5. In the past several years, the amount of experimental and observational data stored in HDF5 and the rate at which this data is collected have created new challenges for the scientists and triggered requests for new features in HDF5, for example, accessing data on non-POSIX storage, which are now under development.

In this talk we will give an overview of the features available in the HDF5 1.10 releases and the new features coming in the HDF5 1.12 releases, including access to HDF5 data in the Cloud and Object Store. We will discuss our vision of HDF5 software evolution and how the community can contribute. We will conclude the talk with a demo of a prototype implementation of the single writer/multiple reader (SWMR) feature that allows file modifications during writing and offers guarantees on the maximum time before new data is available to a reader, and a demo of HDF5 data streaming capabilities. We will use this presentation to get feedback from the HDF5 users’ community on the HDF5 roadmap and community contributions.

HDF Server – John Readey (The HDF Group)
Video • Presentation Deck
HSDS (Highly Scalable Data Service) is a web service for HDF data. By enabling HDF functionality over a REST API, HSDS can provide a solution for many applications that would be difficult to implement with just the HDF library.

In this session we will cover:

  1. Motivation for a service-based implementation of HDF
  2. HSDS Features
  3. Architectural Design
  4. Storage Schema (and comparison with ZARR)
  5. Performance features
  6. Achieving Scalability
  7. Client Libraries
  8. Parallel computing with HSDS
  9. Kita Server – commercialized version of HSDS
  10. Cloud vs on-prem installations
  11. Kita Lab – customized Jupyter Lab with Kita Server support
  12. Demo

OpenIO and HDF Server – Jean-Francois Smigielski (OpenIO), John Readey (The HDF Group)
Video • Presentation Deck
OpenIO SDS (Software Defined Storage) is an open source object storage platform that can provide a performant and cost-effective means of storing large data collections. As an alternative to parallel file systems or NAS storage solutions, OpenIO SDS provides features such as:

  • Built in redundancy
  • Scalability to PB sized collections
  • Ease of expanding storage
  • Use of any heterogeneous commodity hardware
  • Self-management via Conscience Technology

For HDF5 storage collections, OpenIO now natively supports the HDF REST API. This means that existing HDF applications can now read and write data directly to OpenIO SDS. As more nodes are added to the OpenIO SDS cluster, throughput and performance will increase as the workload is automatically scaled across the entire cluster. This architecture has the potential to dramatically improve application performance compared with alternative solutions. In this session we will review SDS technologies and architecture, followed by a discussion of how HDF is supported in SDS.

HDF5 Science Use Cases

Fast feedback in x-ray imaging experiments for quick decision making – Dr. Valerio Mariani (CFEL/DESY)
Video • Presentation Deck
The Coherent Imaging Division of the Center for Free-Electron Laser Science develops innovative methods for imaging using X-ray Free Electron Laser (XFEL) and synchrotron sources, with an emphasis on bioparticles and macromolecules. The HDF5 file format has in recent years become the de facto standard data storage format in this field. Modern X-ray detectors like the AGIPD, EIGER and JUNGFRAU, and new facilities such as the European XFEL and the upcoming LCLS-II, generate a great number of large data files even for single experiments. Algorithms and software tools are being developed to process this large amount of data within the time frame of the experiment, with the goal of providing useful feedback that can steer the experiment itself. Data compression and parallelization are vital for dealing with these extreme rates of data collection. This presentation will focus on the achievements in this effort but also illustrate the remaining unsolved challenges.

HDF5 and NeXus – Dr. Benjamin Watts (Paul Scherrer Institute)
Video • Presentation Deck
NeXus is a data model that promotes standardisation of the organisation of scientific information. It is now commonly used in the X-ray, neutron and muon communities, and is readily extendable to more diverse applications after two decades of maturation. NeXus currently recommends HDF5 as the container format for NeXus data and, as such, the capabilities and limitations of HDF5 have influenced some of the decisions along the way. This presentation will give an overview of NeXus, highlight some changes from recent years, and discuss how NeXus and HDF5 are used together in a few different use cases.

HDF5 at ESRF – Dr. Vicente Armando Sole Jover
Video • Presentation Deck
While support of the HDF5 format in ESRF applications dates back to late 2009, ESRF is basically the last synchrotron radiation facility to adopt HDF5 as its main data format for acquisition. This talk presents the ideas behind the implementation of HDF5 and NeXus at the ESRF in the context of data acquisition, data analysis and data policy. The status of the implementation will also be presented.

HDF5 technology and NeXus data format usage at Diamond Light Source – Dr. Peter Chang (Diamond Light Source Ltd)
Video • Presentation Deck
Diamond Light Source uses HDF5 in its data acquisition to write NeXus files and it is an essential component for our data analysis. We have (co-)sponsored some advanced features that debuted in version 1.10 including SWMR and VDS. These features are used across a number of experimental stations in data acquisition systems including GDA, EPICS (Area Detector) and Odin (OdinData). Both online and offline data analysis programs/pipelines including DAWN, DIALS, XDS, Savu and PtyPy rely on easy and timely access to data and metadata (as specified by the NeXus standard). We give details on how we use this technology and what advantages are gained.

Nexus/HDF5 at SOLEIL: usage and history – Stéphane Poirier (Synchrotron SOLEIL)
Video • Presentation Deck
The Nexus/HDF5 data format has been the “standard” on all our beamlines since the beginning of operations at SOLEIL. Our scientists were reluctant to adopt it at first, but now recognize that it allows them to retrieve old data together with its context. This was acknowledged at a workshop given during a SOLEIL users’ meeting in 2015. This presentation will describe the steps taken to get Nexus/HDF5 up and running. After a short introduction explaining our strategy for installing Nexus/HDF5 as a standard, we will describe the tooling we have developed to link our Tango control system to the data storage service, and the utilities we provided to our users to interface the system to their client environments. We will then go through the difficulties we have had to tackle. One of these is that Nexus/HDF5 does not in itself define a standard for data organization in the file. This issue was addressed at SOLEIL by developing a Common Data Access Model layer, which will also be introduced. The presentation will end with the future perspectives for further development on the topic.

HDF5 in geomagnetic data assimilation and visualisation – Dr. Loïc Huder (ISTerre, CNRS)
Video • Presentation Deck
The Earth’s magnetic field is generated by motions of the Earth’s liquid core, a process called the geodynamo. The geodynamo problem is so challenging that even recent numerical simulations fail to capture the features observed in the geomagnetic field. Data assimilation algorithms appear to be a viable way to tackle the geodynamo problem. Such algorithms incorporate information from both numerical simulations and real measurements of the geomagnetic field to reduce computation times and generate results that are closer to reality.

Our group develops the pygeodyn Python package that performs geomagnetic data assimilation. It uses a reduced numerical model and up-to-date magnetic measurements to model and forecast the flow of the Earth’s core.

We present here why HDF5 (through its Python interface h5py) is our output format of choice. It allows us to store computation results in an orderly fashion, and configuration parameters as attributes, making it an invaluable tool for reproducible research. We will also show how HDF5 files are used as an efficient interface with the webgeodyn package, also developed in our group, for geomagnetic data visualisation.
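The pattern described above, results stored as datasets and configuration parameters stored as attributes, can be sketched with h5py as follows. This is a minimal illustration, not pygeodyn's actual file layout: the group name `run_000`, the dataset name `core_flow` and the configuration keys are all hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np


def save_run(path, flow, config):
    """Write one result array and its configuration to an HDF5 file."""
    with h5py.File(path, "w") as f:
        run = f.create_group("run_000")  # hypothetical group name
        # The computation result goes into a (compressed) dataset...
        run.create_dataset("core_flow", data=flow, compression="gzip")
        # ...and each configuration parameter becomes an attribute of the group.
        for key, value in config.items():
            run.attrs[key] = value


def load_run(path):
    """Read back the result array and the configuration that produced it."""
    with h5py.File(path, "r") as f:
        run = f["run_000"]
        flow = run["core_flow"][...]
        config = dict(run.attrs.items())
    return flow, config


# Hypothetical configuration and synthetic data, for illustration only.
config = {"n_realisations": 20, "dt_forecast": 0.5, "model": "reduced"}
flow = np.random.default_rng(0).normal(size=(20, 64))

path = os.path.join(tempfile.mkdtemp(), "results.h5")
save_run(path, flow, config)
loaded_flow, loaded_config = load_run(path)
```

Because the parameters travel inside the same file as the results, a later reader can recover what settings produced a given array without consulting a separate log.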

NeXus framework at DESY – Dr. Jan Kotanski (DESY)
Video • Presentation Deck
At DESY metadata and data are stored in HDF5 files using the NeXus format. Our NeXus framework is based on C++ libraries (h5cpp and PNI) as well as Python bindings (h5py and h5cpp). It gains a modular structure with the help of the Tango Controls software. The NeXus configuration is stored in a MySQL database in parts called components, i.e. NXDL-like XML strings. They are created by nxstools scripts from general or beamline-specific templates, separately for each beamline. Components are selected by users before a scan and merged into one XML configuration string, which is used by the Sardana NeXus recorder to create a scan master file. NeXus files with fast detector data are created separately and linked to the master file at the end of the scan. The presentation gives the details of our NeXus framework and shows the advantages gained.
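Linking a separately written detector file into a master file can be done with HDF5 external links. The sketch below uses h5py for this and is only an illustration of the linking idea: it does not use the DESY tools named above, and the file names, group paths and `NX_class` value are assumptions chosen for the example.

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()
# Hypothetical file names for one scan.
detector_path = os.path.join(workdir, "scan_0001_detector.h5")
master_path = os.path.join(workdir, "scan_0001_master.h5")

# 1. The fast detector writes its frames into its own file.
frames = np.arange(5 * 4 * 4, dtype="uint16").reshape(5, 4, 4)
with h5py.File(detector_path, "w") as det:
    det.create_dataset("entry/data/data", data=frames)

# 2. At the end of the scan, the master file receives a link to that data.
#    The relative file name is resolved next to the master file.
with h5py.File(master_path, "w") as master:
    entry = master.create_group("entry")
    entry.attrs["NX_class"] = "NXentry"
    entry["detector_data"] = h5py.ExternalLink(
        "scan_0001_detector.h5", "/entry/data/data"
    )

# 3. A reader opens only the master file; the link is followed transparently.
with h5py.File(master_path, "r") as master:
    linked = master["entry/detector_data"][...]
```

The master file stays small and can be written by a different process than the detector file, while readers still see a single hierarchy.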

HDF5 at ITER – Dr. Lana Abadie (ITER)
Video • Presentation Deck
This talk presents the current usage of HDF5 at ITER for data archiving.

HPC in the cloud – how it can help with HPC library development – Mr. Arthur Petitpierre (Amazon Web Services)
Video • Presentation Deck
  • Presentation of past and future collaborations between The HDF Group and AWS
  • Showing the advantages of infrastructure as code applied to HPC workloads
  • Presentation of EFA (Elastic Fabric Adapter), the AWS low-latency interconnect, and how it can benefit HPC workloads in general and HDF5 in particular
  • How to experiment with HDF5 on AWS

Wednesday, September 18, 2019

(Day 2 Video Playlist)

Industrial Use Cases

Real-world HDF5: applications in finance – Dr. Ivan Smirnov
Video • Presentation Deck
This talk is a brief outline of real-world daily use of HDF5 in finance, with discussion of both the strong and weak points of the library and its Python/C++ wrappers, along with a few suggestions on what could be enhanced to improve general usability.

Blosc2 and Caterva: the nascent libraries after the Blosc compressor – Francesc Alted
Video • Presentation Deck
C-Blosc2 is the new major version of C-Blosc, with a revamped API and support for new compressors and new filters (data transformations), including filter pipelining, that is, the capability to apply different filters during the same compression pipeline, allowing for more adaptability to the data to be compressed. Dictionaries are also introduced, allowing better handling of redundancies among independent blocks and generally increasing compression ratio and performance. Furthermore, the new data containers are available in various formats, including sparse (mainly for in-memory use) and serialized (for disk and network transmission).

Caterva is a C library on top of C-Blosc2 that implements a simple multidimensional container for compressed binary data. It adds the capability to store, extract, and transform data in these containers, either in-memory or on-disk.

During my presentation, I’ll describe these new libraries, what they bring to the world of data storage and, especially, how they differ from existing solutions.

The Allotrope Framework – Mr. Benjamin Woolford-Lim
Video • Presentation Deck
This talk describes the Allotrope Framework, a new data standard for data in life sciences. Built on HDF5 technology, the Allotrope Framework is made to store experimental data and metadata in a high-performance format suitable for a variety of scientific techniques. The Framework leverages Semantic Web approaches, audit trailing, and checksums to enable coherent, consistent, and interoperable data storage, with data integrity baked in from the start of the capture process.

HDF5 Round Table – Elena Pourmal, Dr. Michael Folk
Video coming soon!

The HDF Group members will hold a round table session to discuss the issues raised by the HDF5 Workshop participants. The HDF5 Workshop is a great venue to share any items on your HDF5 wish list, to ask technical questions, and to have a discussion with fellow participants.

HDF5 Applications and Tools

Investigation of hardware compression on IBM Power9 – Dr. Jerome Kieffer, Antoine Roux, Pierre Paleo, Benoit Rousselle
Video • Presentation Deck
This contribution presents how the gzip compression engine integrated into the IBM Power9 processor can be used with HDF5 storage to save actual diffraction data faster than with software compression.

The default compression provided as part of HDF5 will be compared with meta-compressors like Blosc. A few pre-compression filters will be presented and their performances compared with HDF5.
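To illustrate what a pre-compression filter does, the sketch below applies a byte shuffle before deflate, the algorithm behind gzip. It is a pure-Python stand-in written for this page, not the HDF5 or Blosc implementation, and the smooth integer ramp is synthetic data rather than the diffraction data from the talk.

```python
import struct
import zlib


def shuffle(buf: bytes, itemsize: int) -> bytes:
    """Regroup bytes: byte 0 of every element first, then byte 1, and so on.

    Assumes len(buf) is a multiple of itemsize.
    """
    return b"".join(buf[i::itemsize] for i in range(itemsize))


def unshuffle(buf: bytes, itemsize: int) -> bytes:
    """Invert shuffle(), restoring the original element-wise byte order."""
    n = len(buf) // itemsize
    out = bytearray(len(buf))
    for i in range(itemsize):
        out[i::itemsize] = buf[i * n:(i + 1) * n]
    return bytes(out)


# Synthetic data: 10,000 slowly increasing 32-bit little-endian integers.
values = list(range(10_000))
raw = struct.pack(f"<{len(values)}I", *values)

plain = zlib.compress(raw, 6)                   # deflate only
shuffled = zlib.compress(shuffle(raw, 4), 6)    # shuffle, then deflate

print(len(raw), len(plain), len(shuffled))
```

On data like this, the shuffled stream compresses better because the rarely changing high-order bytes end up next to each other, which gives deflate long runs to work with. Whether the gain holds for real detector data depends on the data itself.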

h5py latest developments – Dr. Thomas Caswell
Video • Presentation Deck

HDF5 Web Viewer – Dr. Jason Brudvik
Video • Presentation Deck
At the MAX IV Laboratory we have developed a web-based viewer for HDF5 data files. We designed it to be simple and fast, with the aim of allowing users to quickly and conveniently browse their HDF5 data files and visualize their data sets. I will discuss the structure and features of the application and give a quick demo.

HDF5 + silx – Dr. Thomas Vincent
Video • Presentation Deck
silx is a Python library initiated by ESRF to support the development of data assessment, reduction and analysis applications. It provides support for different file formats, data reduction routines including GPU-based ones, and [Py]Qt widgets to browse and visualize data.

One of its purposes is to smooth the transition from the previous data acquisition file formats used at ESRF (SPEC + EDF) to HDF5/Nexus.

This talk will present the silx library with a focus on its file format conversion tool and “silx view”, the Nexus-aware HDF5 data viewer it provides.

XMP metadata for HDF5 – Dr. Benjamin Watts
Video • Presentation Deck
Adobe’s Extensible Metadata Platform (XMP) is an open standard (ISO 16684-1) for attaching metadata to files. It is used by such ubiquitous formats as JPEG, MP3 and PDF, and most computer operating systems include components for reading and displaying this metadata (including thumbnail images) in their file browsers. This functionality is pretty much taken for granted nowadays for image files, and many kinds of scientific data would benefit greatly from it, especially in use cases involving a large number of separate files. Perhaps it is time for HDF5 to join the XMP party? This presentation will showcase an implementation of XMP for HDF5 files together with thumbnail plugins for Windows, Linux and macOS.

HDF5 for Rust – Dr. Ivan Smirnov
Video • Presentation Deck
Rust is a modern systems programming language focused on both memory safety and high performance.

hdf5-rust is an open source project that aims to provide a thread-safe and memory-safe high-level Rust API for most of the HDF5 functionality, alongside the low-level bindings.

In this talk we will discuss, among other things, how particular HDF5 concepts can be reflected in a safe and efficient manner using Rust’s native language features, such as using algebraic types for error management or using compile-time procedural macros for automatic datatype generation.
