Blog

H5Coro: The HDF5 Cloud-Optimized Read-Only Library Presented by JP Swinski Friday, May 14, 2021 11:00 a.m. CDT Abstract NASA’s migration of science data products and services to AWS has sparked a debate on the best way to access science data stored in the cloud.  Given that a large portion of NASA’s science data is in the HDF5 format or one of its derivatives, a growing number of efforts are looking at ways to efficiently access H5 files residing in S3.  This presentation describes one of those efforts and argues for the creation of a standardized subset of the HDF5 specification targeting cloud environments. Please register to join us....

Greetings! We have scheduled a new webinar for Friday, April 30 at 11:00 a.m. CDT. HDF5 Application Tuning: There is more than one way to skin a cat(fish), Part 2 Abstract: Before returning to application tuning (in part 3), in this second part of the series, we take a closer look at HDF5 performance variability. We highlight the main variability sources, their impact on performance, and considerations for HDF5 container design. Please register to join us live. This event will be recorded and shared with the community....

On March 26, 2021, The HDF Group hosted the Hermes development team to learn more about the Hermes project. Abstract In this webinar, we will provide an update on an NSF-funded joint effort between the Illinois Institute of Technology (IIT) and The HDF Group. We will explain Hermes’ goals and what differentiates it from existing technologies. We will introduce our team members, and present Hermes’ current status including the abstractions involved, high-level API, as well as its system architecture. Different team members will present the results they’ve obtained under this project. Finally, we’ve prepared a few demonstrations, and we will show how you can get involved. Slide Deck https://youtu.be/KJdXMqfRmS4...

Suren Byna, Elena Pourmal, Lori Cooper  Overview HDF5 has been a widely used tool to simplify management and access to scientific and engineering data with ubiquitous data solutions. With rapidly growing data across all domains of science and industry, HDF5 developers have been building technologies that provide rapid, easy, and permanent access to complex data. The HDF Group, a non-profit organization, has been the driving force behind developing and maintaining the HDF5 software library for more than two decades.  HDF5 has been extremely successful with a wide range of users and an ecosystem built around the HDF5 library. With the goal of facilitating a broader discussion among HDF5 users, The HDF Group and Lawrence Berkeley National Laboratory (LBNL) teamed up to host a...

On March 26, 2021 at 11:00 CDT, we will present the webinar, Hermes - A Distributed Buffering System for Heterogeneous Storage Hierarchies. Abstract In this webinar, we will provide an update on an NSF-funded joint effort between the Illinois Institute of Technology (IIT) and The HDF Group. We will explain Hermes' goals and what differentiates it from existing technologies. We will introduce our team members, and present Hermes' current status including the abstractions involved, high-level API, as well as its system architecture. Different team members will present the results they've obtained under this project. Finally, we've prepared a few demonstrations, and we will show how you can get involved. Register...

On January 19, 2021, The HDF Group participated in NCSA’s webinar series, the University of Illinois New Frontiers Initiative Webinar Series, with sessions presented by Gerd Heber, John Readey, and Aleksander Jelenak. HDF Technologies and Resources for Geospatial Data Abstract: The HDF Group created HDF at NCSA in 1988 to enable scientists and engineers to describe, store, and access large, complex data structures and collections. Since then, we have worked with data producers, providers, and users all over the world and in every discipline to develop and evolve HDF to meet the needs of changing technologies and applications. Applications as diverse as gravity wave detection, gene sequencing, and finance use the HDF5 data model and software to acquire and share data and solve...

On February 12, 2021, we were pleased to host Lucas Villa Real of IBM Research to discuss his project HDF5-UDF, a data virtualization tool for HDF5. The tool enables users to associate logic in source code form (i.e., in user-defined functions, written in Python, C/C++, or Lua) with HDF5 datasets. Such UDFs are compiled into a binary form (which often takes no more than a few KB) and embedded into HDF5; once an application reads such a dataset, HDF5-UDF executes that binary code and generates the data on the fly. Lucas has just released HDF5-UDF 1.2, which offers several new features: among other benefits, it makes it possible to easily virtualize CSV files so they look like regular HDF5 datasets. Attached you'll find the slide deck...