Should HDF be HIF?
Ted Habermann, The HDF Group
I first heard of HDF during the “Data Format Wars” of the 1990s. These “battles” centered on the selection of a format for the emerging NASA Earth Observing System archives, and there were a number of contenders. HDF won that battle in the end because of the inherent flexibility of the format and the tools for reading and writing it.
Now, twenty years later, HDF has emerged as the foundation format for an incredibly diverse and growing range of scientific and commercial disciplines.
Is it the inherent flexibility of the format that has led to this success? Maybe, but I would pick information integration as the killer HDF feature.
I learned about the “Continuum of Understanding,” originally described by Harlan Cleveland (Information as a Resource, 1982), during the early days of the Web. The continuum has four stages: data, information, knowledge, and wisdom. Data are observations and model results that are collected from the world around us. They are generally numbers that, by themselves, are not very useful. HDF adds structure, context, and organization to the data to create information that can be shared and hopefully absorbed by others.
The knowledge stage of the continuum is where most human discourse happens. Individuals create knowledge as they consume information from multiple sources and merge it with their experience. People share the knowledge that they have gained and present their points of view (context). This discourse hopefully leads to “wisdom,” the current state of the community’s understanding of the object of study.
HDF handles the data stage of this continuum very well, but it is in the information and knowledge stages that it really excels.
HDF supports evolving data to information by providing incredibly rich mechanisms for attaching structured documentation (metadata) to any object in HDF files. In some cases, this documentation can be a simple set of flat attributes. The Attribute Convention for Data Discovery (ACDD) developed in the netCDF/HDF community is a great example of how even flat global attributes can facilitate data description and discovery. The simple set of global attributes used in this convention has been very successful in adding a standard discovery layer to THREDDS servers all over the world and in integrating metadata from HDF files written using the netCDF interface with international metadata standards.
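As a rough illustration of flat global attributes, the sketch below attaches a few ACDD-style attributes to the root of an HDF5 file using the h5py library. It is a minimal sketch, not a complete ACDD implementation: the file name and all attribute values are placeholders invented for this example, and only a handful of the convention's attribute names are shown.

```python
import h5py

# ACDD-style global attributes. The names follow the convention;
# every value here is a placeholder made up for illustration.
acdd_attrs = {
    "Conventions": "ACDD-1.3",
    "title": "Example sea surface temperature product",
    "summary": "Placeholder summary used only for illustration.",
    "keywords": "oceans, sea surface temperature",
    "creator_name": "Example Author",
}

# Use an in-memory file (core driver, no backing store) so that
# nothing is written to disk. The file name is hypothetical.
with h5py.File("acdd_example.h5", "w", driver="core", backing_store=False) as f:
    # Global attributes are simply attributes on the root group.
    for name, value in acdd_attrs.items():
        f.attrs[name] = value

    # Read the attributes back into a plain dictionary.
    read_back = {name: f.attrs[name] for name in f.attrs}
```

Because the attributes are flat name and value pairs on the root of the file, a discovery tool can read them without knowing anything else about the file's contents.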
As products developed in HDF become more complex, including:
- data from multiple observation streams and instruments,
- auxiliary data for calibration or geolocation, and
- metadata for use and understanding,
the need for metadata for specific datasets or groups of datasets increases. Of course, the flexibility of HDF again rises to the occasion here, supporting structured metadata for any group or dataset in the file. Recently developed prototypes for data from the upcoming NASA Ice, Cloud, and land Elevation Satellite-2 mission (ICESat-2) use over 600 groups to organize data and associated metadata into natural units that are easy to use and understand.
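The idea of metadata attached to individual groups and datasets can be sketched with h5py as follows. The group names, dataset names, and values are hypothetical and are not taken from any real product layout, including ICESat-2; the point is only that each object in the hierarchy can carry its own attributes.

```python
import h5py

# In-memory file so the example leaves nothing on disk.
with h5py.File("groups_example.h5", "w", driver="core", backing_store=False) as f:
    # Hypothetical layout: a group for one observation stream.
    # Intermediate groups in the path are created automatically.
    track = f.create_group("instrument_a/track_1")
    track.attrs["description"] = "Placeholder: measurements along one track"

    # A dataset inside the group, with its own metadata.
    heights = track.create_dataset("height", data=[10.5, 11.0, 10.8])
    heights.attrs["units"] = "m"
    heights.attrs["long_name"] = "surface height"

    # Auxiliary data lives in a separate group with separate metadata.
    cal = f.create_group("ancillary/calibration")
    cal.attrs["description"] = "Placeholder: calibration coefficients"
    cal.create_dataset("gain", data=[1.0, 0.98])

    # Walk the whole hierarchy and collect the attributes of every object.
    metadata = {}
    f.visititems(lambda name, obj: metadata.update({name: dict(obj.attrs)}))
```

After this runs, `metadata` maps each group or dataset path to a dictionary of its attributes, which is one simple way a reader could inspect what documentation travels with each part of the file.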
The metadata capabilities built into HDF also support the evolution of information into community knowledge as communities agree on structures and metadata for sharing data stored in HDF. There are too many of these community conventions to count, and they span all scales and disciplines where HDF is used. One common application is sharing data collected by large and expensive facilities (dust accelerators, experimental plasma machines, or satellites) across communities. Being able to use and understand these data without first decoding some arcane format is the foundation for community information-sharing and knowledge-building.
The data stored in HDF files are compressed blobs of bits and bytes generated by instruments, programs, and models. These put the “data” in HDF. The associated metadata make these data usable and understandable across communities and facilitate the creation of knowledge and, eventually, community wisdom. They put the information in “Hierarchical Information Format.”