Why HDF® Technologies?

Physics, the branch of science that deals with the nature and properties of matter and energy, spans a wide variety of fields, including astronomy, astrophysics, geophysics, weather and climate, biophysics, materials science, and many more. Physics was probably the first branch of science to use computers in a significant way. Today, computational and experimental physics of every type depend on computers and their ability to handle massive amounts of complex data.

HDF is the option of choice in many branches of physics because it offers so many of the capabilities needed to manage physics data. It can effectively handle data of virtually any size, as exemplified by the trillion-particle plasma physics simulation, which needed to store HDF5 files of 40 terabytes or more and to sustain write rates exceeding 50 gigabytes per second.

Size and I/O speed are critical, but physics computations and experiments often require very complex datatypes as well. Lawrence Livermore National Laboratory’s Silo library supports many physics applications at the lab, and includes the ability to describe, store, and efficiently access a veritable zoo of structures and datatypes. Silo supports gridless (point) meshes, structured meshes, unstructured polyhedral meshes, AMR meshes, and many other structures. It includes datatypes such as tensors and vectors, as well as user-defined compound structures. It stores complex metadata such as coordinate systems, and makes heavy use of HDF5’s grouping structure to show relationships among all of these components.
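To make the ideas of groups, metadata, and compound datatypes concrete, here is a minimal sketch using the h5py Python package. It does not use Silo or reproduce Silo's file layout; the group names, the `coordinate_system` attribute, and the particle record fields are all hypothetical choices made for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical particle record: a compound type mixing scalars and a 3-vector.
PARTICLE_DTYPE = np.dtype([
    ("id", np.int64),
    ("position", np.float64, (3,)),
    ("charge", np.float32),
])


def write_toy_file(path):
    """Write a small mesh and some particle records into one HDF5 file."""
    with h5py.File(path, "w") as f:
        # Groups act like directories and keep related objects together.
        mesh = f.create_group("mesh")
        # Attributes carry metadata such as the coordinate system.
        mesh.attrs["coordinate_system"] = "cartesian"
        mesh.create_dataset("x", data=np.linspace(0.0, 1.0, 5))
        mesh.create_dataset("y", data=np.linspace(0.0, 2.0, 3))

        records = np.zeros(4, dtype=PARTICLE_DTYPE)
        records["id"] = np.arange(4)
        records["position"][:, 0] = [0.1, 0.2, 0.3, 0.4]
        records["charge"] = -1.0
        f.create_group("particles").create_dataset("records", data=records)


def summarize(path):
    """Read back the metadata, mesh dimensions, and particle ids."""
    with h5py.File(path, "r") as f:
        return {
            "coordinate_system": f["mesh"].attrs["coordinate_system"],
            "mesh_shape": (f["mesh/x"].shape[0], f["mesh/y"].shape[0]),
            # Indexing a compound dataset by field name reads just that field.
            "particle_ids": f["particles/records"]["id"].tolist(),
        }


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "toy.h5")
    write_toy_file(demo_path)
    print(summarize(demo_path))
```

The mesh coordinates and the particle records live in separate groups of the same file, so a reader can discover how the pieces relate simply by walking the hierarchy.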

Experimental and observational physics applications often have different needs in terms of volume, velocity, and complexity. Synchrotrons collect data from an array of distributed instruments at very high speeds, and need to provide immediate access to that data in a coherent way, so that scientists can steer their experiments in real time. HDF5 provides an innovative storage structure called virtual datasets that makes the distributed data sources appear as one coherent array. It provides middleware that makes it possible to view the data in real time, and it provides special data compression options that make it possible to keep up with the data rates of modern instruments. It is no surprise that HDF5 has been the option of choice for much of the synchrotron community for more than 20 years.
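The virtual dataset idea can be sketched with h5py, assuming a build against HDF5 1.10 or later. The three "detector" files, the dataset name `data`, and the combined dataset name `combined` are hypothetical; a real beamline would have its own layout and far larger arrays.

```python
import os
import tempfile

import h5py
import numpy as np


def write_detector_files(directory, n_detectors, n_samples):
    """Write one small source file per (hypothetical) detector."""
    paths = []
    for i in range(n_detectors):
        path = os.path.join(directory, f"detector_{i}.h5")
        with h5py.File(path, "w") as f:
            # Each detector's samples are filled with its own index.
            f.create_dataset("data", data=np.full(n_samples, i, dtype="i4"))
        paths.append(path)
    return paths


def build_virtual_file(vds_path, source_paths, n_samples):
    """Map each source file onto one row of a single virtual dataset."""
    layout = h5py.VirtualLayout(
        shape=(len(source_paths), n_samples), dtype="i4"
    )
    for row, path in enumerate(source_paths):
        layout[row] = h5py.VirtualSource(path, "data", shape=(n_samples,))
    with h5py.File(vds_path, "w", libver="latest") as f:
        # fillvalue is returned for any region whose source file is missing.
        f.create_virtual_dataset("combined", layout, fillvalue=-1)


def read_combined(vds_path):
    """Read the virtual dataset as if it were one ordinary 2-D array."""
    with h5py.File(vds_path, "r") as f:
        return f["combined"][...]


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sources = write_detector_files(tmp, n_detectors=3, n_samples=8)
    combined_path = os.path.join(tmp, "combined.h5")
    build_virtual_file(combined_path, sources, n_samples=8)
    print(read_combined(combined_path))
```

No data is copied into the combined file; it only records where each row lives, so the source files can keep growing or be written by separate processes.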

Not everything that happens in experimental and computational physics is big and fast. A great deal of science happens on small laptops, working with tiny subsets of huge datasets, writing code in any of a number of languages, and using any of a wide variety of analysis and visualization tools. Again, HDF5 answers the call by being portable across almost all operating systems; by providing language interfaces in C, Fortran, Java, Python, R, and others; and by being supported by many third-party packages, including MATLAB, Mathematica, and LabVIEW.
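As a rough illustration of working with a tiny subset of a larger dataset, the following h5py sketch writes a chunked, gzip-compressed array and then reads back only a small window. The dataset name `field`, the array size, and the chunk shape are hypothetical, and gzip stands in here for whichever compression filter a given application actually uses.

```python
import os
import tempfile

import h5py
import numpy as np


def write_field(path, shape=(1000, 1000)):
    """Write a chunked, gzip-compressed 2-D dataset."""
    data = np.arange(shape[0] * shape[1], dtype="i4").reshape(shape)
    with h5py.File(path, "w") as f:
        # Chunking lets HDF5 read and decompress only the chunks it needs.
        f.create_dataset(
            "field", data=data, chunks=(100, 100), compression="gzip"
        )


def read_window(path, row_slice, col_slice):
    """Read only the requested window rather than the whole array."""
    with h5py.File(path, "r") as f:
        return f["field"][row_slice, col_slice]


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    field_path = os.path.join(tmp, "field.h5")
    write_field(field_path)
    window = read_window(field_path, slice(10, 12), slice(20, 23))
    print(window.shape)
```

Only the chunks overlapping the requested window are read from disk, which is what makes this pattern practical on a laptop even when the full dataset is far larger than memory.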