Large, rich and complex collections of HDF data can be filtered and viewed with the help of OPeNDAP. HDF data can be provided in manageable servings, on demand in real time, inexpensively, even on the user's desktop or mobile device....
HDF5’s rich, portable metadata capabilities, including directed graph structures (e.g., hierarchies), complex attributes, and inter-object references, make it a superior choice for maintaining the bond between data and metadata at the lowest level. Community involvement is an essential part of the HDF Group’s mission: it is vital to sustaining the business and serves as our brain trust when making decisions about changes to HDF5, setting priorities, and adding new features. ...
Elena Pourmal, The HDF Group
What happened to my compression?
One of the most powerful features of HDF5 is the ability to compress or otherwise modify, or “filter,” your data during I/O.
By far, the most common user-defined filters are ones that perform data compression. As you know, there are many compression options.
Some filters are provided by the HDF5 library itself (“predefined filters”); these include filters for data compression, data shuffling, and checksums.
Users can implement their own “user-defined filters” and employ them with the HDF5 library.
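To make this concrete, here is a minimal sketch of applying the predefined gzip, shuffle, and Fletcher-32 checksum filters when creating a dataset, written with the h5py Python bindings; the file and dataset names are hypothetical. Note that HDF5 filters operate per chunk, so chunked storage is required:

```python
import numpy as np
import h5py

data = np.random.rand(1000, 1000)

with h5py.File("example.h5", "w") as f:    # hypothetical file name
    f.create_dataset(
        "pressure",                # hypothetical dataset name
        data=data,
        chunks=(100, 100),         # filters apply per chunk, so chunking is required
        shuffle=True,              # predefined byte-shuffle filter; often improves compression
        compression="gzip",        # predefined deflate compression filter
        compression_opts=6,        # deflate level, 0-9
        fletcher32=True,           # predefined checksum filter
    )
```

Reading the data back requires no extra code: the library discovers the filter pipeline from the file itself and reverses it transparently.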
Cars in a 1973 Philadelphia junkyard – image from National Archives and Records Administration
While the programming model and usage of the compression filters are straightforward, it is possible for...
...we focus on how far we can push our personal computing devices with Spark. The collection consists of 7,850 HDF-EOS5 files covering 27 years and totaling about 120 GB. We use a driver script that reads a dataset of interest from each file in the collection, computes per-file quantities of interest, and gathers them in a CSV file for visualization. The processing time on our reference tablet machine for 3.5 years of data using 4 logical processors was about 10 seconds....
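The driver script itself is not shown in the excerpt; the following is a minimal sketch of the idea, assuming PySpark with h5py available on the workers, and using a hypothetical collection path and dataset name:

```python
import csv
import glob

import h5py
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfeos5-summary").getOrCreate()
sc = spark.sparkContext

# hypothetical location of the HDF-EOS5 collection
files = sorted(glob.glob("/data/hdfeos5/*.he5"))

# hypothetical dataset path inside each file
DATASET = "/HDFEOS/GRIDS/Grid/Data Fields/Temperature"

def summarize(path):
    """Compute per-file quantities of interest (here: mean and max)."""
    with h5py.File(path, "r") as f:
        arr = f[DATASET][...]
    return (path, float(np.nanmean(arr)), float(np.nanmax(arr)))

# four partitions to match the four logical processors mentioned above
rows = sc.parallelize(files, numSlices=4).map(summarize).collect()

with open("summary.csv", "w", newline="") as out:
    csv.writer(out).writerows(rows)

spark.stop()
```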
What keeps applications from spending their time and resources on actual computation? Slow I/O. It is well known that I/O subsystems are very slow compared to other parts of a computing system. Applications use I/O to store simulation output for future use by analysis applications, to checkpoint application memory to guard against system failure, to exercise out-of-core techniques for data that does not fit in a processor’s memory, and so on. I/O middleware libraries, such as HDF5, provide application users with a rich interface for organizing their data and storing it efficiently. Such libraries invest a lot of effort in reducing or completely hiding the cost of I/O from applications.
Parallel I/O is one technique used to access data on disk simultaneously from different application processes to maximize bandwidth and speed things up. There are several ways to do parallel I/O, and I will highlight the most popular methods that are in use today.
Blue Waters supercomputer at the National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign campus. Blue Waters is supported by the National Science Foundation and the University of Illinois.
First, to leverage parallel I/O, it is very important that you have a parallel file system.
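As a taste of one popular method, here is a minimal sketch of writing a single shared file through HDF5’s MPI-IO driver, using the h5py bindings. This assumes an HDF5 build with parallel support plus mpi4py; the file and dataset names are hypothetical:

```python
from mpi4py import MPI
import h5py
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nprocs = comm.Get_size()

# every rank opens the same file through the MPI-IO ("mpio") driver
with h5py.File("parallel.h5", "w", driver="mpio", comm=comm) as f:
    # dataset creation is collective: all ranks must participate
    dset = f.create_dataset("output", (nprocs, 1000), dtype="f8")
    # each rank then writes its own disjoint slice of the shared dataset
    dset[rank, :] = np.full(1000, rank, dtype="f8")
```

Launched with, e.g., `mpiexec -n 4 python write_parallel.py`, the four ranks can write simultaneously, which is exactly where a parallel file system earns its keep.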
The HDF Group has just announced “HDF Server” - a freely available service that enables remote access to HDF5 content using a RESTful API. In our scenario, using HDF Server, we upload our Monopoly simulation results to the server, and interested parties can then request any desired content from the server - no file size issues, no downloading entire files...
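A rough sketch of what such a request might look like from Python follows, with a hypothetical server address and domain name. The endpoint paths and JSON field names reflect the HDF REST API as we understand it; consult the HDF Server documentation for the authoritative interface:

```python
import requests

BASE = "http://hdfserver.example.org:5000"        # hypothetical HDF Server endpoint
HEADERS = {"Host": "monopoly.data.example.org"}   # hypothetical domain for the uploaded file

# fetch the domain to learn the root group's UUID
root_uuid = requests.get(BASE + "/", headers=HEADERS).json()["root"]

# list the root group's links and pick the first dataset (field names assumed)
links = requests.get(f"{BASE}/groups/{root_uuid}/links", headers=HEADERS).json()["links"]
dset_uuid = next(l["id"] for l in links if l.get("collection") == "datasets")

# read just a slice of the dataset - no need to download the whole file
resp = requests.get(
    f"{BASE}/datasets/{dset_uuid}/value",
    params={"select": "[0:100]"},   # hyperslab selection
    headers=HEADERS,
)
values = resp.json()["value"]
```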
Fifteen years ago, NASA selected HDF as the format for the data products produced by NASA satellites for the NASA Earth Observing System (EOS).
The HDF Earth Science Program is well aware of this important legacy. We focus on continuing support of U.S. environmental satellite programs (the NASA Earth Observing System and the Joint Polar Satellite System, JPSS), ongoing quality assurance of the HDF libraries, and helping data users access and understand products written in HDF. The HDF-EOS Information Center (#hdfeos) includes code examples in MATLAB, IDL, NCL, and Python, many driven by user questions. The site also provides information on other HDF tools.
The Moderate Resolution Imaging Spectroradiometer, MODIS, can see the Earth in true color as it appears from the satellite Terra. It also measures an unprecedented number of parameters related to global change, including ocean plant life, cloud properties, atmospheric particulates (aerosols), and land surface change. Image courtesy NASA and the MODIS instrument team.
NASA’s decision ensured a role for HDF in Earth Science and set an important precedent. HDF developers, along with the U.S. and other Earth Observing nations, developed a clear distinction between Earth Science Data Objects (grids, swaths, profiles…); the metadata required to describe them; and the HDF objects (datasets, groups, attributes, etc.) that make them up.
The critical realization was that communities like EOS needed conventions for describing Earth Science objects to enable using and sharing those objects. These conventions, termed HDF-EOS, have been used successfully in hundreds of NASA products that can be easily shared among multiple users using standard tools.
Many other Earth Science communities have used the powerful combination of conventions and HDF.