Damaris is a middleware that enriches existing HPC data format libraries (e.g. HDF5) with data aggregation and asynchronous data management capabilities. At the same time, it can be employed for in situ analysis and visualization purposes. ...
Mark Miller, Lawrence Livermore National Laboratory, Guest Blogger
The HDF5 library has supported the I/O requirements of HPC codes at Lawrence Livermore National Laboratory (LLNL) since the late 1990s. In particular, HDF5 used in the Multiple Independent File (MIF) parallel I/O paradigm has supported LLNL codes’ scalable I/O requirements and has recently been used successfully at scales as large as 1,000,000 parallel tasks.
What is the MIF Parallel I/O Paradigm?
In the MIF paradigm, a computational object (an array, a mesh, etc.) is decomposed into pieces and distributed, perhaps unevenly, over parallel tasks. For I/O, the tasks are organized into groups and each group writes one file using round-robin exclusive access for the tasks in the group. Writes within groups are serialized but...
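The grouping described above can be sketched in a few lines of plain Python. This is only an illustration of the bookkeeping, not code from the post or from any LLNL library: the function names `mif_plan` and `write_schedule` are hypothetical, and the contiguous block assignment of tasks to groups is an assumption (other assignments are possible). No actual file I/O or MPI communication is performed.

```python
def mif_plan(num_tasks, num_files):
    """Assign each task a (group, turn) pair for MIF-style output.

    Assumption: tasks are assigned to groups in contiguous blocks, and the
    first `num_tasks % num_files` groups hold one extra task.
    """
    if not 1 <= num_files <= num_tasks:
        raise ValueError("need 1 <= num_files <= num_tasks")
    base, extra = divmod(num_tasks, num_files)
    plan = []
    for group in range(num_files):
        size = base + (1 if group < extra else 0)
        for turn in range(size):
            plan.append((group, turn))
    return plan  # plan[rank] == (group, turn)


def write_schedule(plan):
    """Collect (rank, group) pairs by turn.

    Within a group only one task holds the file at a time, so tasks with
    the same turn number belong to different groups and different files.
    """
    steps = {}
    for rank, (group, turn) in enumerate(plan):
        steps.setdefault(turn, []).append((rank, group))
    return [steps[t] for t in sorted(steps)]


if __name__ == "__main__":
    plan = mif_plan(num_tasks=8, num_files=3)
    for step, writers in enumerate(write_schedule(plan)):
        line = ", ".join(f"rank {r} -> file {g}" for r, g in writers)
        print(f"step {step}: {line}")
```

In a real code, the "turn" would typically be enforced by passing a baton message from one task to the next task in the same group, and each task would open, write to, and close its group's file while holding the baton.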
UPDATE January 19, 2016: The HDF5-1.10.0-alpha1 release is now available, adding Collective Metadata I/O to these features:
– Concurrent Access to an HDF5 File: Single Writer / Multiple Reader (SWMR)
– Virtual Dataset (VDS)
– Scalable Chunk Indexing
– Persistent Free File Space Tracking
We’re pleased to announce the release of HDF5 1.10.0-alpha0.
HDF5 1.10.0, planned for release in spring 2016, is a major release containing many new features. On January 6, 2016 we announced the release of the first alpha version of the software.
The alpha0 release contains some (but not all) of the features that will be in HDF5 1.10.0. The Single Writer/Multiple Reader and Virtual Dataset features are both contained in this alpha release, as are scalable chunk indexing and persistent free file space tracking. More features, such as enhancements to parallel HDF5 and support for compressing contiguous datasets, will be added in upcoming alpha releases.
NERSC’s Cray Sonexion system provides data storage for its Mendel scientific computing cluster.
In my previous blog post, I discussed the need for parallel I/O and a few paradigms for doing parallel I/O from applications. HDF5 is an I/O middleware library that supports (or will support in the near future) most of the I/O paradigms we talked about.
In this blog post I will discuss how to use HDF5 to implement some of the parallel I/O methods and some of the ongoing research to support new I/O paradigms. I will not discuss pros and cons of each method since we discussed those in the previous blog post.
But before getting on with how HDF5 supports parallel I/O, let’s address a question that comes up often, which is,
“Why do I need Parallel HDF5 when the MPI standard already provides an interface for doing I/O?”
The current improvement, which uses collective I/O to reduce the number of independent processes accessing the file system, substantially sped up the metadata reads for cgp_open, yielding execution times 100-1000 times faster than the previous implementation....
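To see why reducing the number of processes that touch the file system matters, here is a deliberately simple counting model in Python. It is a sketch under stated assumptions, not a description of how cgp_open or HDF5 is implemented: the function name `metadata_requests` and the figure of 50 metadata items are hypothetical, and the model ignores broadcast cost, caching, and request size, so it does not attempt to reproduce the measured speedups quoted above.

```python
def metadata_requests(num_procs, num_items, collective):
    """Count file-system read requests under a toy model.

    Assumptions: every process needs every metadata item; in the
    collective case a single process reads each item once and the
    result reaches the others over the network instead of the file system.
    """
    if num_procs < 1 or num_items < 0:
        raise ValueError("need num_procs >= 1 and num_items >= 0")
    if collective:
        return num_items
    return num_procs * num_items


if __name__ == "__main__":
    items = 50  # hypothetical number of metadata items read at open time
    for procs in (64, 1024, 16384):
        independent = metadata_requests(procs, items, collective=False)
        collective = metadata_requests(procs, items, collective=True)
        print(f"{procs} processes: {independent} independent requests, "
              f"{collective} collective requests")
```

In this model the load on the file system grows with the process count for independent reads and stays flat for collective reads, which is the qualitative effect the improvement relies on.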
Elena Pourmal and Quincey Koziol - The HDF Group
UPDATE: Check our support pages for the newest version of HDF5-1.10.0.
– Concurrent Access to an HDF5 File: Single Writer / Multiple Reader (SWMR)
– Virtual Dataset (VDS)
– Scalable Chunk Indexing
– Persistent Free Filespace Tracking
– Collective Metadata I/O
– Integration of Java HDF5 JNI into HDF5
– Many changes have been made to the HDF5 configuration
– Unfortunately, parallel HDF5 enhancement has been postponed
This version contains a fix for an issue which occurred when building HDF5 within the source code directory.
Check our downloads page for more information. We are still on target for releasing HDF5-1.10.0 next week. Let us know if you have any comments!
The HDF Group is committed to meeting our users' needs and expectations for...
What costs applications a lot of time and resources that could otherwise go to actual computation? Slow I/O. It is well known that I/O subsystems are very slow compared to other parts of a computing system. Applications use I/O to store simulation output for future use by analysis applications, to checkpoint application memory to guard against system failure, to exercise out-of-core techniques for data that does not fit in a processor’s memory, and so on. I/O middleware libraries, such as HDF5, provide application users with a rich interface for I/O access to organize their data and store it efficiently. Such I/O libraries invest a lot of effort in reducing or completely hiding the cost of I/O from applications.
Parallel I/O is one technique used to access data on disk simultaneously from different application processes to maximize bandwidth and speed things up. There are several ways to do parallel I/O, and I will highlight the most popular methods that are in use today.
Blue Waters supercomputer at the National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign campus. Blue Waters is supported by the National Science Foundation and the University of Illinois.
First, to leverage parallel I/O, it is very important that you have a parallel file system;
Perhaps the original producers of “big data,” the oil & gas (O&G) industry held its eighth annual High-Performance Computing (HPC) workshop in early March. Hosted by Rice University, the workshop brings in attendees from both the HPC and petroleum industries.
Jan Odegard, the workshop organizer, invited me to the workshop to give a tutorial and short update on HDF5.
The workshop (#oghpc) has grown a great deal during the last few years and now has more than 500 people attending, with preliminary attendance numbers for this year’s workshop over 575 people (even in a “down” year for the industry). In fact, Jan’s pushing it to a “conference” next year, saying, “any workshop with more attendees than Congress is really a conference.” But it’s still a small enough crowd and venue that most people know each other well, both on the Oil & Gas and HPC sides.
The workshop program had two main tracks, one on HPC-oriented technologies that support the industry, and one on oil & gas technologies and how they can leverage HPC. The HPC track is interesting, but mostly “practical” and not research-oriented, unlike, for example, the SC technical track. The oil & gas track seems more research-focused, in ways that can enable the industry to be more productive.
The HDF Group’s mission is to provide high quality software for managing large complex data, to provide outstanding services for users of these technologies, and to ensure effective management of data throughout the data life cycle....