
Elena Pourmal, The HDF Group

What happened to my compression?

One of the most powerful features of HDF5 is the ability to compress, or otherwise modify (“filter”), your data during I/O. By far, the most common user-defined filters are ones that perform data compression. As you know, there are many compression options. There are filters provided by the HDF5 library (“predefined filters”), which include several filters for data compression, data shuffling, and checksums. Users can also implement their own “user-defined filters” and employ them with the HDF5 library.

Cars in a 1973 Philadelphia junkyard – image from the National Archives and Records Administration

While the programming model and usage of the compression filters is straightforward, it is possible for...
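As a taste of that programming model, here is a minimal sketch of applying the predefined filters through h5py; the file name, dataset name, and filter settings are illustrative:

    # A minimal sketch: create a chunked dataset with the predefined
    # gzip (deflate), shuffle, and Fletcher32 checksum filters via h5py.
    import numpy as np
    import h5py

    data = np.random.rand(1000, 1000)

    with h5py.File("example.h5", "w") as f:
        f.create_dataset(
            "temperature",          # illustrative dataset name
            data=data,
            chunks=(100, 100),      # filters require chunked storage
            compression="gzip",     # predefined deflate filter
            compression_opts=4,     # gzip level, 1-9
            shuffle=True,           # byte-shuffle to improve compression
            fletcher32=True,        # checksum filter
        )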

...we focus on how far we can push our personal computing devices with Spark. The collection consists of 7,850 HDF-EOS5 files covering 27 years and totals about 120 GB. We use a driver script that reads a dataset of interest from each file in the collection, computes per-file quantities of interest, and gathers them into a CSV file for visualization. The processing time on our reference tablet machine for 3.5 years of data using 4 logical processors was about 10 seconds....
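For readers curious what such a driver can look like, here is a minimal sketch of the pattern described above; the dataset path, file glob, and the choice of the mean as the per-file quantity are all assumptions for illustration:

    # A minimal sketch of the driver pattern: distribute a file list with
    # Spark, compute one quantity per file with h5py, gather to a CSV.
    # DATASET_PATH and the *.he5 glob are hypothetical; h5py must be
    # available on the worker processes.
    import csv
    import glob

    import h5py
    from pyspark import SparkContext

    DATASET_PATH = "/HDFEOS/GRIDS/Example/Data Fields/Temperature"

    def per_file_summary(filename):
        # Read the dataset of interest and reduce it to one number.
        with h5py.File(filename, "r") as f:
            data = f[DATASET_PATH][...]
        return (filename, float(data.mean()))

    sc = SparkContext(appName="PerFileSummary")
    files = glob.glob("data/*.he5")
    results = sc.parallelize(files).map(per_file_summary).collect()

    with open("summary.csv", "w", newline="") as out:
        csv.writer(out).writerows(results)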

Mohamad Chaarawi, The HDF Group

First in a series: parallel HDF5

What costs applications a lot of time and resources that could otherwise go toward actual computation? Slow I/O. It is well known that I/O subsystems are very slow compared to other parts of a computing system. Applications use I/O to store simulation output for future use by analysis applications, to checkpoint application memory to guard against system failure, to exercise out-of-core techniques for data that does not fit in a processor’s memory, and so on. I/O middleware libraries, such as HDF5, provide application users with a rich interface for I/O access that helps them organize their data and store it efficiently. Such I/O libraries invest a lot of effort in reducing or completely hiding the cost of I/O from applications.

Parallel I/O is one technique used to access data on disk simultaneously from different application processes to maximize bandwidth and speed things up. There are several ways to do parallel I/O, and I will highlight the most popular methods that are in use today.

Blue Waters supercomputer at the National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign campus. Blue Waters is supported by the National Science Foundation and the University of Illinois.

First, to leverage parallel I/O, it is very important that you have a parallel file system...
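To make that concrete before the series digs in, here is a minimal sketch of collective file access through h5py’s MPI-IO driver; it assumes an MPI-enabled build of HDF5 and h5py, and a parallel file system underneath:

    # A minimal sketch of parallel HDF5 via h5py's "mpio" driver.
    # Run with, e.g.: mpiexec -n 4 python parallel_write.py
    from mpi4py import MPI
    import h5py

    comm = MPI.COMM_WORLD

    # Every rank opens the same file; metadata operations are collective.
    with h5py.File("parallel.h5", "w", driver="mpio", comm=comm) as f:
        dset = f.create_dataset("values", (comm.Get_size(),), dtype="i4")
        # Each rank writes its own element of the shared dataset.
        dset[comm.Get_rank()] = comm.Get_rank()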

The HDF Group has just announced “HDF Server” - a freely available service that enables remote access to HDF5 content using a RESTful API. In our scenario, using HDF Server, we upload our Monopoly simulation results to the server, and interested parties can then request any desired content from the server - no file size issues, no downloading entire files...
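As an illustration of the RESTful style, here is a minimal sketch using Python’s requests library; the server address, domain name, and JSON layout are placeholders to be checked against the HDF Server documentation:

    # A minimal sketch of reading data over HDF Server's RESTful API.
    # The host, domain, and response keys below are assumptions.
    import requests

    base = "http://127.0.0.1:5000"
    headers = {"Host": "monopoly.data.example.org"}  # hypothetical domain

    # List dataset UUIDs in the domain, then fetch one dataset's values.
    r = requests.get(base + "/datasets", headers=headers)
    dset_id = r.json()["datasets"][0]
    r = requests.get(base + "/datasets/" + dset_id + "/value", headers=headers)
    print(r.json()["value"])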

Quincey Koziol, The HDF Group

Oil rig – photo from nasa.gov

Perhaps the original producer of “big data,” the oil & gas (O&G) industry held its eighth annual High-Performance Computing (HPC) workshop in early March. Hosted by Rice University, the workshop brings in attendees from both the HPC and petroleum industries.

Jan Odegard, the workshop organizer, invited me to the workshop to give a tutorial and short update on HDF5.

The workshop (#oghpc) has grown a great deal over the last few years and now draws more than 500 attendees, with preliminary numbers for this year’s workshop over 575 people (even in a “down” year for the industry). In fact, Jan is pushing to make it a “conference” next year, saying, “any workshop with more attendees than Congress is really a conference.” But it’s still a small enough crowd and venue that most people know each other well, on both the oil & gas and HPC sides.

The workshop program had two main tracks, one on HPC-oriented technologies that support the industry, and one on oil & gas technologies and how they can leverage HPC.  The HPC track is interesting, but mostly “practical” and not research-oriented, unlike, for example, the SC technical track. The oil & gas track seems more research-focused, in ways that can enable the industry to be more productive.

I gave an hour-and-a-half tutorial on developing and tuning parallel HDF5 applications, which...

Gerd Heber, The HDF Group

Editor’s Note: Since this post was written in 2015, The HDF Group has developed HDF5 Connector for Apache Spark™, a new product that addresses the challenges of adapting large scale array-based computing to the cloud and object storage while intelligently handling the full data management life cycle. If this is something that interests you, we’d love to hear from you.

“I would like to do something with all the datasets in all the HDF5 files in this directory, but I’m not sure how to proceed.”

If this sounds all too familiar, then reading this article might be worth your while. The generally accepted answer is to write a Python script (and use h5py [1]), but I am not going to repeat here what you already know. Instead, I will show you how to hot-wire one of the shiny new engines, Apache Spark [2], and make a few suggestions on how to reduce the coding on your part while opening the door to new opportunities.
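For reference, the plain h5py version of “do something with every dataset” might look like the following sketch; the directory glob and the action (printing shapes) are placeholders:

    # A minimal h5py-only sketch: visit every dataset in every HDF5 file
    # in the current directory and report its shape and element type.
    import glob
    import h5py

    def report(name, obj):
        # visititems calls this for every object; keep only datasets.
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")

    for filename in glob.glob("*.h5"):
        with h5py.File(filename, "r") as f:
            f.visititems(report)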

But what about Hadoop? There is no out-of-the-box interoperability between HDF5 and Hadoop; see our BigHDF FAQs [3] for a few glimmers of hope. Major points of contention remain, such as HDFS’s “blocked” worldview and its aversion to relatively small objects, and then there is HDF5’s determination to keep its smarts away from prying eyes. Spark is more relaxed and works happily with HDFS, Amazon S3, and, yes, a local file system or NFS. More importantly, with its Resilient Distributed Datasets (RDDs) [4], it raises the level of abstraction and overcomes several Hadoop/MapReduce shortcomings when dealing with iterative methods. See reference [5] for an in-depth discussion.


Figure 1. A simple HDF5/Spark scenario

As our model problem (see Figure 1), consider the following scenario: