Joe Lee, The HDF Group
Sprint has recently hit the airwaves with a promotion claiming that they will cut your data bill in half. But there’s no free lunch in this connected world we live in. Unlimited data plans always come with a steep price tag.
While the internet has been around for a while, there has recently been an explosion of data – email, the World Wide Web, social media, cloud computing, mobile apps for everything, and Big Data. At the same time, the number of people using the internet has skyrocketed, as has the “Internet of Things.” Getting around can be a challenge.
The overcrowded and congested internet will continue to throw more data at us. Consequently, getting the right amount of the right data can also be a great challenge. When data is delivered over the internet, getting only the amount you need also shortens your delivery time dramatically and keeps your delivery costs to a minimum.
A single HDF file can store an unlimited amount of data, and HDF’s power can easily be harnessed by savvy data producers. One good example is NASA’s Earth Observing System (EOS) mission, which collects climate and other earth science readings from multiple sensors at points all over the globe and stores them in HDF formats, around the clock, year after year. With the HDF archive growing by 6.4 terabytes per day and distribution to users running at 27.9 terabytes per day, this is a lot of data. Scientists will need this climate information for years to come to monitor and evaluate the effects of global warming. But the information could be valuable to other researchers and forecasters as well: those without a Cray supercomputer in their back room.
Delivering large amounts of archived data as is over the internet is so 1990s: data delivery time alone can far exceed the time needed to analyze the data on your laptop. In the extreme case, shipping an entire hard disk by FedEx could be faster than delivering the data over the network. Imagine Netflix giving up streaming movies and sending DVD/Blu-ray discs via the US Postal Service! Even now, with services like Dropbox and SkyDrive at our disposal, Big Data may still be impossible to download because of its size. And the full download may be overkill, since you may only need a small fraction of the data anyway!
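To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes a sustained 100 Mbit/s link, decimal terabytes, a 10 MB subset, and no protocol overhead; those figures and the function name are illustrative assumptions, and only the 27.9 terabyte daily distribution volume comes from the text above.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealized transfer time: payload bits divided by the link rate."""
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    return size_bytes * 8 / link_bits_per_second


TB = 1e12      # decimal terabyte (assumption)
LINK = 100e6   # hypothetical sustained rate of 100 Mbit/s

# One day's worth of distributed data versus a hypothetical 10 MB subset.
full = transfer_seconds(27.9 * TB, LINK)
subset = transfer_seconds(10e6, LINK)

print(f"full volume: {full / 86400:.1f} days")   # about 25.8 days
print(f"subset:      {subset:.1f} seconds")      # 0.8 seconds
```

Under these assumptions the full volume takes weeks while the subset takes under a second, which is the gap that subsetting on the server is meant to close.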
OPeNDAP is a brilliant solution that can deliver a cup of water when you need it from the Niagara Falls of HDF data (Figure 1, above). OPeNDAP’s website says:
- OPeNDAP is a framework that simplifies all aspects of scientific data networking.
- OPeNDAP provides software which makes local data accessible to remote locations regardless of local storage format.
- OPeNDAP also provides tools for transforming existing applications into OPeNDAP clients (i.e., enabling them to remotely access OPeNDAP served data).
- OPeNDAP software, like HDF software, is freely available.
OPeNDAP has a long history of serving up the World Wide Web of data, and the number of data formats that it can support has grown significantly over time.
Large, rich and complex collections of HDF data can be filtered and viewed with the help of OPeNDAP. HDF data can be provided in manageable servings, on demand, in real time, inexpensively, even on the user’s desktop or mobile device.
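As a rough illustration of what a “manageable serving” looks like on the wire, the sketch below builds a DAP2-style constraint expression, in which each dimension of a variable is narrowed with a `[start:stride:stop]` index range (the stop index is inclusive). The server URL, file name, and variable name are hypothetical; a real dataset will have its own.

```python
from urllib.parse import quote


def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    """DAP2 index range for one dimension; the stop index is inclusive."""
    if not (0 <= start <= stop) or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def selected_count(start: int, stop: int, stride: int = 1) -> int:
    """Number of elements a range selects along one dimension."""
    return (stop - start) // stride + 1


def constraint(variable: str, ranges: list[tuple[int, int, int]]) -> str:
    """Variable name followed by one (start, stop, stride) range per dimension."""
    return variable + "".join(hyperslab(a, b, s) for a, b, s in ranges)


# Hypothetical dataset URL and variable name, for illustration only.
DATASET = "http://example.org/opendap/sample.h5"
ce = constraint("temperature", [(0, 9, 1), (100, 198, 2)])

url = f"{DATASET}.ascii?{ce}"
# Some web servers reject raw brackets, so a percent-encoded form is also shown.
encoded_url = f"{DATASET}.ascii?{quote(ce, safe='')}"

print(url)
print(selected_count(0, 9) * selected_count(100, 198, 2), "values requested")
```

Here the request asks for 10 × 50 values from the variable instead of the whole array, and the server does the slicing before anything crosses the network.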
OPeNDAP and The HDF Group have a long history of working together, and now OPeNDAP, powered by HDF handlers underneath, is the backbone of the slick new NASA Earthdata Search (now available as a beta release). If you try the search service, you will notice the OPeNDAP icon that says, “Supports spatial and parameter subsetting using OPeNDAP” as shown in the picture below (Figure 2).
In addition to subsetting, or delivering data in manageable servings, HDF handlers can make data follow the key Climate and Forecast (CF) Metadata Conventions on demand, without modifying the original HDF data. Following the CF conventions lets easy-to-use netCDF visualization tools such as NASA’s Panoply and Unidata’s IDV display and analyze the data on a world map. I personally view this effort as the perfect alignment for a lunar eclipse: enormous data creation by the HDF-sun, user-friendly interface realization by the netCDF-earth, and effective delivery of data by the OPeNDAP-moon.
OPeNDAP began as the Distributed Oceanographic Data System (DODS) project, before World Wide Web technology was announced in 1993. In its infancy, OPeNDAP did not anticipate the growth of earth science data. It has since evolved to meet today’s data demands, and it will surely continue to evolve. Nowadays, The HDF Group works closely with OPeNDAP to improve performance and scalability to meet the Big Data community’s needs. In addition, OPeNDAP’s new protocol, DAP4, has been developed to accommodate many advanced features that HDF5 can provide: enumerations, 64-bit integers, opaque data types, groups, and more.
The key blocker for many users in adopting OPeNDAP technology has been the complexity of its installation. OPeNDAP can serve many input formats (HDF4, HDF5, netCDF, GeoTIFF, databases, etc.) and produce output in many forms such as ASCII, netCDF, XML, and JSON. OPeNDAP had a separate software module for each format, and these modules could interfere with each other if they were not configured properly. Fortunately, OPeNDAP recently took a giant step forward to resolve this by consolidating all modules into one package and distributing them together.
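From the client side, the choice of output form usually comes down to the suffix appended to the dataset URL. The sketch below lists a few commonly used DAP2 response suffixes. Which ones a given server actually answers depends on how it is installed and configured, and the dataset URL is hypothetical.

```python
# Suffix -> what the server is asked to return.
# Availability on any particular server is an assumption.
RESPONSE_SUFFIXES = {
    "dds": "dataset structure (variable names, types, dimensions)",
    "das": "dataset attributes (metadata)",
    "dods": "binary DAP2 data",
    "ascii": "data as plain text",
    "nc": "data repackaged as a netCDF file",
}


def response_url(dataset_url: str, kind: str, constraint: str = "") -> str:
    """Append a response suffix and an optional constraint to a dataset URL."""
    if kind not in RESPONSE_SUFFIXES:
        raise ValueError(f"unknown response kind: {kind!r}")
    url = f"{dataset_url}.{kind}"
    return f"{url}?{constraint}" if constraint else url


# Hypothetical dataset URL, for illustration only.
DATASET = "http://example.org/opendap/sample.h5"

for kind, meaning in RESPONSE_SUFFIXES.items():
    print(f"{response_url(DATASET, kind):<50} {meaning}")
```

The same HDF file sits behind every one of these URLs. The server does the format conversion, so the client never needs an HDF library installed.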
While OPeNDAP can deliver subsets as small as one byte of HDF data at a time, OPeNDAP users also needed the ability to aggregate data on demand. The enhanced aggregation capability in the latest OPeNDAP release provides this. In practical terms, this is like giving users the ability to pull full episodes of the Star Wars movies in a single request and stream them.
All data available on the internet should be addressable and deliverable in manageable pieces to help users focus on the data that they need. Like choosing your own custom mobile data plan, by giving users full control of data from the server, OPeNDAP serves up that cupful of Niagara Falls on demand, efficiently and inexpensively.
Try out the NASA Earthdata Search (beta) service to get the feel of what OPeNDAP can do.
For more information, please see:
[1] http://hdfeos.org/software/hdf4_handler.php
[2] http://hdfeos.org/software/hdf5_handler.php