Putting some Spark into HDF-EOS

…we focus on how far we can push our personal computing devices with Spark. The collection consists of 7,850 HDF-EOS5 files covering 27 years and totals about 120 GB. We use a driver script that reads a dataset of interest from each file in the collection, computes per-file quantities of interest, and gathers them in a CSV file for visualization. On our reference tablet machine, processing 3.5 years of data using 4 logical processors took about 10 seconds.
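
To make the shape of such a driver concrete, here is a minimal sketch of the read, summarize, and gather pattern described above. It is not the script from the post: the file names, the `read_dataset` helper, and the chosen statistics are all hypothetical, the HDF-EOS5 read is replaced by an in-memory stand-in, and a standard-library thread pool takes the place of Spark so the example runs anywhere.

```python
import csv
import io
import statistics
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the collection: file name -> dataset values.
# A real driver would open each HDF-EOS5 file and read the dataset instead.
FAKE_COLLECTION = {
    "granule_001.he5": [1.0, 2.0, 3.0],
    "granule_002.he5": [4.0, 6.0],
    "granule_003.he5": [10.0],
}


def read_dataset(path):
    """Hypothetical reader: return the dataset of interest for one file."""
    return FAKE_COLLECTION[path]


def summarize(path):
    """Compute per-file quantities (assumed here: count, min, max, mean)."""
    values = read_dataset(path)
    return (path, len(values), min(values), max(values),
            statistics.fmean(values))


def run(paths, workers=4):
    """Map summarize() over the files in parallel and gather rows as CSV."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(summarize, paths))  # map() keeps input order
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["file", "count", "min", "max", "mean"])
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(run(sorted(FAKE_COLLECTION)))
```

With PySpark, the thread pool would presumably give way to something along the lines of `sc.parallelize(paths).map(summarize).collect()`, with the CSV written on the driver from the collected rows. The per-file function stays the same either way, which is what makes this pattern easy to scale up or down.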

HDF5 Data Compression Demystified #1

Elena Pourmal, The HDF Group
What happened to my compression? One of the most powerful features of HDF5 is the ability to compress or otherwise modify, or “filter,” your data during I/O. By far, the most common user-defined filters are ones that perform data compression. As you know, there are many compression options. There are …

Parallel I/O – Why, How, and Where to?

Mohamad Chaarawi, The HDF Group
First in a series: Parallel HDF5
What costs applications a lot of time and resources rather than doing actual computation? Slow I/O. It is well known that I/O subsystems are very slow compared to other parts of a computing system. Applications use I/O to store simulation output for future use …

Answering biological questions using HDF5 and physics-based simulation data

David Dotson, doctoral student, Center for Biological Physics, Arizona State University; HDF Guest Blogger
Recently I had the pleasure of meeting Anthony Scopatz for the first time at SciPy 2015, and we talked shop. I was interested in his opinions on MDSynthesis, a Python package our lab has designed to help manage the complexity of raw and derived …

Python & HDF5 – A Vision

Anthony Scopatz, Assistant Professor at the University of South Carolina, HDF guest blogger
“Python is great and its ecosystem for scientific computing is world class. HDF5 is amazing and is rightly the gold standard for persistence for scientific data. Many people use HDF5 from Python, and this number is only growing due to pandas’ HDFStore. …

Parallel I/O with HDF5

Mohamad Chaarawi, The HDF Group
Second in a series: Parallel HDF5
In my previous blog post, I discussed the need for parallel I/O and a few paradigms for doing parallel I/O from applications. HDF5 is an I/O middleware library that supports (or will support in the near future) most of the I/O paradigms we talked …

HDF5 and The Big Science of Nuclear Stockpile Stewardship

DOE has continued to partner with The HDF Group, supporting development of HDF5 through two generations of computing; sponsoring this development has benefited the entire HDF5 user community. Today, DOE supports current HDF5 R&D to ensure that the data challenges of third-generation exascale computing …

HDF5 and .NET: One step back, two steps forward

… enables the creation of new APIs, be it a more specific one or a new higher-level API. All this is achieved in a maintainable, .NET-conformant manner, while enabling .NET developers to be creative and efficient with HDF5.

HDFql – the new HDF tool that speaks SQL

HDFql offers a language similar to SQL for HDF5. By providing a simpler/cleaner interface, HDFql aims to ease scientific computing and big data management.
