From HDF5 Datasets to Apache Spark RDDs

… HDF5 and Spark: Balancing the workload among tasks is a concern in any parallel environment. However, that does not mean that all datasets have to be the same size. HDF5 can help with partial I/O: instead of reading entire datasets, one can read just hyperslabs or other selections. Sampling is…
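One way to picture partial I/O for load balancing is to cut the first dimension of a dataset into near-equal hyperslabs, one per task. The sketch below is a minimal, assumption-laden illustration: the function name `hyperslabs` and the `(start, count)` convention are hypothetical, and it only computes the selections rather than calling HDF5 itself.

```python
# Minimal sketch: split the first dimension of a dataset into near-equal
# hyperslabs, one per parallel task. The name `hyperslabs` and the
# (start, count) convention are illustrative assumptions, not an HDF5 API.
def hyperslabs(n_rows, n_tasks):
    """Return (start, count) pairs that together cover rows 0..n_rows-1.

    Counts differ by at most one, so no task gets more than one extra row.
    """
    if n_tasks <= 0:
        raise ValueError("n_tasks must be positive")
    base, extra = divmod(n_rows, n_tasks)
    slabs = []
    start = 0
    for i in range(n_tasks):
        # The first `extra` tasks absorb the remainder, one row each.
        count = base + (1 if i < extra else 0)
        slabs.append((start, count))
        start += count
    return slabs


print(hyperslabs(10, 3))  # [(0, 4), (4, 3), (7, 3)]
```

Each task could then read only its own rows; with h5py, for example, slicing `dset[start:start + count]` performs a hyperslab selection, and in PySpark the list of slabs could be distributed with `sc.parallelize(slabs, len(slabs))`. Both usages are suggestions, not the method from the original post.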

HDF5 Data Compression Demystified #1

Elena Pourmal, The HDF Group

What happened to my compression? One of the most powerful features of HDF5 is the ability to compress or otherwise modify, or “filter,” your data during I/O. By far, the most common user-defined filters are ones that perform data compression. As you know, there are many compression options. There are…
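To make “filtering during I/O” concrete, the sketch below mimics what a deflate (gzip) compression filter does to one chunk of data: encode on write, decode on read, losslessly. It is illustrative only and uses Python's standard `zlib` directly rather than the HDF5 library; the helper names `encode_chunk` and `decode_chunk` are hypothetical.

```python
import struct
import zlib

# Illustrative only: imitate a deflate filter applied to a single chunk.
# HDF5 applies filters to chunked datasets chunk by chunk; here we call
# zlib directly instead of going through HDF5.


def encode_chunk(values, level=6):
    """Pack floats as little-endian doubles, then deflate-compress them."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return raw, zlib.compress(raw, level)


def decode_chunk(compressed, n):
    """Reverse the filter: inflate the bytes, then unpack n doubles."""
    raw = zlib.decompress(compressed)
    return list(struct.unpack(f"<{n}d", raw))


# Repetitive values compress well; random data would compress far less.
chunk = [float(i % 16) for i in range(4096)]
raw, packed = encode_chunk(chunk)
print(f"{len(raw)} bytes -> {len(packed)} bytes")
assert decode_chunk(packed, len(chunk)) == chunk  # lossless round trip
```

In practice you would let HDF5 run the filter for you; with h5py, for instance, `create_dataset(..., chunks=True, compression="gzip", compression_opts=4)` requests the built-in deflate filter at level 4.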

Parallel I/O – Why, How, and Where to?

Mohamad Chaarawi, The HDF Group

First in a series: parallel HDF5

What costs applications a lot of time and resources that could otherwise go to actual computation? Slow I/O. It is well known that I/O subsystems are very slow compared to other parts of a computing system. Applications use I/O to store simulation output for future use…

Putting some Spark into HDF-EOS

…we focus on how far we can push our personal computing devices with Spark. The data collection consists of 7,850 HDF-EOS5 files covering 27 years and totals about 120 GB. We use a driver script, which reads a dataset of interest from each file in the collection, computes per-file quantities of interest, and gathers them in a CSV file for visualization. The processing time on our reference tablet machine for 3.5 years of data using 4 logical processors was about 10 seconds.
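The driver pattern described above (read one dataset per file, reduce it to a few per-file numbers, gather everything into a CSV) can be sketched in plain Python. This is a hedged illustration, not the post's script: `read_dataset` is a hypothetical stand-in for opening an HDF-EOS5 file and reading one dataset, the summary columns are assumed, and a thread pool with 4 workers stands in for Spark's parallelism.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Sketch of a per-file map followed by a gather into CSV.
# `read_dataset` is a hypothetical callable: file name -> list of numbers.
FIELDS = ["file", "count", "mean", "min", "max"]


def per_file_summary(name, read_dataset):
    """Compute assumed per-file quantities of interest for one file."""
    values = read_dataset(name)
    return {"file": name, "count": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}


def run_driver(names, read_dataset, workers=4):
    """Summarize files in parallel and return the results as CSV text."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so rows follow `names`.
        rows = list(pool.map(lambda n: per_file_summary(n, read_dataset), names))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Usage with an in-memory fake in place of real (hypothetical) .he5 files.
fake = {"a.he5": [1.0, 2.0, 3.0], "b.he5": [10.0, 20.0]}
print(run_driver(sorted(fake), fake.__getitem__))
```

Threads keep the sketch self-contained; for CPU-bound per-file work, a process pool or Spark itself is the better fit. In PySpark the same map-then-gather step could be written as `sc.parallelize(names).map(f).collect()`, with the CSV written on the driver.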

HDF5 and .NET: One step back, two steps forward

… enables the creation of new APIs, be it a more specific one or a new higher-level API. All this is achieved in a maintainable, .NET-conformant manner, while enabling .NET developers to be creative and efficient with HDF5.

HDF5 and The Big Science of Nuclear Stockpile Stewardship

DOE has continued to partner with The HDF Group, supporting development of HDF5 through two generations of computing; sponsoring this development has benefited the entire HDF5 user community. Today, DOE supports HDF5 R&D to ensure that the data challenges of third-generation exascale computing …

Python & HDF5 – A Vision

Anthony Scopatz, Assistant Professor at the University of South Carolina, HDF guest blogger

“Python is great and its ecosystem for scientific computing is world class. HDF5 is amazing and is rightly the gold standard for persistence for scientific data. Many people use HDF5 from Python, and this number is only growing due to pandas’ HDFStore.
