Spark Archives - The HDF Group - ensuring long-term access and usability of HDF data and supporting users of HDF technologies

The HDF 2015 Workshop at the ESIP Summer Meeting

September 16, 2015

Lindsay Powers, The HDF Group

The 2015 HDF workshop held during the ESIP Summer Meeting was a great success thanks to more than 40 participants throughout the four sessions. The workshop was an excellent opportunity for us to interact with HDF community members to better understand their needs and introduce them to new technologies. You […]

ESIP Summer Meeting – HDF Workshop and Town Hall

June 15, 2015

The HDF Group is hosting a one-day workshop at the upcoming Federation for Earth Science Information Partners (ESIP) Summer Meeting in Asilomar, CA on July 14th. Please join us to learn about new HDF tools, projects and perspectives. There will also be an HDF Town Hall meeting on Wednesday afternoon, July 15th.

Putting some Spark into HDF-EOS

April 16, 2015

…we focus on how far we can push our personal computing devices with Spark. The collection consists of 7,850 HDF-EOS5 files covering 27 years and totals about 120 GB. We use a driver script, which reads a dataset of interest from each file in the collection, computes per-file quantities of interest, and gathers them in a CSV file for visualization. The processing time on our reference tablet machine for 3.5 years of data using 4 logical processors was about 10 seconds.
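The read, summarize, and gather pattern described in this excerpt can be sketched roughly as follows. This is not the post's driver script: it uses a standard-library thread pool as a stand-in for Spark, and the names `summarize`, `run_driver`, and `read_values` are hypothetical. In particular, `read_values` is an assumed callable that returns the numeric values of the dataset of interest for a given file path; how it opens an HDF-EOS5 file is left to the caller.

```python
import csv
import statistics
from concurrent.futures import ThreadPoolExecutor

FIELDS = ["file", "count", "mean", "min", "max"]


def summarize(path, read_values):
    """Compute per-file quantities for one file (the 'map' step).

    read_values is a hypothetical reader supplied by the caller.
    """
    values = list(read_values(path))
    return {
        "file": path,
        "count": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
    }


def run_driver(paths, read_values, csv_path, workers=4):
    """Summarize every file in parallel and gather the rows in one CSV file."""
    # A thread pool stands in for Spark here; pool.map keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        rows = list(pool.map(lambda p: summarize(p, read_values), paths))
    with open(csv_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

With Spark, the same per-file function would typically be applied by distributing the list of paths (for example with `SparkContext.parallelize`), mapping `summarize` over it, and collecting the results on the driver before writing the CSV file.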

From HDF5 Datasets to Apache Spark RDDs

March 12, 2015

… HDF5 and Spark: Balancing the workload among tasks is a concern in any parallel environment. However, that does not mean that all datasets have to be the same size. HDF5 can help with partial I/O: Instead of reading entire datasets, one could just read hyperslabs or other selections. Sampling is…
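One way to picture balancing work through partial I/O is to split a large dataset into contiguous row ranges of nearly equal size, one per task. The sketch below is illustrative only and is not taken from the post; `hyperslab_ranges` is a hypothetical helper, and it only computes the selections without touching any file.

```python
def hyperslab_ranges(n_rows, n_tasks):
    """Split n_rows into contiguous (start, count) selections for n_tasks.

    Selection sizes differ by at most one row, so tasks get similar workloads.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    base, extra = divmod(n_rows, n_tasks)
    ranges = []
    start = 0
    for task in range(n_tasks):
        # The first `extra` tasks take one additional row each.
        count = base + (1 if task < extra else 0)
        if count:  # skip empty selections when there are more tasks than rows
            ranges.append((start, count))
        start += count
    return ranges


print(hyperslab_ranges(10, 4))  # [(0, 3), (3, 3), (6, 2), (8, 2)]
```

Assuming a library such as h5py, each task could then read only its own slice, for example `dset[start:start + count]`, rather than the entire dataset.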