Putting some Spark into HDF-EOS

…we focus on how far we can push our personal computing devices with Spark. The collection consists of 7,850 HDF-EOS5 files covering 27 years and totals about 120 GB. We use a driver script that reads a dataset of interest from each file in the collection, computes per-file quantities of interest, and gathers them into a CSV file for visualization. Processing 3.5 years of data on our reference tablet machine with 4 logical processors took about 10 seconds.
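A minimal sketch of such a driver script is shown below, assuming PySpark and h5py are available. The dataset path, file pattern, and the per-file statistic (a simple mean) are illustrative placeholders, not the exact quantities used here.

```python
import csv
import glob

import h5py
import numpy as np
from pyspark import SparkContext

# Hypothetical dataset path inside each HDF-EOS5 file.
DATASET = "/HDFEOS/GRIDS/SomeGrid/Data Fields/SomeField"

def per_file_stats(path):
    """Read one dataset from a single HDF-EOS5 file and reduce it to a scalar."""
    with h5py.File(path, "r") as f:
        data = f[DATASET][...]
    return (path, float(np.nanmean(data)))

if __name__ == "__main__":
    sc = SparkContext(appName="hdfeos-per-file-stats")
    files = sorted(glob.glob("data/*.he5"))      # the file collection (placeholder pattern)
    rdd = sc.parallelize(files, numSlices=4)     # one slice per logical processor
    results = rdd.map(per_file_stats).collect()
    sc.stop()

    # Gather the per-file quantities into a CSV file for visualization.
    with open("stats.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "mean"])
        writer.writerows(results)
```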

From HDF5 Datasets to Apache Spark RDDs

… HDF5 and Spark: Balancing the workload among tasks is a concern in any parallel environment. However, that does not mean that all datasets have to be the same size. HDF5 can help through partial I/O: instead of reading entire datasets, one can read just hyperslabs or other selections. Sampling is…
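A brief sketch of partial I/O with h5py, assuming a 3-D dataset laid out as (time, lat, lon); the dataset name and slice bounds are illustrative only.

```python
import h5py

with h5py.File("sample.he5", "r") as f:
    # Hypothetical dataset path; opening it does not read the data yet.
    dset = f["/HDFEOS/GRIDS/SomeGrid/Data Fields/SomeField"]

    # Hyperslab selection: only the requested block is read from disk.
    block = dset[0:10, 100:200, 250:350]

    # Strided sampling: every 10th time step over the same sub-region.
    sample = dset[::10, 100:200, 250:350]
```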
