
HDF5 Implementation in Mathematica

Scot Martin, Harvard University, HDF Guest Blogger

HDF5 storage is really interesting. As I see it, an HDF5 file has no fixed structure; instead, its contents are self-describing and are found through introspection and discovery. That seems like a great fit: Mathematica has its origins in artificial intelligence, so we ought to be able to do something here. With nearly twenty-two years of Mathematica behind me but barely a “Hello, World!” ability in C, I decided to jump right in. Enter The HDF Group’s HDF.PInvoke, a .NET wrapper around the HDF5 C library, for my salvation. Here’s how we make use of it in Mathematica:

Needs["NETLink`"]  (* load .NET/Link *)
InstallNET[]       (* start the .NET runtime connection *)
LoadNETAssembly["HDF.PInvoke.dll"]

Bang! Ready to go in Mathematica. Here’s a proof of concept for how it works:

Module[
(* The three symbols should have initial values so that there is *)
(* memory allocation when Mathematica interfaces with P/Invoke. *)
{major=0,minor=0,revision=0,return},
CompoundExpression[
(* access the C routines through P/Invoke interface with Mathematica *)
return=HDF`PInvoke`H5`getUlibversion[major,minor,revision],
(* show some output *)
Row[{"Return = ",return,". Version = ",{major,minor,revision},"."}]
]
]

 

Great, “It’s alive.”

So I developed some related material, called “HDF5.Mathematica.Packages”, and put it up on GitHub for everyone else to use too. There are four levels of access. “HDF5” is the low-level package of Mathematica wrappers that correspond one-to-one with HDF5 functions (http://www.hdfgroup.org/HDF5/doc/RM/). It has maximum flexibility, letting you do everything you can do with the C interface, but as a consequence it retains inconvenient C-style details such as memory pointers and buffers.

For comparison, the next step up is “HDF5Basic”, which hides many of these details while still keeping excellent generality.

“HDF5Extend” extends some of the basic functions so that their arguments and return values follow normal Mathematica syntax. In particular, H5DExtendRead removes the need to know whether a dataset holds FLOAT, INTEGER, STRING, or some other type, a distinction that is characteristic of C but much less so of Mathematica. In principle, this package adds convenience but no new functionality.

“HDF5HighLevel” goes further, with higher-level functions that act as small programs and work on complete HDF5 files. It offers great convenience but necessarily reduced flexibility.

Below are examples at each level of access, showing progressively more convenience and less flexibility, that is, more specificity of purpose.

HDF5

Creating a file always gives a glorious feeling! Let’s do it with the low-level, maximum flexibility, most C-like interface:

With[
{outputFileName="file.h5"},
fileID=H5Fcreate[outputFileName,H5FuACCuTRUNC,H5PuDEFAULT,H5PuDEFAULT]
]

 

H5Fclose[fileID]

 

HDF5Basic

What’s better than creating a file? Writing some data, of course! Let’s do it with H5DBasicWrite from HDF5Basic instead of H5Dwrite from HDF5, so that we can bury a little of the inconvenient business of memory buffers; we’ll pass a NETObject instead. The example assumes a file dset.h5 that already contains an integer dataset /dset with room for the twelve values below.

CompoundExpression[
fileID=H5Fopen["dset.h5",H5FuACCuRDWR,H5PuDEFAULT],
dataSetID=H5Dopen[fileID, "/dset",H5PuDEFAULT],
dataToWrite={{8,9,10,11,12,13},{14,15,16,17,18,19}}
]

NETBlock[
With[
{bufferToWrite=MakeNETObject[Flatten@dataToWrite,"System.Int32[]"]},
H5DBasicWrite[dataSetID,H5TuNATIVEuINT,H5SuALL,H5SuALL,H5PuDEFAULT,bufferToWrite]
]
]

 

CompoundExpression[
H5Dclose[dataSetID],
H5Fclose[fileID]
]

 

Having to build a System.Int32[] buffer, as HDF5Basic requires above, is still very un-Mathematica-like. We’ll get familiar Mathematica syntax in HDF5Extend.
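Why the Flatten in the HDF5Basic step? HDF5 hands a multidimensional dataset to and from the C layer as one contiguous buffer in row-major (C) order, so the nested Mathematica list is flattened row by row before it goes into the System.Int32[] buffer, and a flat buffer read back has to be reshaped with the dataset's dimensions. A minimal pure-Mathematica sketch of that round trip, assuming a 2 × 6 array like the one written above (no HDF5 calls are involved):

```mathematica
(* the 2 x 6 array written to "/dset" in the HDF5Basic example *)
dataToWrite = {{8, 9, 10, 11, 12, 13}, {14, 15, 16, 17, 18, 19}};

(* Flatten walks the rows in order, matching HDF5's row-major (C) layout *)
buffer = Flatten[dataToWrite];

(* a flat buffer read back from the file is restored with the dataset dimensions *)
restored = ArrayReshape[buffer, Dimensions[dataToWrite]];

{buffer, restored === dataToWrite}
```

The last line evaluates to the flat list of twelve values followed by True.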

HDF5Extend

Here is an example of writing to a file and then reading back from it, entirely within Mathematica syntax. Note how H5DExtendWrite and H5DExtendRead sort out all the details of data type and buffer size. H5FExtendOpen and H5DExtendOpen also take care of the necessary closes.

H5FExtendOpen[
{fileID=H5Fopen["dset.h5",H5FuACCuRDWR,H5PuDEFAULT]},

H5DExtendOpen[
{dataSetID=H5Dopen[fileID, "/dset", H5PuDEFAULT]},

CompoundExpression[
(*data to be written to file *)
dataToWrite={{8,9,10,11,12,13},{14,15,16,17,18,19}},

(* write out the data *)
H5DExtendWrite[dataSetID,H5TuNATIVEuINT,H5SuALL,H5SuALL,H5PuDEFAULT,dataToWrite],

(* read the data back in *)
H5DExtendRead[dataSetID,H5SuALL,H5PuDEFAULT]
]

] (* close H5DExtendOpen *)
] (* close H5FExtendOpen *)
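One way to picture what H5FExtendOpen and H5DExtendOpen do about closing: bind the handle, evaluate the body, and close the handle afterwards, even if the body is aborted. The sketch below is not the packages' code; withHandle is a hypothetical name, and it only illustrates the HoldAll scoping pattern that such wrappers can use.

```mathematica
(* hypothetical helper, not part of HDF5.Mathematica.Packages:           *)
(* bind a handle, evaluate the body, then always call the close function *)
ClearAll[withHandle];
SetAttributes[withHandle, HoldAll];
withHandle[{sym_Symbol = open_}, close_, body_] :=
  Module[{result},
    sym = open;                           (* e.g. fileID = H5Fopen[...] *)
    result = CheckAbort[body, $Aborted];  (* an abort in body still reaches close *)
    close[sym];                           (* e.g. H5Fclose[fileID] *)
    result
  ];

(* toy check without HDF5: the "close" function just records the handle *)
closedHandle = None;
withHandle[{h = 41}, (closedHandle = #) &, h + 1]
```

With the low-level package loaded, withHandle[{fileID = H5Fopen["dset.h5", H5FuACCuRDWR, H5PuDEFAULT]}, H5Fclose, body] would pair the H5Fopen with its H5Fclose in the same way.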

 

HDF5HighLevel

Mathematica’s built-in HDF5 support can already read complete datasets of, for example, FLOAT or INTEGER type. So, to be interesting, we should demonstrate some other things.

Below are two examples of HDF5HighLevel functions that do things the HDF5 capabilities built into Mathematica cannot: operations with HYPERSLABS and reading data of a COMPOUND type.
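For comparison, reading a whole dataset with Mathematica's built-in HDF5 support needs no extra packages. A minimal sketch, assuming the "Datasets" element of the built-in HDF5 import/export format; the file name builtin_demo.h5 is made up for this example.

```mathematica
(* write a small integer array as dataset "/dset" with the built-in HDF5 exporter *)
data = {{8, 9, 10, 11, 12, 13}, {14, 15, 16, 17, 18, 19}};
Export["builtin_demo.h5", {data}, {"Datasets", {"/dset"}}];

(* list the datasets in the file, then read one of them back in full *)
names = Import["builtin_demo.h5", "Datasets"];
readBack = Import["builtin_demo.h5", {"Datasets", "/dset"}];

{names, readBack === data}
```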

Operations with HYPERSLABS

Let’s read all the data:

With[
{filename="hdf5_test.h5",dataset="./images/pixel interlace"},
With[
(* interface with HDF5 to read the data we want *)
{imageData=ReadHyperSlab[filename,dataset,{} (* offset *),{} (* block size *)]},
(* Mathematica command to make a pretty picture*)
Image[imageData,"Byte"]
]
]

Now let’s read just some of the data as a hyperslab:

With[
{filename="hdf5_test.h5",dataset="./images/pixel interlace"},
With[
(* interface with HDF5 to read the data we want *)
{imageData=ReadHyperSlab[filename,dataset,{20,35} (* offset *),{100,175} (* block size *)]},
(* Mathematica command to make a pretty picture*)
Image[imageData,"Byte"]
]
]
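Conceptually, the hyperslab above is a rectangular sub-block. HDF5 counts offsets from zero while Mathematica counts positions from one, so an offset of {20,35} with a block size of {100,175} picks rows 21 through 120 and columns 36 through 210. A minimal in-memory sketch of that index arithmetic, using a hypothetical helper hyperslabTake on a synthetic array rather than on the file (strides are ignored):

```mathematica
(* hypothetical helper: emulate a simple hyperslab (offset + block size, stride 1)    *)
(* on an in-memory array; HDF5 offsets are zero-based, Mathematica indices one-based *)
ClearAll[hyperslabTake];
hyperslabTake[array_, offset_List, block_List] :=
  Take[array, Sequence @@ MapThread[{#1 + 1, #1 + #2} &, {offset, block}]];

(* synthetic 240 x 320 x 3 "image" whose entries record their own row and column *)
image = Table[{i, j, 0}, {i, 240}, {j, 320}];

slab = hyperslabTake[image, {20, 35}, {100, 175}];
{Dimensions[slab], slab[[1, 1]], slab[[-1, -1]]}
```

The result is {{100, 175, 3}, {21, 36, 0}, {120, 210, 0}}: 100 rows and 175 columns, starting at row 21, column 36.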


Observe also that H5DExtendRead[], which ReadHyperSlab[] calls, automatically handles the HDF5 internals of INTEGER, FLOAT, STRING (fixed and variable length), BITFIELD, ENUM, OPAQUE, COMPOUND (the user provides a “ByteConversionFunction”), and ARRAY. In the current version of the Mathematica packages, TIME, REFERENCE, and VLEN have not yet been implemented.
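To picture the bookkeeping that H5DExtendRead hides, think of a lookup from an HDF5 type class and element size to the .NET buffer type that has to be allocated. The sketch below is hypothetical: netBufferType is not a function from the packages, and it lists only the pairings that appear in this post (System.Int32[] for H5TuNATIVEuINT, plus the System.Int64[], System.Double[], and System.IntPtr[] buffers of the compound example). Unsupported combinations return a Missing object instead of failing.

```mathematica
(* hypothetical lookup: {HDF5 type class, element size} -> .NET buffer type *)
(* only the pairings mentioned in this post are filled in                  *)
ClearAll[netBufferType];
netBufferType["INTEGER", 4] = "System.Int32[]";           (* native int *)
netBufferType["INTEGER", 8] = "System.Int64[]";           (* 64-bit integer *)
netBufferType["FLOAT", 8] = "System.Double[]";            (* double *)
netBufferType["STRING", "variable"] = "System.IntPtr[]";  (* one pointer per string *)
netBufferType[class_, size_] := Missing["NotImplemented", {class, size}];

netBufferType @@@ {{"INTEGER", 8}, {"FLOAT", 8}, {"STRING", "variable"}, {"TIME", 8}}
```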

Examine and read a COMPOUND DATATYPE

Here is detailed information on a single compound datatype.

With[
{filename="h5ex_t_cmpd.h5",pathToObject="./DS1"},
CompoundDataTypeInformation[filename,pathToObject]
]

By way of explanation, we see that the COMPOUND data type consists of an INTEGER of type System.Int64[] of length 1, a STRING of type System.IntPtr[] of length 1 (implying a variable-length string), and two FLOATs, each of type System.Double[] of length 1.

Let’s read in the actual contents now:

With[
{filename="h5ex_t_cmpd.h5",dataset="./DS1"},
ReadHyperSlab[filename,dataset,{},{},"ByteConversionFunction"->(myFunction[#1,#2,#3,#4]&)]
]

In full disclosure, I have not shown you the details of ‘myFunction’, which I set up for the specific compound data type revealed above by CompoundDataTypeInformation. ‘myFunction’ includes, for example, a call to ‘System`BitConverter`ToInt64’ to convert the bytes of a System.Int64[] into a number for Mathematica. The full code is available as an example in the GitHub download. With more time in the day, I or someone else could easily write a semi-intelligent routine that sets up ‘myFunction’ automatically.
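For readers without .NET at hand, here is roughly what the BitConverter`ToInt64 step amounts to, in plain Mathematica: eight bytes, least significant first, read as a signed two's-complement integer. BitConverter uses the byte order of the machine it runs on, which is little-endian on typical x86 PCs; the sketch assumes that order. It is not the author's myFunction, and decodeInt64LE is a hypothetical name.

```mathematica
(* hypothetical helper: 8 little-endian bytes -> signed 64-bit integer *)
ClearAll[decodeInt64LE];
decodeInt64LE[bytes : {__Integer}] /; Length[bytes] == 8 :=
  Module[{unsigned = FromDigits[Reverse[bytes], 256]},
    (* two's complement: values at or above 2^63 are negative *)
    If[unsigned >= 2^63, unsigned - 2^64, unsigned]
  ];

decodeInt64LE /@ {
  {1, 0, 0, 0, 0, 0, 0, 0},
  {255, 255, 255, 255, 255, 255, 255, 255},
  {0, 1, 0, 0, 0, 0, 0, 0}
}
```

The three byte lists decode to 1, -1, and 256.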

Another fun thing to do is to make graphs of nested compound datatypes:

With[
{filename="hdf5_test.h5",pathToObject="./arrays/ArrayOfStructures"},
CompoundDataTypeInformationTree[filename,pathToObject]
]

Thank you for reading about my activities linking HDF5 and Mathematica. This package can be developed further in an open-source community environment.

I can be reached at scot_martin at harvard.edu.

The HDF Group Editor’s Note: Thank you, Dr. Martin, for a wonderful article! Scot T. Martin is the Gordon McKay Professor of Environmental Chemistry at Harvard University, Cambridge, MA, with appointments in the School of Engineering & Applied Sciences and the Department of Earth & Planetary Sciences. As the director of the Laboratory of Environmental Chemistry, Dr. Martin’s research focuses on engineering solutions to the major environmental challenges presently facing the world.

https://github.com/HDFGroup/HDF.PInvoke
