HDF5 and .NET: One step back, two steps forward

Gerd Heber, The HDF Group and Haymo Kutschbach,* ILNumerics

Metaphorically speaking, this blog post is about a frog trying to climb out of a well, a damp and unsightly corner of the HDF5 ecosystem called HDF5.NET. People who know more about its genesis tell us that it was never intended to be what it came to be perceived as: an “aspirational” .NET interface for HDF5 that would one day be complete and fully supported. Be that as it may, it’s important to ask, “What can we do today to better serve the needs of the .NET community?” We believe, as the title suggests, that we need to take a step back to move forward.

Cynics might argue that it would be best if The HDF Group stopped providing HDF5 APIs in languages other than C and Fortran altogether. The mixed record of The HDF Group’s attempts and the success of the Python family of interfaces (PyTables, h5py, pandas) lend some credibility to that argument. Other communities have made a deliberate decision: “So, in ZeroMQ, we aimed to make it easy to write bindings on top of the core library, and we stopped trying to make those bindings ourselves.” (Pieter Hintjens, ZeroMQ, p. 334)

Our goal is not to settle this question (for HDF5), but to develop a proposal for a .NET facility on top of the core HDF5 library, which would make it easy to write more specialized or more high-level .NET APIs. The main quality attributes we have in mind are the following:

  1. Maintainability
    • Testability
    • Extensibility
    • Development distributability / communalization
  2. Interoperability
  3. Performance

Why these three?

For software like HDF5, the overwhelming cost is in maintenance. Software that is easy to test, easy to extend, and whose development can take place in a distributed fashion tends to cost a lot less to maintain.

When we say .NET, we tend to forget that this means at least a handful of different .NET languages and that, with projects such as Mono and Xamarin, .NET extends far beyond the Windows platform. The .NET way of being interoperable is to conform to the CLI specification, and portability across platforms is built into HDF5’s DNA.

Finally, it should be the .NET developer’s decision where to trade away performance, and how much. Our goal must be to deal with the specifics of the native infrastructure in a thin layer, just enough to enable direct access to the whole spectrum of options for writing high-performance .NET code, including the full set of available performance tools.

You might wonder why usability wasn’t on the list. The main issue is that it probably means five different things to three people. We sure know what .NET developers DON’T want. That includes being overwhelmed with the nasty internals of native languages, a maze of compiler switches, platform specifics, and native interfaces. They are used to referencing an assembly and getting the full API readily presented via IntelliSense. A very basic .NET binding goes a long way and is THE starting point for any .NET development. That doesn’t mean we’ll never know what constitutes a usable .NET interface for HDF5, but we believe an evolutionary approach has a better chance of success than design by committee, especially since it is fairly uncontentious where the path should begin, and useful and usable work products lie just a few hundred yards beyond the trailhead.

We call this trailhead HDF.PInvoke. It’s a collection of PInvoke signatures, a few user-defined types, and any other information needed to call any HDF5 API function (with minor exceptions) from managed code. It is not a new API. Rather, it enables the creation of new APIs, be they more specialized or higher level. All this is achieved in a maintainable, .NET-conformant manner, while enabling .NET developers to be creative and efficient with HDF5.

Together with documentation and a unit test suite, HDF.PInvoke will sit on GitHub and be available under the same license as the HDF5 library. Here’s a snippet from H5A.cs:

using System;
using System.Runtime.InteropServices;
using System.Security;

using herr_t = System.Int32;
using hid_t = System.Int32;
...

// See the typedef for message creation indexes in H5Opublic.h
using H5O_msg_crt_idx_t = System.UInt32;

namespace HDF.PInvoke
{
    public unsafe sealed class H5A
    {
        /// <summary>
        /// Information struct for attribute
        /// (for H5Aget_info/H5Aget_info_by_idx)
        /// </summary>
        public struct info_t
        {
            /// <summary>
            /// Indicate if creation order is valid
            /// </summary>
            public hbool_t corder_valid;
            /// <summary>
            /// Creation order
            /// </summary>
            public H5O_msg_crt_idx_t corder;
            /// <summary>
            /// Character set of attribute name
            /// </summary>
            public H5T.cset_t cset;
            /// <summary>
            /// Size of raw data
            /// </summary>
            public hsize_t data_size;
        }

        /// Delegate for H5Aiterate2() callbacks
        public delegate herr_t operator_t
            (hid_t location_id, string attr_name, info_t ainfo, object op_data);

        ///  ... 
        [DllImport(Constants.DLLFileName, EntryPoint = "H5Aiterate2",
            CallingConvention = CallingConvention.Cdecl),
        SuppressUnmanagedCodeSecurity, SecuritySafeCritical]
        public extern static herr_t iterate
            (hid_t loc_id, H5.index_t idx_type, H5.iter_order_t order,
            ref hsize_t idx, operator_t op, object op_data);

       ...
    }
}
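
User-defined types such as info_t cross the managed/native boundary, so their field order and widths have to match the corresponding C structure. The following is a minimal sketch, not part of HDF.PInvoke, of how one might sanity-check such a layout from managed code with Marshal.SizeOf and Marshal.OffsetOf. The alias widths chosen for hbool_t and hsize_t, the stand-in enum for H5T.cset_t, and the LayoutCheck class name are assumptions made for illustration; the snippet above elides those definitions.

```csharp
using System;
using System.Runtime.InteropServices;

// ASSUMPTION: alias widths are illustrative guesses, not taken from the snippet above.
using hbool_t = System.UInt32;
using hsize_t = System.UInt64;
using H5O_msg_crt_idx_t = System.UInt32;

public static class LayoutCheck
{
    // Hypothetical stand-in for H5T.cset_t, which is not shown in the snippet.
    public enum cset_t : int { ASCII = 0, UTF8 = 1 }

    // Sequential layout keeps the fields in declaration order, as a C struct does.
    [StructLayout(LayoutKind.Sequential)]
    public struct info_t
    {
        public hbool_t corder_valid;
        public H5O_msg_crt_idx_t corder;
        public cset_t cset;
        public hsize_t data_size;
    }

    // Size of the struct as the marshaler would hand it to native code.
    public static int SizeOfInfo()
    {
        return Marshal.SizeOf(typeof(info_t));
    }

    // Byte offset of the last field, which reveals any alignment padding.
    public static int OffsetOfDataSize()
    {
        return Marshal.OffsetOf(typeof(info_t), "data_size").ToInt32();
    }

    public static void Main()
    {
        Console.WriteLine("sizeof(info_t)      = " + SizeOfInfo());
        Console.WriteLine("offsetof(data_size) = " + OffsetOfDataSize());
    }
}
```

Comparing these numbers against sizeof and offsetof in a small C program built against the same HDF5 headers is one way a unit test suite could catch layout mismatches early.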

And it gets better: You can meet us in the parking lot near the trailhead and help with testing, submit issues, write and review documentation, submit pull requests, etc. Our first milestone will be HDF.PInvoke 1.8.16, i.e., virtually complete coverage of the HDF5 1.8.16 C API. From there, it’s actually just a few miles to HDF.PInvoke 1.10, which brings several new features, but also breaks some familiar notions such as 32-bit handles (identifiers).

Interested? RSVP to dotnet@hdfgroup.org and we’ll introduce you to the team.

HDF5.NET is moribund, long live HDF.PInvoke!

*****

* Haymo Kutschbach is the CEO of ILNumerics GmbH (Berlin, Germany), the creator of the ILNumerics tools for technical application development.

 
