PSH5X Frequently Asked Questions

Introduction

1.01 Where do I get help?

In PowerShell, the Get-Help cmdlet is your one-stop shop for help about cmdlets and topics. All PSH5X cmdlets come with some form of help. For example, to get help on New-H5Drive, type help New-H5Drive or New-H5Drive -?. You can get a list of HDF5-related topics by typing help about_H5*.

1.02 What's the difference between scripting languages and shell languages?

I keep returning to this question about every 3 months and an answer still eludes me. The following comment by Bruce Payette has been a steady source of consolation (Windows PowerShell in Action, 2nd Edition, Manning 2011, page 7):

In the end, there's no hard-and-fast distinction between a shell language and a scripting language. Some of the features that make a good scripting language result in a poor shell user experience. Conversely, some of the features that make for a good interactive shell experience can interfere with scripting. Because PowerShell's goal is to be both a good scripting language and a good interactive shell, balancing the tradeoffs between user experience and scripting authoring was one of the major language design challenges.

1.03 What datatypes are supported?

The ultimate goal is to support all datatypes. When we say 'support', we mean that data of that type can be represented in memory and on disk, and be read and written. The flexible JSON notation for HDF5 datatypes (see help about_H5Datatypes) lets one create HDF5 datasets, linked datatypes, and attributes of any HDF5 datatype. However, not all of them can currently be read or written. Getting to more complete support will be very much demand-driven.

The table below gives an overview of what's supported in the current version.

| HDF5 Type Class | HDF5 Dataset | HDF5 Attribute | Notes |
|-----------------|--------------|----------------|-------|
| H5T_INTEGER     | yes          | yes            | 1, 2, 4 or 8 byte, BE or LE, signed/unsigned |
| H5T_FLOAT       | yes          | yes            | 4 or 8 byte, IEEE |
| H5T_STRING      | yes          | yes            | Fixed- and variable-length, ASCII and UTF-8 |
| H5T_ENUM        | yes          | yes            | Must be derived from 1, 2, 4 or 8 byte, BE or LE, signed/unsigned integers. |
| H5T_BITFIELD    | yes          | yes            | 1, 2, 4 or 8 byte |
| H5T_OPAQUE      | yes          | no             | |
| H5T_REFERENCE   | yes          | yes            | For attributes, only object references are supported. |
| H5T_COMPOUND    | yes          | no             | Currently, members of class H5T_COMPOUND and H5T_VLEN are not supported. Members of class H5T_ARRAY are supported if they are derived from a primitive type (e.g., integers, floats, strings, etc.). |
| H5T_ARRAY       | yes          | no             | Currently, only array types derived from a primitive type (e.g., integers, floats, strings, etc.) are supported. |
| H5T_VLEN        | yes          | no             | Currently, only variable-length sequence types derived from a primitive type (e.g., integers, floats, strings, etc.) are supported. |

1.04 How do HDF5 compound datatypes "work" in PSH5X?

There are two parts to this question.

Reading data elements of an HDF5 compound datatype

The workflow for reading data elements from an HDF5 dataset is this:

  1. Parse the HDF5 datatype (in the file)
  2. Create a C# class whose instances (.NET objects) can be used to represent the data elements in memory
  3. Compile the code and load the .NET assembly into the running session
  4. Create a .NET array of the right size and shape
  5. Read the HDF5 dataset and initialize the array with the values read from disk
  6. Return the .NET array to the user

There are several important details to this process. First, the code generation currently does not support deeply nested compound types. This is a matter of finding a good balance between demand (i.e., compound types that people really use) and having a maintainable code generator. Second, not all compound member names can be used as names of members in a .NET class. There are decorators such as PSNoteProperty, but we haven't finalized our approach. In the meantime, we use the type codes from the Python array module (e.g., H for a 16-bit unsigned integer) with a suffix indicating the rank of the member in the HDF5 compound. Let's look at an example. The compound type with this JSON definition:

{
    "Class": "Compound",
    "Size": 16,
    "Members":
    {
        "a": [0, "H5T_STD_I32BE"],
        "b": [4, "H5T_IEEE_F32BE"],
        "c": [8, "H5T_IEEE_F64BE"]
    }
}     
        

will be represented as follows:

using System;
public class ifd
{
    public System.Int32  i0;
    public System.Single f1;
    public System.Double d2;
    public ifd() { }
    public ifd(System.Int32 param0, System.Single param1, System.Double param2)
    { i0 = param0; f1 = param1; d2 = param2; }
}           
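
The field naming in the example (type code plus member rank) can be reproduced in a few lines of PowerShell. This is only a sketch of the naming convention described above: the lookup table and the function name Get-CompoundFieldName are hypothetical and are not taken from the PSH5X code generator.

```powershell
# hypothetical lookup table: .NET type name -> Python array-module type code
$typeCodes = @{
    'System.UInt16' = 'H'
    'System.Int32'  = 'i'
    'System.Single' = 'f'
    'System.Double' = 'd'
}

# hypothetical helper: returns one field name per compound member,
# built from the type code and the member's rank in the compound
function Get-CompoundFieldName {
    param([string[]] $MemberType)
    for ($rank = 0; $rank -lt $MemberType.Length; $rank++) {
        $typeCodes[$MemberType[$rank]] + $rank
    }
}

$names = Get-CompoundFieldName 'System.Int32', 'System.Single', 'System.Double'
$names -join ', '    # i0, f1, d2
```

For the three members a, b and c above, this yields the field names i0, f1 and d2 used in the generated class.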
        

It'd be nice if there was an array package like NumPy for .NET. There appears to be a working 32-bit version, but not a 64-bit version. If you have any suggestions, please let us know!

The Get-H5DatasetValue cmdlet currently does not support reading a subset of members of a compound type, and it does not support type conversion (i.e., a single-precision floating point number is read as just that; it is not converted into a double). Both of these features will be supported in the release version of PSH5X.

Writing data elements of an HDF5 compound datatype

The workflow for writing data elements to an HDF5 dataset is this:

  1. Parse the HDF5 datatype in the file
  2. Reflect on the .NET type of the array elements in memory and determine if the member types can be converted to the matching member types on disk
  3. If successful, write the array to the HDF5 dataset
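
To give an idea of what "reflecting on the .NET type" in step 2 means, here is a minimal sketch that inspects the public fields of an array's element type using standard .NET reflection. The class SampleElement is a made-up stand-in, and this is not the code PSH5X runs internally.

```powershell
# a made-up element type for illustration
Add-Type -TypeDefinition @'
public class SampleElement
{
    public System.Int32  i0;
    public System.Single f1;
    public System.Double d2;
}
'@

# an array of such elements, as a user might pass to a cmdlet
$array = New-Object 'SampleElement[]' 3

# find the element type of the array and list its public fields
$elementType = $array.GetType().GetElementType()
$fields = $elementType.GetFields()
$fields | ForEach-Object { '{0} : {1}' -f $_.Name, $_.FieldType.FullName }
```

Each field's name and .NET type can then be compared against the member types of the compound datatype on disk.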

Writing is simpler than reading, since no dynamic code generation is required. The Add-Type cmdlet gives the user ultimate control over the in-memory representation (and constructors) of the data elements. (See the tutorial for examples.)

The Set-H5DatasetValue cmdlet currently does not support writing just a subset of compound members. This will be supported in the release version of PSH5X.

To create an array of in-memory representations of an HDF5 datatype you have at least two options.

  1. Use Add-Type to add a .NET type, followed by New-Object to construct the array
  2. Use the New-H5Array cmdlet. Its first argument is the JSON type string and the array shape (dimensions) is its second argument.
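
The first option might look like the sketch below, which reuses the ifd class from question 1.04. The array length and the values are arbitrary, and writing the array to a dataset is not shown.

```powershell
# add the .NET type to the session
Add-Type -TypeDefinition @'
public class ifd
{
    public System.Int32  i0;
    public System.Single f1;
    public System.Double d2;
    public ifd() { }
    public ifd(System.Int32 param0, System.Single param1, System.Double param2)
    { i0 = param0; f1 = param1; d2 = param2; }
}
'@

# construct a one-dimensional array of 4 elements (all $null until assigned)
$a = New-Object 'ifd[]' 4

# fill the array using the three-argument constructor
for ($i = 0; $i -lt $a.Length; $i++) {
    $a[$i] = New-Object ifd -ArgumentList $i, ([single] 0.5), 2.5
}
```

Because the class is supplied by the user, the field names, field types and constructors are entirely under the user's control.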

The Windows Platform

2.01 How do I get data from HDF5 into Excel?

Assume that we have a two-dimensional dataset of floating point numbers with 5553 rows and 86 columns at h5:/my/floats that we would like to load into Excel. There's a simple four-step process to get this done.

  1. Create an Excel COM object
  2. Add a blank workbook and select the target range
  3. Pump the values into the range
  4. Clean up

The PowerShell skeleton is shown below. Note that you have to work out the corners of the range in Excel notation, e.g., A1:CH5553 for 5553 rows and 86 columns, which is well documented (check the links following the code).

# create an Excel COM object
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $True

# add a workbook and select the target range
$workbook = $excel.Workbooks.Add()
$range = $workbook.ActiveSheet.Range('A1','CH5553')

# pump the values in
$range.Value2 = Get-H5DatasetValue h5:/my/floats

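# cleanup: suppress prompts, quit Excel, and release the COM reference
$excel.Application.DisplayAlerts = $False
$excel.Quit()
# caution: the next line stops every running Excel process, not just this one
Get-Process Excel | Stop-Process
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($excel)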
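
Working out the column letters by hand gets tedious for wide datasets. The helper below is a small sketch that converts a 1-based column number into Excel's letter notation; the function name ConvertTo-ExcelColumn is made up for this example and is not part of PSH5X or Excel.

```powershell
# hypothetical helper: 1-based column number -> Excel column letters
function ConvertTo-ExcelColumn {
    param([int] $Index)
    $letters = ''
    while ($Index -gt 0) {
        # peel off the least significant "digit" (A..Z), working right to left
        $remainder = ($Index - 1) % 26
        $letters = [string][char](65 + $remainder) + $letters
        $Index = [math]::Floor(($Index - 1) / 26)
    }
    $letters
}

ConvertTo-ExcelColumn 86    # CH
```

The row number of the lower right corner can then be appended to the letters to form the second argument of Range.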
        

Related information:

2.02 Can I use ILNumerics in PowerShell?

Short answer: Yes.

Long answer: ILNumerics is a high performance math library for applications. Use the Add-Type cmdlet to load the ILNumerics assembly. You can then create ILNumerics .NET objects by casting or with the New-Object cmdlet.

Use a package such as PowerShell Type Accelerators to make the code a little easier to read and type.

Add-Type -Path E:\ILNumerics\ILNumerics_2.11.4464.29952\bin64\ILNumerics64.dll
Import-Module TypeAccelerator

Add-TypeAccelerator ilm   ILNumerics.ILMath
Add-TypeAccelerator ilf32 ILNumerics.ILArray[single]

# convert the value of an HDF5 dataset into an ILNumerics.ILArray<Single>
$il = [ilf32] (Get-H5DatasetValue 'aura:\HDFEOS\SWATHS\HIRDLS\Data Fields\Temperature')

# get the last column
$il[':;end']
        

The same is pretty much true for any .NET library. If such a library depends on HDF5, e.g., HDF5DotNet, make sure that it was built against the same version of the HDF5 library as PSH5X.

Common HDF5 Tasks

3.01 How do I save an XML file in HDF5?

There are several ways of accomplishing that. Most are a variation on the following theme: Let's store the types.ps1xml file in a scalar string attribute of the HDF5 root group. We can store it as a variable-length or fixed-length string. By default, attributes are stored in compact form, where their size must not exceed 64K. By comparison, a scalar dataset is not subject to that restriction.

# read the file into a single string
$x = [string] (Get-Content "$($PSHOME)\types.ps1xml")

# variable-length, UTF-8 encoded string
New-H5Attribute h5tmp:\ vlen_str $x ustring

# in compact storage (default) the attribute size must not exceed 64K
$y = $x.Substring(0,64000)
New-H5Attribute h5tmp:\ fixed_str $y "ustring$($y.Length)"

# a scalar dataset is not subject to the 64K restriction
New-H5Dataset h5tmp:\fixed_str "ustring$($x.Length)" -Scalar
Set-H5DatasetValue h5tmp:\fixed_str $x
        

Attributes can be larger than 64K if an object's attribute storage is dense. This property must be set at file/object creation time.
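
One thing to keep in mind is that Substring and Length count characters, while storage sizes in HDF5 are usually measured in bytes, and a UTF-8 character can occupy more than one byte. The sketch below uses only .NET to compare the two counts. The helper name Get-Utf8ByteCount and the sample string are made up, and treating the limit as exactly 64KB (65536 bytes, ignoring any storage overhead) is an assumption for illustration.

```powershell
# hypothetical helper: number of bytes a string occupies when encoded as UTF-8
function Get-Utf8ByteCount {
    param([string] $Text)
    [System.Text.Encoding]::UTF8.GetByteCount($Text)
}

# sample string with two non-ASCII characters (u-umlaut and sharp s)
$text = 'Gr' + [char]0x00FC + [char]0x00DF + 'e'

$text.Length               # 5 characters
Get-Utf8ByteCount $text    # 7 bytes

# assumed threshold, taken from the 64K figure quoted above
$limit = 64KB
if ((Get-Utf8ByteCount $text) -le $limit) { 'fits in a compact attribute' } else { 'use a dataset' }
```

For pure ASCII content such as most XML files, the two counts are the same.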

3.02 How do I save a binary file in HDF5?

There are several ways of accomplishing that. Most are a variation on the following theme: Let's store the Notepad executable, notepad.exe, in a scalar (singleton) HDF5 dataset of an HDF5 opaque type.

# get the file bytes
# (Windows PowerShell syntax; PowerShell 6 and later use -AsByteStream instead of -Encoding byte)
[byte[]] $x = Get-Content -Encoding byte -Path "$($env:windir)\System32\notepad.exe" -ReadCount 0

# create a scalar HDF5 dataset of an opaque type of the right size (193536 in Windows 7 64-bit)
New-H5Dataset h5tmp:\notepad.exe "opaque$($x.Count)['notepad.exe']" -Scalar

# pump the bytes in
Set-H5DatasetValue h5tmp:\notepad.exe (,$x)
        

The (,$x) on the last line is not a typo. Set-H5DatasetValue compares the size of the dataset with the value supplied. Since we are dealing with a scalar dataset, Set-H5DatasetValue expects a single value. Supplying $x as the value would create a size mismatch error, since it's an array of length greater than one. We must use the comma operator to turn it into a single-element array whose only element is itself an array.
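
The effect of the comma operator is easy to check at the prompt without any HDF5 file involved. The short byte array below is just a stand-in for the file contents.

```powershell
# a small stand-in for the bytes read from the file
[byte[]] $x = 1, 2, 3

# the unary comma operator wraps its operand in a one-element array
$wrapped = , $x

$x.Count                         # 3
$wrapped.Count                   # 1
$wrapped[0].GetType().FullName   # System.Byte[]
```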

Last modified: 13 October 2016