
HDF5 Tutorial: Introductory Topics
Creating a Dataset

What is a Dataset?

A dataset is a multidimensional array of data elements, together with supporting metadata. To create a dataset, the application program must specify the location at which to create the dataset, the dataset name, the datatype and dataspace of the data array, and the property lists.

Datatypes

A datatype is a collection of properties, all of which can be stored on disk, and which, when taken as a whole, provide complete information for data conversion to or from that datatype.

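There are two categories of datatypes in HDF5:

Atomic: datatypes that cannot be decomposed into smaller datatype units at the API level.
Compound: datatypes made up of a collection of other datatypes, similar to a struct in C.

Figure 5.1 shows the HDF5 datatypes. Some of the HDF5 predefined atomic datatypes are listed in Figures 5.2a and 5.2b.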

In this tutorial, we consider only HDF5 predefined integers.

For further information on datatypes, see The Datatype Interface (H5T) in the HDF5 User's Guide and the Datatypes topic in the Advanced Tutorial.

Fig. 5.1   HDF5 datatypes


                                          +--  integer
                                          +--  floating point
                        +---- atomic  ----+--  date and time
                        |                 +--  character string
       HDF5 datatypes --|                 +--  bitfield
                        |                 +--  opaque
                        |
                        +---- compound

Fig. 5.2a   Examples of HDF5 predefined datatypes
Datatype          Description
--------------    ----------------------------------------------------------
H5T_STD_I32LE     Four-byte, little-endian, signed, two's complement integer
H5T_STD_U16BE     Two-byte, big-endian, unsigned integer
H5T_IEEE_F32BE    Four-byte, big-endian, IEEE floating point
H5T_IEEE_F64LE    Eight-byte, little-endian, IEEE floating point
H5T_C_S1          One-byte, null-terminated string of eight-bit characters
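A datatype's stored properties are enough to convert data to or from it. To make this concrete for integers, here is a plain-C sketch (no HDF5 calls) that decodes raw bytes using only a size, a byte order, and a signedness flag. The `int_type_desc` struct and `decode_int` function are hypothetical illustrations. HDF5 records more properties than these and performs its conversions internally with its own code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a fixed-size integer datatype, holding a few
   of the properties a type such as H5T_STD_I32LE records. */
typedef struct {
    size_t size;        /* bytes per element (1..8) */
    int little_endian;  /* 1 = least significant byte stored first */
    int is_signed;      /* 1 = two's complement signed */
} int_type_desc;

/* Decode one element stored in `bytes` using only the description. */
int64_t decode_int(const unsigned char *bytes, const int_type_desc *t)
{
    uint64_t u = 0;
    for (size_t i = 0; i < t->size; i++) {
        /* Visit bytes from most significant to least significant. */
        size_t idx = t->little_endian ? t->size - 1 - i : i;
        u = (u << 8) | bytes[idx];
    }
    /* Sign-extend when a signed value has its top bit set. */
    if (t->is_signed && t->size < 8 && ((u >> (8 * t->size - 1)) & 1u))
        u |= ~UINT64_C(0) << (8 * t->size);
    return (int64_t)u;
}
```

For example, the four bytes `FE FF FF FF` described as H5T_STD_I32LE (`{4, 1, 1}`) decode to -2, while the two bytes `01 02` described as H5T_STD_U16BE (`{2, 0, 0}`) decode to 258.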

Fig. 5.2b   Examples of HDF5 predefined native datatypes
Native Datatype           Corresponding C or FORTRAN Type
------------------------  -------------------------------
C:
  H5T_NATIVE_INT          int
  H5T_NATIVE_FLOAT        float
  H5T_NATIVE_CHAR         char
  H5T_NATIVE_DOUBLE       double
  H5T_NATIVE_LDOUBLE      long double
FORTRAN:
  H5T_NATIVE_INTEGER      integer
  H5T_NATIVE_REAL         real
  H5T_NATIVE_DOUBLE       double precision
  H5T_NATIVE_CHARACTER    character
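A native type stands for the matching type of the host C or Fortran compiler. H5T_NATIVE_INT is therefore equivalent to one of the standard integer types of Fig. 5.2a, but which one depends on the machine. The plain-C sketch below uses a hypothetical helper, `native_int_equivalent` (no HDF5 calls), to report that standard type for the machine it runs on. It assumes a 4-byte, two's complement `int` that is either little- or big-endian, and it ignores more exotic byte orders.

```c
#include <string.h>

/* Hypothetical helper: names the standard 32-bit integer type that matches
   this platform's `int`, i.e. what H5T_NATIVE_INT corresponds to here.
   Returns NULL if `int` is not 4 bytes wide. */
const char *native_int_equivalent(void)
{
    unsigned int probe = 1;
    unsigned char first;

    if (sizeof(int) != 4)
        return NULL;
    /* Copy the lowest-addressed byte of `probe`: it holds 1 only when the
       least significant byte is stored first (little-endian). */
    memcpy(&first, &probe, 1);
    return first == 1 ? "H5T_STD_I32LE" : "H5T_STD_I32BE";
}
```

On a typical x86-64 machine this reports H5T_STD_I32LE. The same source compiled on a big-endian machine would report H5T_STD_I32BE.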

Datasets and Dataspaces

A dataspace describes the dimensionality of the data array. A dataspace is either a regular N-dimensional array of data points, called a simple dataspace, or a more general collection of data points organized in another manner, called a complex dataspace. Figure 5.3 shows HDF5 dataspaces. In this tutorial, we only consider simple dataspaces.

Fig. 5.3   HDF5 dataspaces


                         +-- simple
       HDF5 dataspaces --|
                         +-- complex

The dimensions of a dataset can be fixed (unchanging), or they may be unlimited, which means that they are extensible. A dataspace can also describe a portion of a dataset, making it possible to do partial I/O operations on selections.
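A simple dataspace is fully described by its rank and dimension sizes, and a selection picks a subset of its points. The plain-C sketch below uses two hypothetical helpers (no HDF5 calls). One counts the points of a simple dataspace. The other lists, in row-major (C) order, the linear offsets covered by a rectangular block. This mirrors the idea behind a hyperslab selection with unit stride, but it is not how the HDF5 library implements selections.

```c
#include <stddef.h>

/* Hypothetical helper: number of data points in a simple dataspace of
   `rank` dimensions with sizes dims[0..rank-1] (product of the sizes). */
size_t dataspace_npoints(int rank, const size_t *dims)
{
    size_t n = 1;
    for (int i = 0; i < rank; i++)
        n *= dims[i];
    return n;
}

/* Hypothetical helper: writes into `out` the row-major linear offsets of a
   count_rows x count_cols block starting at (start_row, start_col) in a
   2-D dataspace with `ncols` columns. Returns the number of offsets. */
size_t select_block_2d(size_t ncols,
                       size_t start_row, size_t start_col,
                       size_t count_rows, size_t count_cols,
                       size_t *out)
{
    size_t k = 0;
    for (size_t r = 0; r < count_rows; r++)
        for (size_t c = 0; c < count_cols; c++)
            out[k++] = (start_row + r) * ncols + (start_col + c);
    return k;
}
```

For a 4 x 6 dataspace (24 points), the 2 x 2 block starting at row 1, column 2 covers offsets 8, 9, 14 and 15. Partial I/O would read or write only those points.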

Property Lists

Property lists are a mechanism for modifying the default behavior when creating or accessing objects. For more information on property lists, see the Property List topic in the Advanced Tutorial.

The following property lists can be specified when creating a dataset: