HDF Newsletter #5

Contents:

  To transpose or not to transpose? (We need your help)

  HDF 3.2 beta
    A. How HDF 3.2 is different from HDF 3.1
       1. SDS enhancements
       2. New conversion routines
       3. Complete new set of general purpose routines
       4. Emulation of previous general purpose interface
       5. Inclusion of the Vset calling interface
       6. Test modules
       7. Naming conventions
    B. Getting HDF 3.2 beta
    C. Work to do, problems to solve, and things to decide

  Calibration Tag

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To transpose or not to transpose? (We need your help)

Background

HDF 3.1 release 5 and earlier releases store SDS arrays in HDF files in
"row major order" by default.  ("Row major order" refers to the fact
that two-dimensional arrays are stored "a row at a time," each row
occupying a contiguous block of storage.)  The C language follows this
convention in its internal storage of arrays as well, which means that
a C call to an HDF SDS routine like DFSDadddata simply streams the data
as is to the HDF file.

But Fortran does not follow this convention in its internal storage of
arrays.  Fortran stores arrays in "column major order."  When a Fortran
program calls DFSDputdata to write an array to an HDF file, the data
must be transposed.  Similarly, when a Fortran program calls
DFSDgetdata, HDF transposes the data so that it is returned in the
original order.

This approach seemed reasonable when we first designed the SDS
interface, because it was a way of keeping rows and columns consistent,
no matter which language was used.  But, as many users have pointed
out, this consistency does not always hold, especially when arrays of
higher dimension are involved.  By second-guessing how users "view"
arrays, HDF often imposes an order on them that is not intended.
Furthermore, when higher dimensions are involved, transposing can
rearrange data in ways that make it very hard for a user to understand.

A second problem caused by transposing data is that it takes time.  For
large data sets this slows down reading and writing substantially.
Some HDF users have found that they must turn off transposing because
they simply cannot tolerate the additional time it takes.

The proposed change

In a nutshell, we are proposing that the SDS interface no longer
transpose arrays that are read from or written to HDF files by Fortran
programs.

The main negative implication of this, we believe, is that some older
programs assume transposing and have code to accommodate it, and these
programs will have to be revised.  The positive implications are (1)
that users know that what goes into an HDF file maps directly to what
they have stored in memory, and (2) that performance for large data
sets is improved enormously.

What we need from you

Let us know if you care about this issue one way or the other.  If you
want to play with a version of HDF that does not transpose, get the
file README.notranspose in outgoing/hdf3.2 on the ftp server
(ftp.ncsa.uiuc.edu; IP address 141.142.20.50).

We plan to wait and see what the reaction from our users is, and then
we will decide whether to make this important change.  Our schedule
calls for us to release HDF 3.2 sometime in May, and we might want to
make the change with that release, if we make it.  So let us know
within the next few weeks if you have an opinion about it.
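To make the storage-order question described under "Background" above
concrete, here is a small C sketch.  The file name, array name, and
values are made up for illustration, and the int32/float32 typedefs are
assumed to come from the HDF header.

    #include "hdf.h"    /* df.h in HDF 3.1 and earlier */

    int main(void)
    {
        int32   dims[2] = {2, 3};           /* 2 rows, 3 columns      */
        float32 a[2][3] = { {1., 2., 3.},   /* C stores this row by   */
                            {4., 5., 6.} }; /* row: 1 2 3 4 5 6       */

        /* A C call hands the buffer to HDF exactly as it sits in
         * memory, so the file holds the values in row major order.  */
        DFSDadddata("myfile.hdf", 2, dims, a);

        /* A Fortran array dimensioned A(2,3) holding the same values
         * sits in memory column by column: 1 4 2 5 3 6.  Today
         * DFSDputdata/dspdata transposes that buffer so the file
         * again holds 1 2 3 4 5 6; under the proposed change the
         * buffer would be written as is.                            */
        return 0;
    }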
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

HDF 3.2 beta

On March 1, we put a new version of HDF on the ftp server
(ftp.ncsa.uiuc.edu; IP address 141.142.20.50) in the directory
HDF/HDF3.2beta.  This version is the result of a joint software
development effort between NCSA and the Information Technologies
Institute of Singapore, with a lot of additional support from Phillips
Petroleum.  It represents a major overhaul of HDF, and gives a good
foundation for future HDF development.  We are working towards a full
release of HDF 3.2 sometime in May.

The beta release has been implemented so far only on the Cray, Sun
(SPARC), and DECstation 5000.  We expect to have it ported to the other
platforms by the time of the full release.

PLEASE, ALL YOU ADVENTUROUS SOULS, GRAB IT AND PLAY WITH IT, AND LET US
KNOW WHAT NEEDS FIXING.  Shiming Xu will be our contact person for
this: sxu@ncsa.uiuc.edu, (217) 244-3830.

A. How HDF 3.2 is different from HDF 3.1
----------------------------------------

A1. SDS enhancements.

The previous SDS interface allows storage of 32-bit floats and Cray
native mode floats only.  The HDF 3.2 SDS interface has been expanded
to handle 8-bit, 16-bit, and 32-bit integers, 32-bit and 64-bit floats,
and native mode for all of these.  In this beta version, 64-bit floats
have not yet been tested, but they should be done by the full release.

In the C interface, integers can be signed or unsigned.  In the FORTRAN
interface, there is no distinction, so all integers are assumed to be
signed.  The names used to describe the new data types are

    int8       8-bit signed integer
    uint8      8-bit unsigned integer (C only)
    int16      16-bit signed integer
    uint16     16-bit unsigned integer (C only)
    int32      32-bit signed integer
    uint32     32-bit unsigned integer (C only)
    float32    32-bit float
    float64    64-bit float

To handle the new data types, the DFSDsettype routine has been replaced
by three new routines: DFSDsetNT, DFSDgetNT, and DFSDsetorder.
DFSDsetNT and DFSDsetorder together completely replace the DFSDsettype
routine.

DFSDsetNT must be called if a number type other than float32 is to be
stored.  For example:

    C:        DFSDsetNT(DFNT_INT8);
              DFSDadddata("myfile.hdf", rank, dims, i8data);

    FORTRAN:  ret = dssnt(DFNT_INT8)
              ret = dsadata('myfile.hdf', rank, dims, i8data)

Valid parameter values for DFSDsetNT (e.g. DFNT_INT8) are defined in
the file hdf.h.  They are of the general form "DFNT_...", all capital
letters.

Since, in addition to data, an SDS can contain max/min values and
scales, these too can be stored in number types other than float32.
Since max/min values are supposed to relate to the data itself, it is
assumed that the type of the max/min values is the same as the type of
the data.  The same is true for scales, although eventually an option
is planned to allow you to set scale number types differently from the
data number type.

DFSDgetNT allows you to query the number type of the current SDS.  As
with other "DFSDget..." routines, you must call DFSDgetdims or
DFSDgetdata to "move to" a particular SDS before calling DFSDgetNT.

The third routine, DFSDsetorder, can be used to override the default
ordering of data in an SDS.  It works the same as the previous routine
DFSDsettype with its third parameter set.  For example:

    C:        DFSDsetorder(DFO_FORTRAN);
              DFSDadddata("myfile.hdf", rank, dims, i8data);

    FORTRAN:  ret = dssodr(DFO_FORTRAN)
              ret = dspdata('myfile.hdf', rank, dims, i8data)

One result of the "setorder" implementation was the discovery that the
tag that indicates "FORTRAN order" (tag 709) may appear as an "unknown
tag" to some software that has not been written to expect it.  (As
indicated in the first part of the newsletter, the HDF group is
considering dropping the automatic transposition of FORTRAN data sets
with this release.)
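Putting these pieces together, here is a hedged C sketch that writes a
small int16 SDS and reads it back, calling DFSDgetdims and DFSDgetNT to
discover the rank, dimensions, and number type before calling
DFSDgetdata.  The file name and data are made up, DFNT_INT16 is assumed
to follow the "DFNT_..." naming described above, and the exact argument
types (in particular the maxrank argument to DFSDgetdims) are our
reading of the beta interface; check the header files in the src
directory for the real declarations.

    #include "hdf.h"

    #define MAXRANK 3

    int main(void)
    {
        int32 dims[2]    = {2, 3};
        int16 i16data[6] = {10, 20, 30, 40, 50, 60};

        int32 rdims[MAXRANK];
        int   rank;
        int32 numbertype;
        int16 buffer[6];

        /* Write: declare the number type first, then add the data. */
        DFSDsetNT(DFNT_INT16);
        DFSDadddata("myfile.hdf", 2, dims, i16data);

        /* Read back: DFSDgetdims "moves to" the next SDS and returns
         * its rank and dimensions; DFSDgetNT may then be called to
         * find out how the data is stored.                          */
        DFSDgetdims("myfile.hdf", &rank, rdims, MAXRANK);
        DFSDgetNT(&numbertype);        /* DFNT_INT16 for this file   */
        DFSDgetdata("myfile.hdf", rank, rdims, buffer);

        return 0;
    }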
In order to support the new number types, and at the same time make it
easier to add new features (e.g. data compression) to future versions
of SDS, a new structure for the SDG (scientific data group) and some
new tags have been implemented.  The new structure is called NDG, for
"numeric data group", and has the tag 720.  In order to maintain
backward compatibility, HDF examines each new NDG that it writes to a
file to determine whether it is backward compatible with the old SDG
structure (float32 data only, etc.), and if it is, writes out an SDG as
well as an NDG.  A new "link tag" (710) stored in the NDG tells HDF
that it represents the same SDS as the corresponding SDG.

A2. New conversion routines.

A completely new, and much more general, scheme is used for doing
conversions.  The same conversion module is now used by the Vset and
SDS interfaces, and the intention is to use this module for all HDF I/O
that requires conversions.  The conversion module is intended for use
by developers only, and should not be considered part of an application
interface.

A3. Complete new set of general purpose routines.

The lower layer of HDF has been completely redesigned and
re-implemented, and all application interfaces, such as RIS8 and SDS,
have been re-implemented on this layer.  The new lower layer
incorporates the following improvements:

  - More consistent data and function types
  - An error handling module that supports more meaningful and
    extensive reporting of errors
  - Simplification of key lower level functions
  - Simplified techniques for facilitating portability
  - Support for alternate forms of physical storage, such as linked
    block storage, and storage of the data portion of an object in an
    external file
  - A version tag indicating which version of the HDF library last
    changed an HDF file
  - Hooks to support simultaneous access to multiple files
  - Hooks to support simultaneous access to multiple objects within a
    single file

The modules that implement these changes can be found in files that
begin with the letter "h", and each routine begins with the letter "H"
(Hopen, Hclose, Hwrite, etc.).  Because of these changes, the names of
include files are different: where you included df.h previously, you
should now include hdf.h.  Also, the number and names of the basic
modules have changed, and now include:

    hfile.c     basic i/o
    herr.c      error handling
    hkit.c      general purpose routines (HDgetspace, etc.)
    hblocks.c   support for linked block physical storage
    hextelt.c   support for external storage of HDF data

More details on this new organization will be available later.

A4. Emulation of previous general purpose interface

Although the previous general purpose interface has been replaced by
the new general purpose routines, backward compatibility is maintained
by a set of routines that emulate the old routines.  All of the old
routines that begin with the letters "DF" (DFopen, DFclose,
DFgetelement, etc.) have been rewritten on top of the new "H" layer.
Users who currently use the "DF" routines should be able to continue to
use them, although they are encouraged to switch to the new "H"
routines as soon as possible.

A5. Inclusion of the Vset calling interface

Previously, the Vset calling interface had its own library.  In HDF
3.2, the Vset calling interface is contained in the same library as the
other HDF interfaces.  As mentioned previously, the Vset module now
uses the new conversion routines.
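To give a taste of the new lower layer and its error handling (see A3
above), here is a hedged C sketch that opens and closes a file with the
new "H" routines and dumps any accumulated errors with HEprint.  The
access flag DFACC_READ, the FAIL return value, and the exact argument
lists are our assumptions about the beta interface; developers should
consult the "h" source files for the real prototypes.

    #include <stdio.h>
    #include "hdf.h"

    int main(void)
    {
        /* Hopen returns a file identifier; the third argument is the
         * number of data descriptors per block (here 0, presumably
         * the default).                                              */
        int32 file_id = Hopen("myfile.hdf", DFACC_READ, 0);

        if (file_id == FAIL) {
            /* Print the error stack kept by the new error module. */
            HEprint(stderr, 0);
            return 1;
        }

        /* ... Hread/Hwrite calls on the file would go here ... */

        Hclose(file_id);
        return 0;
    }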
A6. Test modules.

In an effort to work towards having an HDF "test suite", a number of
test modules are included with this release of HDF.  This test suite is
quite incomplete at the moment, but we plan to extend it to cover as
many routines as possible as quickly as possible.

The names of the test files are formed by concatenating the letter "t",
the base name of the interface (e.g. "dfsd"), optionally some other
descriptive set of characters, and either ".c" or ".f", depending on
whether they are C or FORTRAN programs.  For example, the program
"tdfan.c" is a test program for the HDF object annotation interface,
and "tdfanfile.c" is a test program for the HDF file annotation
interface.

A7. Naming conventions

We have tried to be more consistent in our naming conventions for
routines with this release.  Routines in the FORTRAN and C application
interfaces begin with one or more capital letters, as follows:

    DFR8    (8-bit raster image sets)
    DF24    (24-bit raster image sets)
    DFP     (palettes)
    DFSD    (scientific data sets)
    DFAN    (annotations)
    V       (vsets)

The lower level routines are categorized as follows:

    H...    (new lower level i/o)
    DF...   (emulation of old lower level i/o routines)
    HD...   (lower level utilities for developers)
    HE...   (lower level error handling)
    HL...   (routines that support linked block storage)
    HX...   (routines that support external elements)
    DFK...  (conversion routines)
    H*I...  (internal, "private" routines, not guaranteed always to
             exist or to remain the same; use at your own risk)

B. Getting HDF 3.2 beta
-----------------------------

There are four subdirectories under HDF/HDF3.2beta: src, tests,
examples, and tar.  "src" contains all source for the beta release,
"tests" contains all of the test programs currently available for the
beta release, "examples" contains the old examples, revised to work
with HDF 3.2, and "tar" contains tar files of the tests and src
directories.  There is an INSTALL file in the src directory with full
particulars on installing HDF 3.2 beta.

C. Work to do, problems to solve, and things to decide
------------------------------------------------------

The following list is in approximate order of priority and expected
chronological order.

  - The FORTRAN test suite needs to be expanded to cover all
    interfaces, and then the interfaces need to be tested.  This means
    that FORTRAN modules corresponding to the following must be
    written: tdfan, tdfanfile, tdfr8, tdfr24, tdfp, and tdfstubs.
  - Fortran stubs need to be written for certain H level routines,
    such as HEprint, and possibly Hopen and Hclose.
  - The command line utilities need to be revised.
  - The transpose/no-transpose policy needs to be decided and the code
    changed accordingly.
  - Everything needs to be ported to the SGI, IBM RS/6000, Convex,
    Mac, and IBM PC.
  - Documentation needs to be revised, including "NCSA HDF
    Specifications", "NCSA HDF Calling Interfaces and Utilities", and
    "HDF Vset."

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Calibration Tag

Alban Deniz, an alert HDF user at Naval Research Lab, has suggested
that we add a "calibration tag", which would provide information for
calibrating the values in an SDS.
Alban sent the following suggested specification:

    SDCAL  Scientific data offset and calibration  16 bytes  (tag 731)

    This record contains four 32-bit floating point values:

        cal         calibration factor
        cal_err     absolute error in the calibration factor
        ioff        raw data offset
        ioff_err    rms error in the offset

The relationship between a value 'iy' stored in an SDS and the actual
value 'y' would be defined as

    y = cal * (iy - ioff)

The variable ioff_err contains the rms error of ioff, and cal_err
contains the absolute error of cal.

Two new routines would be added to the SDS interface:

    DFSDgetcal(*cal, *cal_err, *ioff, *ioff_err)
    DFSDsetcal(cal, cal_err, ioff, ioff_err)

Alban has been using this tag and these routines for over a year with
his own personalized version of the HDF library, and finds them very
useful.  Now that HDF 3.2 provides support for small integers in SDSs,
such a tag seems all the more useful.

We would like to add Alban's routines to the SDS interface, but first
we would like to hear your opinions.  Are there any changes you would
like to see made to generalize it?
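To help you form an opinion, here is a hedged C sketch of how the
proposed routines might be used: it stores raw int16 counts along with
the calibration record, then reads them back and applies
y = cal * (iy - ioff).  The file name, variable names, and data are
made up, and the assumption that DFSDsetcal applies to the next data
set written (like the other "DFSDset..." routines) is our reading of
the proposal, not a released interface.

    #include "hdf.h"

    int main(void)
    {
        int32   dims[1]   = {3};
        int16   counts[3] = {100, 200, 300};  /* raw instrument counts */
        float32 cal = 0.5, cal_err = 0.01, ioff = 50.0, ioff_err = 1.0;

        float32 gcal, gcal_err, gioff, gioff_err;
        int16   raw[3];
        float32 y[3];
        int32   rdims[1];
        int     i, rank;

        /* Write the raw counts along with the proposed calibration
         * record (four float32 values, per the SDCAL spec above).   */
        DFSDsetNT(DFNT_INT16);
        DFSDsetcal(cal, cal_err, ioff, ioff_err);
        DFSDadddata("counts.hdf", 1, dims, counts);

        /* Read the data and calibration back, then recover the
         * actual values with  y = cal * (iy - ioff).                */
        DFSDgetdims("counts.hdf", &rank, rdims, 1);
        DFSDgetcal(&gcal, &gcal_err, &gioff, &gioff_err);
        DFSDgetdata("counts.hdf", rank, rdims, raw);
        for (i = 0; i < 3; i++)
            y[i] = gcal * ((float32) raw[i] - gioff);

        return 0;
    }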