1. Introduction
The NCSA HDF group is developing a comprehensive suite of standards and tools for using XML together with HDF5. One important goal is to lay the foundation for many uses of XML. To meet this goal, we will follow and support relevant standards.
1.1 How XML Might be Used with HDF5
In earlier work, we analyzed "Use Cases" for how XML might be used with HDF5 [UseCases]. This analysis described a variety of different uses for XML. Some of the most important roles for XML will be as a standard format for interchanging descriptions of scientific datasets, both between programs and across time (i.e., store and retrieve), in an open and heterogeneous environment.
Figure 1 shows some of the roles for XML: transformed to HTML for standard Web browsers (1), ingested into Java and other tools (2), as input to Data Location Services (Catalogs) (3), and as input for ingest, editing, and validation tools (4). We aim to support and enable as many of these uses as possible.
Figure 1. Some Roles for XML with HDF5
Another class of applications may use XML-to-XML translation and filtering to convert between multiple formats and let them interoperate. For example, netCDF may be converted to HDF5 via XML using an XSL stylesheet [XSLExperiment].
1.2 The Foundation
To realize this vision for XML, the initial tool set includes:
- a standard Document Type Definition (DTD),
- Java tools to read XML and create HDF5, and
- addition of XML output to the h5dump utility.
| Component | Description | Status |
| Document Type Definition | DTD for describing the structure and contents of an HDF5 file. | Updated for HDF5.1.4, April 2001. |
| h5dump --xml | Print an XML description of an HDF5 file. | Available April 2001 (HDF5.1.4 patch). |
| h5gen | Read an XML description, create an HDF5 file. | Updated April 2001. |
| h5view | Read XML, create HDF5, display and edit the HDF5. Write an XML description of an open HDF5 file. | New features available April 2001. |
2. The HDF5 XML Foundation
2.1. HDF5 Document Type Definition (DTD)
The foundation for all use of XML is a DTD or Schema that defines a valid description of HDF5. The HDF5 DTD is based on earlier work which specified a formal model [UML] and a grammar for describing HDF5 files [DDL]. Given these, defining the DTD was a matter of expressing the concepts in XML.
At the time the DTD was implemented, XML Schema had not yet been agreed upon and was not clearly understood. In the future, the DTD can and should be recast as an XML Schema, and extended to specify additional aspects of the HDF5 model that could not be covered with a DTD.
The HDF5 DTD is in its second major revision, updated to be consistent with HDF5.1.4 [DTD].
Known Limitations
The DTD suffers from generic shortcomings in XML and XML DTDs, including:
- Limited expression of non-tree structures, e.g., datasets with more than one path name.
- Limited representation (as far as XML is concerned) of naming and type rules, e.g., the rules for uniqueness of path names of HDF5, and rules for the proper types of the targets of ID-REFS.
- No convenient way to mark up the data.
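To illustrate the first limitation, the following is a minimal sketch of how a tree-shaped XML document can stand in for an object with two path names: the object carries an identifier, and a pointer element elsewhere in the tree refers back to it. The element and attribute names (`Group`, `Dataset`, `DatasetPtr`, `Name`, `id`, `ref`) are hypothetical and chosen only for illustration; they are not taken from the HDF5 DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one dataset reachable as /a/data and as /b/alias.
DOC = """
<Group Name="/">
  <Group Name="a">
    <Dataset Name="data" id="ds1"/>
  </Group>
  <Group Name="b">
    <DatasetPtr Name="alias" ref="ds1"/>
  </Group>
</Group>
"""


def paths_by_object(root):
    """Map each dataset identifier to every path name that reaches it."""
    found = {}

    def walk(node, prefix):
        for child in node:
            path = prefix.rstrip("/") + "/" + child.get("Name")
            if child.tag == "Group":
                walk(child, path)
            elif child.tag == "Dataset":
                # The dataset itself: its position in the tree is one path.
                found.setdefault(child.get("id"), []).append(path)
            elif child.tag == "DatasetPtr":
                # A pointer: another path name for an object defined elsewhere.
                found.setdefault(child.get("ref"), []).append(path)

    walk(root, "/")
    return found


if __name__ == "__main__":
    print(paths_by_object(ET.fromstring(DOC)))
```

Nothing in the XML tree itself says that the two entries are the same object; that only becomes visible once a tool resolves the pointer, which is why the graph structure is described here as only weakly expressed.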
In addition, several aspects of the HDF5 data model are not specified (yet) in the DTD. These include:
- No specification for the format of data contents, i.e., the numbers in an array. One proposal can be found in [Folk]. See a discussion of this issue in [Binary].
- No specification for variable length data. (Variable length data types are correctly described, but there is no markup for the data itself.)
- No specification for HDF5 Region References.
- The specification of the <NativeHDF5> element is only preliminary.
- Compound data is not supported by Java programs (yet). (The data type is correctly described, but there is no way to read the data contents from XML to HDF5.) See [Compound].
- Variable length data is not supported by Java programs (yet). (The data type is correctly described, but there is no way to read the data contents from XML to HDF5.)
- User defined data types most likely can't be read.
2.2. New option to h5dump utility
The HDF5 h5dump utility prints a human readable version of the contents of an HDF5 file. The default output is HDF DDL [DDL]. A new option has been added to output the description in XML.
Essentially any HDF5 file can be dumped in XML, with the provision that there is no guarantee that all tools will be able to read the XML, even though the XML might be perfectly correct. For example, the h5dump utility will write out a dataset with compound data as correct XML. However, the h5gen tool cannot read the data values into HDF5. (For an explanation of this, see [Compound].)
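As a usage illustration, the sketch below shows one way the XML dump might be driven from a script. The `--xml` option is the one described above; the helper names and the file names are assumptions made for the example, and the code assumes that h5dump is installed on the search path and takes the HDF5 file as its final argument.

```python
import shutil
import subprocess


def h5dump_xml_command(h5_path):
    """Build the argument list for an XML dump of one HDF5 file."""
    return ["h5dump", "--xml", h5_path]


def dump_to_xml(h5_path, xml_path):
    """Run h5dump and save its XML output.

    Returns False without doing anything if h5dump is not installed.
    """
    if shutil.which("h5dump") is None:
        return False
    with open(xml_path, "w") as out:
        # check=True raises CalledProcessError if h5dump reports a failure.
        subprocess.run(h5dump_xml_command(h5_path), stdout=out, check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    if not dump_to_xml("file1.h5", "file1.h5.xml"):
        print("h5dump was not found on the search path")
```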
2.3. Java tool, h5gen
The Java h5gen tool (revised version) reads an XML description of an HDF5 file and generates the HDF5 file. This tool calls the HDF5 library through JHI5 JNI [JavaHDF5].
The output of h5gen faithfully reproduces an HDF5 file from the XML, except for data that cannot be read by h5gen.
Known Limitations
The h5gen tool is available on all platforms that the Java HDF5 tools support. The h5gen cannot handle some HDF5 objects and interfaces that are not supported by the Java HDF5 Interface. Basically, these are features that are defined in C but are difficult or impossible to implement in Java. The most important cases are:
- Compound data -- can be accessed only awkwardly in Java, and the h5gen cannot parse the data in the XML at all.
- Variable length data -- cannot be handled by Java at all
- User defined data may or may not be handled correctly
There may be additional limitations in the implementation, not fully understood at this time. Possible areas of limitation may include:
- Performance with 'large' datasets -- the practical limits are not known
- Problems with HDF5 names that may violate XML name limitations
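One way to probe for the naming problem ahead of time is to screen HDF5 object names before conversion. The sketch below assumes the concern is whether a name satisfies the XML 1.0 Name production (as an identifier attribute would require), and it simplifies that production to ASCII characters only. Both the assumption and the helper names are illustrative; the actual constraints depend on where the DTD places HDF5 names.

```python
import re

# ASCII-only simplification of the XML 1.0 Name production:
# a letter, underscore, or colon, followed by letters, digits,
# underscores, colons, periods, or hyphens.
_XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_xml_name(name):
    """Return True if the whole string matches the simplified Name rule."""
    return _XML_NAME.fullmatch(name) is not None


def problem_names(names):
    """Return the names that would not be valid XML names, in input order."""
    return [n for n in names if not is_xml_name(n)]


if __name__ == "__main__":
    # Hypothetical HDF5 object names.
    print(problem_names(["dataset_1", "1st run", "temp/2001", "image.raw"]))
```

HDF5 itself allows spaces and leading digits in object names, so names such as the second example can be perfectly legal in the file yet unusable wherever XML requires a Name.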
2.4. New features in the Java Editor h5view
The Java h5view visual browser/editor now has the ability to convert XML to HDF5, and to write XML. The user may select an XML file that conforms to the HDF5 DTD, and the corresponding HDF5 file will be generated (with the same code as the h5gen tool). The HDF5 file will be opened for browsing and editing. The h5view can also write a file as XML, which will conform to the HDF5 DTD.

The h5view tool allows the ingest of an XML description of a file (perhaps a template) to create an HDF5 file, editing of the HDF5 file to add, delete, or modify objects or their values, and generation of either HDF5 or XML.
It is important to note that the h5view tool does not edit XML. It converts from XML to HDF5 and then edits the HDF5.
Known Limitations
The h5view uses the same code as the h5gen, and thus has the same limitations. The h5view can output XML for all HDF5 objects that it can read.
The spacing and indentation of the XML output from the h5view may not precisely match the output of the h5dump.
2.5. Interoperation of Tools
All the tools discussed here exchange data in either HDF5 or XML, and in general produce the same results. When h5gen or h5view reads XML to convert it to HDF5, the XML is validated against the HDF5 DTD, and the same HDF5 file will be created for a given XML input file. The h5dump and h5view will write the same XML, given the same HDF5 input file.

In fact, the transformation:
file1.h5 -> h5dump --xml -> file1.h5.xml -> h5gen -> file2.h5
will usually produce a file2.h5 that is identical to file1.h5.
Known Limitations
Some objects cannot be converted to XML at all (e.g., region references), and some objects cannot be read into Java (e.g., compound data). In these cases, the output will be incomplete but correct.
In some cases, the output files may be logically identical (i.e., they have the same elements, attributes, values, etc.), but have slightly different binary representations on disk. This may happen, for instance, when the same objects are written in a different order.
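Because byte-for-byte comparison can report differences between files that are logically the same, one practical check is to compare their XML descriptions instead. The following is a minimal sketch of such a comparison that ignores spacing and indentation. It is an illustration only, not part of the tool set; the function names and the sample markup are assumptions, and child order is still treated as significant.

```python
import xml.etree.ElementTree as ET


def canonical(elem):
    """Reduce an element to a nested tuple that ignores indentation.

    Text is split on whitespace, so line breaks and extra spaces do not
    matter. Text that follows a child element (the "tail") is ignored.
    """
    text = tuple((elem.text or "").split())
    attrs = tuple(sorted(elem.attrib.items()))
    children = tuple(canonical(child) for child in elem)
    return (elem.tag, attrs, text, children)


def same_description(xml_a, xml_b):
    """Return True if two XML strings describe the same tree."""
    return canonical(ET.fromstring(xml_a)) == canonical(ET.fromstring(xml_b))


if __name__ == "__main__":
    # Hypothetical markup, not the HDF5 DTD vocabulary.
    compact = '<File><Group Name="g"><Data>1 2 3</Data></Group></File>'
    indented = """<File>
      <Group Name="g">
        <Data>
          1 2
          3
        </Data>
      </Group>
    </File>"""
    print(same_description(compact, indented))
```

If differently ordered but otherwise equal descriptions should also compare as equal, the children could be sorted before comparison, at the cost of hiding any ordering that is meaningful.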
3. Futures
There are many possible future activities with XML. These may include:
- Definition of a standard XML Schema for HDF5
- Update tools
- Experiments with <xlink> references to data in HDF5 datasets.
- Experiments with XSL style sheets, a la [XSLExperiment]
- XML support for HDF4
References
[UML] "HDF5 Abstract Data Model", http://www.hdfgroup.uiuc.edu/papers/presentations/ADM/ADM_990506/index.html
[DDL] "DDL in BNF for HDF5", /HDF5/doc/ddl.html
[DTD] //DTDs/HDF5-File.dtd
[UseCases] "Some Suggested Use Cases for XML with HDF-5", /HDF5/XML/UseCases/use-cases-1.html
[DesignNotes] "The XML DTD for HDF5: Design Notes", /HDF5/XML/design-notes.html
[XSLExperiment] Robert E. McGrath, "Experiment with XSL: translating scientific data", /HDF5/XML/nctoh5/writeup.htm
[Binary] "Representing "Binary" Data in XML", /HDF5/XML/tools/binary.html
[Compound] "HDF5 Compound Data: Technical Issues for XML, Java, and Tools", /HDF5/XML/tools/compound-data.html
[JavaHDF5] "THG HDF Java Products", /hdf-java-html
[Folk] Mike Folk, "Proposal for representing simple data in the HDF5 XML DTD", /HDF5/XML/design-notes.html
Last modified: August 15th 2007

