- How do I create a String attribute or dataset? (See DATATYPES)
- When would you use attributes vs. datasets for your metadata ?
- When would you use compact attributes vs. compact datasets ?
C++
- Does C++ support stream operators?
- Does it do IO via standard library containers?
- Example of creating a string attribute
DATASETS
- How do I find out what compression has been used to write a dataset?
- Can you delete objects in an HDF5 file ? If yes, how ?
- When would you use compact attributes vs. compact datasets ? (See ATTRIBUTES)
DATASPACE
DATATYPES
- How do I create a String attribute or dataset?
- How do you create datasets with a time datatype (class of H5T_TIME)?
- When do you use pre-defined standard vs. native datatypes?
- Further Understanding Native Datatypes
- Compound Datatypes:
  - What are the plusses and minusses of using compound datatypes ?
  - How would you use a float pointer in a compound datatype ?
  - Working with compound datatypes: Can HOFFSET return different answers on different machines ?
  - When reading compound data from file w/different compound field name, the data is different for this field.
  - Is there a limit on the number of fields allowed in a compound datatype ?
- Enum Datatypes:
  - How Does the HDF5 library handle overflowing (invalid) ENUM data ?
  - H5Tenum_insert returns error when type has different endianness
- Does HDF5 support a boolean datatype ?
- Does HDF5 support bitfields ?
- What is the proper way to allocate memory when reading a variable length datatype ?
- Why was the H5T_ARRAY datatype created ? What is it for ?
- How can you tell if two datasets share their datatype ?
EXTERNAL LIBRARIES
ERROR
- The error report prints out full path for file name instead of file name only
- How do you turn off error messages ?
FILE
FORTRAN
- How do you create a dataset with a 16-bit datatype in F90 ?
- How do you build HDF5 using gcc and either Intel or Lahey Fortran ?
- How to build HDF5 Fortran with Intel Compilers
- Why is there no H5Tget_native_type function for Fortran ? What should you do to get the memory datatype for a compound datatype in F90 ?
- Can you build shared Fortran libraries in HDF5 ?
- Building Fortran application with Intel Fortran, get "/lib/libimf.so : warning feupdateenv is not implemented"
GENERAL
- Can you store a binary object ( Excel spreadsheet, Word document, Zip file, etc.. ) in an HDF5 file ?
- Thread-safety:
  - Is HDF5 multi-threaded?
  - Is HDF5 thread-safe?
  - Can you run parallel HDF5 and the thread-safe feature together ? What about Parallel HDF5 and C++?
- Concurrent Access:
  - Does HDF5 support concurrent access to one or more HDF5 file(s) from multiple threads in a single process?
  - Does HDF5 support concurrent access to a single dataset from multiple processes?
  - Can you read an HDF5 file while it is being written to?
- File Sizes:
  - Why are my files sizes different, if I open an HDF5 file more than once rather than writing the data out in one call?
  - If you run an application twice on the same machine will it produce identical HDF5 files ?
- Version Information:
  - Given an HDF5 library, how can I easily determine which version of HDF5 is being used (linked)?
  - Given a library that calls HDF5 functions, how can I determine which version of HDF5 was used to build it ?
- I'd like to access an HDF5 file without using the HDF5 library. Is this possible?
- Will data manipulation routines be added to HDF5?
- Can you guarantee that a zero will not be used as a valid hid_t ?
- Does HDF5 support meshes ?
- How do you create files over 2GB ?
IDENTIFIERS
- How do you get the name of an object opened with H5Rdereference?
- How do you get the path and name of a committed datatype?
IMAGES AND PALETTES
- How do you store a true color image in HDF5?
- What kind of palettes are supported?
- If you have an X by Y image in Z bands how would you store that in HDF5 ?
- How do you save an image made from (r,g,b) floating point values?
INSTALLING/BUILDING HDF5
(For Fortran issues, see FORTRAN)
- Information on building and using HDF/HDF5 on Windows
- Having problems building an application with pre-built libraries.
- Is the HDF5 C source C99 compliant? Is it C89 compatible ?
- How can you determine what compiler/flags are used by an HDF5 installation?
- The mtime test fails with the message 'Old modification time incorrect.', when building with an unsupported platform/compiler
- Can't open shared library: ../lib...s#.0
- Building on MAC OSX, the symbols restFP and saveFP come up as undefined. Why?
- Float to Conversion Tests Fail on AMD with Intel compiler. Why?
- Using Purify on HDF5 library, get uninitialized memory read error
- When Building HDF5, the Object Header test, "Testing message deletion", fails.
- AIX:
  - AIX Configure failure: "...config.sub: too many arguments"
  - AIX: HDF5 fails to build with gcc
  - Building on AIX 64-bit, get: ERROR: No csects or exported symbols have been saved.
  - Problems installing Parallel HDF5 on IBM Regatta (with AIX5) (see PARALLEL HDF5)
JAVA
- Problems Finding or Using File Format or Library:
  - HDFView: Unsupported fileformat error opening an HDF/HDF5 file on Mac OS X
  - HDFView: Unsupported fileformat error opening an HDF/HDF5 file on Windows XP
  - HDFView: HDF4 / HDF5 libraries are grayed out and unselectable
  - Java error: can't find HDF5 file format
  - java.lang.UnsatisfiedLinkError: no jhdf5 in java.library.path
- Features that Are or Are Not Supported:
  - Do you support Java 64-bit with HDF-JAVA?
  - What are the limitations of HDF Java interface ?
  - What kind of palettes are supported in HDFView?
  - Will you be adding a pure Java interface to HDF5?
  - Is parallel HDF5 supported with java?
  - Can HDFView handle bit-field data?
- Issues with Reversed / Swapped / Transposed Values:
  - Data Values are Reversed / Swapped / Transposed in HDFView. Why?
  - The Order Returned by the Object Package for the selectedDims / selectedIndex Fields has Changed
- Memory Issues:
  - How do you increase the Java Virtual machine memory?
  - Get EXCEPTION_ACCESS_VIOLATION when more than 1024MB is allocated to the Java Virtual machine.
  - How to workaround error Exception - dataset too big ?
PARALLEL HDF5
- Using PHDF5 (How To Questions):
  - How can Parallel HDF5 APIs be called (collectively, independently)?
  - How should you write attributes in Parallel HDF5?
  - How to write and NOT to write compound datasets using F90 in Parallel HDF5
  - How do you write data when one process doesn't have or need to write data ?
  - How do you configure HDF5 to create separate files for each compute node in a cluster ?
  - How do you set up HDF5 so only one MPI rank 0 process does I/O ?
  - How can I read/write a dataset greater than 2GB?
- What Does PHDF5 Support/Require?:
  - What do you need to run Parallel HDF5?
  - Does HDF5 support compression with parallel HDF5 ? If not, why ?
  - Does Parallel HDF5 support chunking ?
  - Does Parallel HDF5 support variable length datatypes ?
  - Does Parallel HDF5 support shared libraries?
- Performance Issues:
  - What performance can you expect from Parallel HDF5?
  - Performance: Parallel I/O with Chunking Storage
- Build Issues:
  - Can you run parallel HDF5 and the thread-safe feature together ? What about Parallel HDF5 and C++?
  - Problems installing Parallel HDF5 on IBM Regatta (with AIX5)
  - MPI ... failed: array services not available
  - How do you build HDF5 on BlueGene/L?
PERFORMANCE
- Things That Can Affect Performance
- How to improve the performance of H5Gget_info_by_idx and H5Lget_info_by_idx in HDF5 1.8
- Linux Memory Handling and Performance
- Information on the Metadata Cache
- Parallel I/O with Chunking Storage
- Are there performance metrics for working with HDF5 files ?
- Problem with Valgrind (Purify) and HDF5
- Performance-wise, how does HDF5 compare to a relational database?
PROPERTIES
- What function do you use to get compression level information?
- How do you work with a file created with the file family feature?
- Can you work with an HDF5 file in memory ?
- When writing chunks and using Fletcher checksums are there any situations where the HDF5 API will do a read of a "chunk" under the covers when an application is writing a file ?
UTILITIES
- How do you use the h5cc (h5fc) utility ?
- Why is h5dump slower than h5ls?
- Can you add an option to h5dump or h5ls to print the version of a file ?
ATTRIBUTES
When would you use attributes vs. datasets for your metadata ?
Attributes are metadata objects intended for describing the nature and/or
usage of a primary data object, such as a dataset, group, or named datatype.
They are similar to datasets, but there are major differences between the two:
- An attribute is stored in the header of an object (compact storage) if it is small enough to fit in the header. If it is too large to fit in the header, the storage changes to dense storage, in which the attribute is stored in a separate heap indexed with a B-tree.
- Datasets are extendible, can be compressed, and you can do partial I/O operations on them (subsetting). Attributes are not extendible, cannot be compressed, and do not support partial I/O operations.
When would you use compact attributes vs. compact datasets ?
A compact attribute is stored in the header of an object, much like a compact dataset. Both are limited to 64KB in size; for a compact dataset, the recommended size is 30KB or less.
An attribute cannot be shared between several objects. If you store the data in a compact dataset, other objects can use attributes with the object reference datatype to point to the compact dataset. This mechanism allows "sharing" the data stored in the compact dataset between the objects.
I/O speed should be the same for both attributes and compact datasets, since in both cases it is a memory operation. Please remember that you will pay a price while opening/closing the file if you store information in the objects' headers. An application will benefit from the attribute or compact storage mechanism only if it accesses and updates the object state many times during the life of the application. If it is done just a few times, the benefit is questionable.
C++
Does C++ support stream operators?
No, not yet. We intend to add support for them, but since they are
a convenience feature, this has not been high in priority.
Does it do IO via standard library containers?
No, we have not looked at this yet, but suspect it will be
complicated to implement.
DATASETS
How do I find out what compression has been used to write a dataset?
Within an application, you can open the dataset with H5Dopen(), query the dataset's creation property list with H5Dget_create_plist(), and then get the number of filters defined for the dataset with H5Pget_nfilters(). Then loop from 0 to n-1, calling H5Pget_filter() to retrieve information about each filter.
Public filters are identified by a unique integer ID listed in H5Zpublic.h (currently only H5Z_FILTER_DEFLATE). See doc/html/Filters.html for more info (the transient filters it mentions have never been implemented).
You don't need to know what filters were used to write a particular dataset -- you only have to make sure that they have been registered with H5Zregister() before reading.
With the h5dump utility you can specify the -p option to list the properties used in the file.
DATASPACE
When you specify memspace and filespace for H5Dwrite and H5Dread, does it mean memory is allocated for both the dataset and the memory space (i.e. twice the size of the dataset) ?
No, memspace is just a description of the buffer in memory (i.e. where read elements will go). If there is no data conversion, then we read directly into the user supplied buffer. If there is data conversion, we use a 1MB buffer to do the conversions, but we still use the user's buffer for reading data in the first place.
You can also adjust the 1MB default conversion buffer size (see H5Pset_buffer).
DATATYPES
Why was the H5T_ARRAY datatype created ? What is it for ?
The array datatype was created to address the simple case of a compound
datatype when all members of the compound datatype are of the same
type and there is no need to subset by compound datatype members.
Creation of such a datatype is more efficient and I/O also requires less
work, because there is no alignment involved.
Previously, you had to create a compound datatype if you wanted to use an array-like datatype for creating a dataset. This was fine if you really wanted to use the array as a field in a compound datatype, but there were developers who wanted to just have a "plain" array (without the compound datatype wrapping) as a datatype for their dataset. We decided to obsolete the array fields in compound datatypes and promote arrays to a "first-class" datatype. This allowed applications to create and use them without involving a compound datatype. Along with being more "obvious" about the intentions of the datatype, array datatypes are also somewhat more efficient in certain circumstances, as mentioned above.
How can you tell if two datasets share their (named, committed) datatype?
You can use the H5Gget_objinfo function to retrieve the "stat" information
for each named datatype. Then, comparing the fileno and objno fields in the
H5G_stat_t struct for each type should tell you if the two named datatypes
refer to the same object in the file.
How do I create a String attribute or dataset?
HDF5 has a string datatype (H5T_C_S1). To create a string longer than one
character, you must get a copy or instance of the datatype and then modify it. For example,
in C:
strtype = H5Tcopy (H5T_C_S1);          /* Make a copy of H5T_C_S1 */
size = 10;
status = H5Tset_size (strtype, size);  /* Modify the string to be of length 'size' */
/* Use the strtype in H5Acreate or H5Dcreate */
Example programs to create a string attribute/dataset can be found at:
http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/api18-c.html
Also, see Derived Datatypes in the HDF5 Tutorial.
How do you create datasets with a time datatype (class of H5T_TIME)?
You would use the datatypes H5T_UNIX_D32BE (LE) or H5T_UNIX_D64BE (LE).
HDF5 doesn't try to interpret or do anything special with the data.
This would currently be up to the user's code. For a C example, see:
h5time.c
(Note that h5dump does not support this type.)
When do you use pre-defined standard vs. native datatypes?
Create your dataset with the pre-defined standard datatypes, and read from a dataset with the native datatypes (H5T_NATIVE*). Basically all memory datatypes should be native datatypes, and the datatype for a read is a memory datatype. See the Datatypes Table for combinations of memory and pre-defined datatypes to use.
A general purpose tool for reading HDF5 datasets can obtain the native datatype by calling H5Tget_native_type for a specified datatype.
Further Understanding Native Datatypes
Question:
I want to make sure I understand the use of native types vs. non-native types.
When reading a dataset or an attribute, I always want to read into a native type identifier, whether I am reading integer, floating point, user defined types or whatever. It is always safe for me to do a H5Tget_native_type call and then call H5Dread (or H5Aread) with the native data type.
In the case of integer types, for example, I could use the defined native integer type without calling get_native_type, but it is easier to just call it for every dataset and attribute, rather than put in the logic to decide whether a pre-defined native type constant exists.
And every datatype I get from H5Tget_native_type must be closed with H5Tclose.
Are my assumptions correct?
Answer:
Yes, that is correct. There might be special circumstances where you want
to leave the datatype alone however. For example, we leave the datatype
alone when repacking a file, even though it involves reading the dataset
elements into memory.
Question:
I have been creating (and closing) native types for every
read/write operation. Now I am moving to getting a native type for each
variable when it is created/read, and leaving that native type open
until the file is closed. Is that okay?
Answer:
That would work fine and probably have less function call overhead, although a slightly bigger memory footprint.
Question:
Any further comments? What is the overhead of a native type?
Answer:
This approach is fine, unless you have some other special circumstances (like we do when repacking). Native datatypes don't have any extra overhead but, like all datatypes, their size in memory depends on the complexity of the type they are describing (i.e. a datatype description of a compound datatype is larger than the description of a double-precision float).
Does HDF5 support a boolean datatype ?
No. HDF5 is written in C, which does not have a boolean
datatype. Use an integer type and interpret your data according
to your rules.
What are the plusses and minusses of using compound datatypes ?
If you are using C, then compound datatypes will be fast to use.
Compound datatypes work well with C, as they are patterned after
it. If using Fortran or Java, then using them will be slow, and you
have to read/write data by field, so it is also cumbersome. [It is not
possible to pass an array of Fortran structures to a C function in
a portable manner. In any case, the Fortran layer has to repack the
Fortran array to an array of C structures. The main problem is that Fortran
enforces type checking at compilation time and it is impossible to
overload the h5dread_f/h5dwrite_f functions with a datatype that is
defined by the user.]
There are some issues you may run into. First of all, applications that support HDF5 may not fully support compound datatypes. We only support them minimally in our HDF Java Viewer, because Java doesn't handle them well.
There have been problems with including variable length datatypes in a compound datatype. Performance is slow and a couple of users have also encountered some other problems. We plan to address any problems, though.
Does HDF5 support bitfields ?
Yes. Examples of creating a dataset and attribute with a bitfield datatype can be found on the HDF5 1.8 Examples by API page. Under Datatypes, see the examples, h5ex_t_bit.c and h5ex_t_bitatt.c.
The n-bit filter can be used to pack the data in the file. See the H5Pset_nbit Property List API for more information on this.
Also see Section 6, Using Filters, in the HDF5 User's Guide Datasets chapter.
How would you use a float pointer in a compound datatype ?
You cannot create a compound datatype using a typedef that looks like
this, where z is allocated dynamically:
typedef struct xxx_t {
    int x;
    short y;
    float * z;
} xxx_t;
However, you can use the hvl_t struct to do this.
There is no [easy] way to determine the end of an array of floating-point numbers (or other non-character string sequences), so the hvl_t struct must be used to provide the length of the sequence.
How Does the HDF5 library handle overflowing (invalid) ENUM data ?
For ENUM data, the library allows overflowing (invalid) data to be written to the file. For example, the HDF5 ENUM data type is equivalent to the C enum type below:
typedef enum {
E1_RED = 0,
E1_GREEN = 1,
E1_BLUE = 2,
E1_WHITE = 3,
E1_BLACK = 4
} c_e1;
The library actually allows values beyond the range of 0 to 4 to be written to a dataset of this ENUM datatype. In the
past (until the 1.8.7 release), the library returned the original values including the overflowing values when there was no
data conversion during the data reading. But if there was any data conversion involved, such as reading the data created as
little-endian on a big-endian machine, the library's default handling of overflowing values was to assign -1 to them.
Starting from release 1.8.8, the library's default handling of overflowing values retains the original values whether
there is data conversion or not. If a user still wants to have the past behavior (assigning -1), the library provides an
API function H5Pset_enum_conv_overflow to control it.
H5Tenum_insert returns error when type has different endianness
This happens because H5Tenum_insert uses a void pointer to pass in the enum
member value. It doesn't know whether a "char" or "int" is coming in. If the
base type of the enum is H5T_STD_I8LE, but a big-endian "int" is passed in,
H5Tenum_insert simply copies the first byte of the big-endian "int" as the
value of this member. That will be the high-order byte, instead of the
low-order byte which contains the actual value.
The best way to avoid the problem is to use the native type for the base type in calling H5Tenum_insert. In this example, it will be "char":
int status = H5.H5Tenum_insert(booleanEnum, "true", new char[] {1});
When reading compound data from file w/different compound field name, the data is different for this field.
The library is designed to check the fields of a compound type by name.
If the name doesn't match, the library will leave the data for this field
alone, in case the user has some background data in memory.
The right way to read data into memory is to call H5Dget_type and H5Tget_native_type to figure out the data type in memory. (Another way of course, is simply to avoid using wrong names. :)
What is the proper way to allocate memory when reading a variable length datatype?
In the case of a VL type, the HDF5 library allocates a buffer and the user's application has to free it. There is no special call for a character string, so just use a C free. For more complex VL types, use H5Dvlen_reclaim. See: h5ex_t_string.c
Is there a limit on the number of fields allowed in a compound datatype ?
If you are using HDF5 1.8.0 or previous releases, there is a limit on the number of fields you can have in a compound datatype. This is due to the 64K limit on object header messages, into which datatypes are encoded. (However, you can create a lot of fields before it will fail. One user was able to create up to 1260 fields in a compound datatype before it failed.)
EXTERNAL LIBRARIES
How to detect the SZIP encoder at run time
On Unix platforms, a quick way is to use strings and grep on the SZIP library, as follows:
strings libsz.a | grep ENCODE
This will return "SZIP ENCODER ENABLED" if the encoder is enabled in the SZIP library.
Another way is to write an application that checks whether the SZIP
library included with HDF5 is encoder-enabled or not. Use the
H5Zget_filter_info function, as follows:
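#include <stdio.h>
#include "hdf5.h"
int main(void)
{
    herr_t status;
    unsigned int filter_config_flags = 0;
    /* Ask the library how the SZIP filter was configured. */
    status = H5Zget_filter_info(H5Z_FILTER_SZIP, &filter_config_flags);
    if (status < 0)
        printf("SZIP filter information is not available.\n");
    else if ((filter_config_flags & H5Z_FILTER_CONFIG_ENCODE_ENABLED) == 0)
        printf("SZIP encoding is disabled.\n");
    else
        printf("SZIP encoding is enabled.\n");
    return 0;
}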
ERROR
The error report prints out full path for file name instead of file name only
What can I change in the configuration/headers to make it
not print the full path of the source file in error messages?
This behavior is seen on several platforms.
The HDF5 library simply prints out the C macro __FILE__ as the file name. Each compiler has its own interpretation: Some compilers print the file name only; some print the full path name; others, just the relevant path name.
The HDF5 library has no control over this.
How do you turn off error messages ?
Use the H5Eset_auto call to toggle error printing on and off.
FILE
How Closing a File Affects Open Objects
An object (dataset, group, attribute, named datatype) in an HDF5 file can be opened, and it can be opened more than once. When an object is opened, the HDF5 library returns a unique identifier to the application. Every object that is opened must be closed. If an object was opened more than once, each identifier that was returned to the application must be closed. For example, if a dataset was opened twice, both dataset identifiers must be released (closed) before the dataset can be considered closed. Suppose an application has opened a file, a group in the file, and two datasets in the group. In order for the file to be totally closed, the file, group, and datasets must each be closed. Closing the file before the group or the datasets will not affect the state of the group or datasets: the group and datasets will still be open.
There are several exceptions to the above general rule:
- One is when the H5close function is used. H5close causes a general shutdown of the library: all data is written to disk, all identifiers are closed, and all memory used by the library is cleaned up.
- Another exception occurs on parallel processing systems. Suppose on a parallel system an application has opened a file, a group in the file, and two datasets in the group. If the application uses the H5Fclose function to close the file, the call will fail with an error. The open group and datasets must be closed before the file can be closed.
- A third exception is when the file access property list includes the property H5F_CLOSE_STRONG. This property is specified when opening the file, and it closes all open objects when the file is closed with H5Fclose. For more information, see the H5Pset_fclose_degree function in the HDF5 Reference Manual.
How can I resolve problems due to objects being left open?
If an object (group, dataset, etc.) in a file is not closed, then the
file does not get closed, which can cause problems.
There are some things you can do. You can either close everything automatically, or get the number of open objects and then close them:
- We have a file access property list that you can use to do a 'strong' close of a file. See the function H5Pset_fclose_degree and the example program h5close.c.
- You can call H5close, and that will automatically close everything.
- You can call H5Fget_obj_count to get the number of open object identifiers for an open file. You can call H5Fget_obj_ids to get the list of open object identifiers. See the example program: h5ckopen.c
(Note that these routines do not work with dataspace objects.)
FORTRAN
How do you create a dataset with a 16-bit datatype in F90 ?
What you have to do is use the Fortran INTEGER type in memory and use
h5tset_size_f on a copy of H5T_NATIVE_INTEGER (or another INTEGER type)
to set the size to 2 bytes. The library will store 16-bit integers instead
of 32-bit integers.
Currently the F90 APIs do not support INTEGER*2 in memory.
See the example, h516bit.f90.
How do you build HDF5 using gcc and either Intel or Lahey Fortran ?
The Intel and Lahey Fortran linkers cannot find the proper GNU gcc library,
causing the build to fail in the fortran/test and fortran/examples
directories with the error "unresolved __fixunsdfdi symbol".
Before running configure, use the command
setenv LIBS "-lgcc_s" (if using dynamic linking)
or
setenv LIBS "-lgcc" (if using static linking; for example, if using ifort with the "-static" option)
or modify the LIBS argument in the fortran/test/Makefile and
fortran/examples/Makefile files.
Then continue the build in the fortran directory.
If you use h5cc or h5fc, you will also need to
edit them and add "-lgcc_s" or "-lgcc" to them.
Can you build shared Fortran libraries in HDF5 ?
Shared Fortran libraries are not supported in the HDF5 1.6 branch, but they are supported for some platforms in the HDF5 1.8 branch. Refer to the Supported Configuration Features Summary in the 1.8.0 release notes for more details.
Building Fortran application with Intel Fortran, get "/lib/libimf.so : warning feupdateenv is not implemented"
Add -i_dynamic to FCFLAGS.
GENERAL
Can you delete objects in an HDF5 file ? If yes, how ?
Yes, you can use the H5Ldelete function to delete objects in an
HDF5 file (for HDF5 1.6, use H5Gunlink). Currently, however, the
space where the object was located in the file does not get re-used.
Therefore the size of the file will remain the same. You can get
rid of this unused space in a file by writing the contents of the
HDF5 file to a new file. This can be done with the
HDFView tool, as well as the h5repack
utility, included with the HDF5 software distribution.
In a future release of HDF5 we will include support for managing the free space in a file.
Does HDF5 support meshes ?
Meshes can be stored in HDF5, although there is not a standard API to do this. There are several formats that support meshes and use HDF5 as the underlying storage (for example, CGNS, MOAB, Silo, XDMF, ..). See the Table of Software Using HDF5 on the HDF5 Tools page for examples.
A prototype HDF5 Mesh API was created as an attempt to provide a standard higher-level API for storing and retrieving structured and unstructured 'mesh' data, typical of applications such as computational fluid dynamics, finite element analysis, and visualization. This was never added to HDF5, and there are no plans to do so.
How do you create files over 2GB ?
If a filesystem ordinarily handles files over two gigabytes, then
HDF5 will be able to create files larger than two gigabytes.
If a filesystem does not handle files greater than two gigabytes,
there are still ways to create files greater than two gigabytes with HDF5.
You can use the file access property list to set up a file family driver. Your HDF5 file will be split into a "family" of files of the same size.
Another way is to use the external file feature. This is controlled by the Dataset Creation Property list. You could store the datasets in an HDF5 file in separate external files of less than 2GB.
For examples of using File Access and Dataset Creation property lists, see the Property tutorial topic in the Advanced section of the HDF5 Tutorial.
On 32-bit Windows, HDF5 can handle files greater than 2GB by use of the appropriate native datatype; for example, H5T_NATIVE_LLONG instead of H5T_NATIVE_LONG.
Does HDF5 support concurrent access to one or more HDF5 file(s) from multiple threads in a single process?
Concurrent access to one or more HDF5 file(s) from multiple threads in the same process is supported with a thread-safe build of HDF5.
Concurrent access to one or more HDF5 file(s) from multiple threads in the same process will not work with a non-thread-safe build of the HDF5 library. The pre-built binaries that are available for download are not thread-safe.
Users are often surprised to learn that (1) concurrent access to different datasets in a single HDF5 file and (2) concurrent access to different HDF5 files both require a thread-safe version of the HDF5 library. Although each thread in these examples is accessing different data, the HDF5 library modifies global data structures that are independent of a particular HDF5 dataset or HDF5 file. HDF5 relies on a semaphore around the library API calls in the thread-safe version of the library to protect the data structure from corruption by simultaneous manipulation from different threads. Examples of HDF5 library global data structures that must be protected are the freespace manager and open file lists.
Does HDF5 support concurrent access to a single dataset from multiple processes?
If all processes are reading, then, yes, HDF5 (serial) does support this. If there are any processes that are writing, then no, this is not supported. We are working on a "Single Write Multiple Read" (SWMR) feature, which will be available in a future release (expected to be in HDF5-1.10).
Can you read an HDF5 file while it is being written to?
It is possible for multiple processes to read an HDF5 file when
it is being written to, and still read correct data.
(The following steps should be followed, EVEN IF the dataset that is being
written to is different than the datasets that are read.)
Here's what needs to be done:
- Call H5Fflush() from the writing process.
- The writing process _must_ wait until either a copy of the file
is made for the reading process, or the reading process is done
accessing the file (so that more data isn't written to the
file, giving the reader an inconsistent view of the file's state).
- The reading process _must_ open the file (it cannot have the
file open before the writing process flushes its information, or
it runs the risk of having its data cached in memory being incorrect
with respect to the state of the file) and read whatever information
it wants.
- The reading process must close the file.
- The writing process may now proceed to write more data to the file.
There must also be some mechanism for the writing process to signal the reading process that the file is ready for reading and some way for the reading process to signal the writing process that the file may be written to again.
Is HDF5 multi-threaded?
No, HDF5 is not multi-threaded.
Is HDF5 thread-safe?
The HDF5 library can be built in thread-safe mode. The thread-safe version of the HDF5 library effectively serializes the HDF5 library calls. It is thread-safe but not thread-efficient. The HDF Group has a design plan for a more efficient implementation of thread-safety, but currently does not have the resources to implement the plan. If you are interested in supporting this effort, please contact the HDF Helpdesk at:
The thread-safe version of the HDF5 library uses POSIX threads (Pthreads) on Unix (and Mac). To build a thread-safe version of the library, specify the --enable-threadsafe and --with-pthread=DIR flags when configuring:
./configure --enable-threadsafe --with-pthread=DIR
As of HDF5-1.8.6, support for thread-safety on Windows using the Windows threads library has been added. The HDF5_ENABLE_THREADSAFE option can be used in CMake on a Windows platform to enable this functionality. This is supported on Windows Vista and newer Windows operating systems.
For further information on Thread-safe HDF5, see the Thread-safe page and the documents referenced on that page.
Performance-wise, how does HDF5 compare to a relational database?
-
It really depends on your application. HDF5 is tuned to do efficient
I/O and storage for "big" data (hundreds of megabytes and more). It will
not work well for small reads/writes.
It doesn't have indexing capabilities, though we are working on some limited features. See the HDF5_Prototype_Indexing_Requirements for details.
HDF5 was designed to complement DBs and not to compete with them.
Why are my file sizes different, if I open an HDF5 file more than once rather than writing the data out in one call?
-
The size discrepancies can be related to the way small metadata and raw data
get allocated in the file.
Currently, all metadata below a certain threshold size (2KB by default) will cause the library to allocate a block of that threshold size (i.e. 2KB) to store the metadata in, anticipating that more metadata will be added to the file soon and could be sub-allocated from that block. A program which doesn't add more metadata to the block will cause the rest of that block to be wasted in the file because the library doesn't currently remember the free space in the file from one file open to the next.
The threshold block size in the library can be changed with a call to H5Pset_meta_block_size (and H5Pset_small_data_block_size, in libraries which have it - should be in the 1.4.4 release) like so:
fapl_id=H5Pcreate(H5P_FILE_ACCESS);
printf ("H5Pcreate returns: %i\n", fapl_id);
status = H5Pset_meta_block_size (fapl_id,0);
printf ("H5Pset_meta_block_size returns: %i\n", status);
#ifdef WHEN_ITS_AVAILABLE
status = H5Pset_small_data_block_size (fapl_id,0);
printf ("H5Pset_small_data_block_size returns: %i\n", status);
#endif /* WHEN_ITS_AVAILABLE */
Setting the block size to zero should really only be used when small
amounts of metadata are being added each time the file is opened. Setting the
block size to zero will intermix the raw data blocks allocated in the file with
the metadata information in the file and cause the overall number of I/O
operations on the file to increase (reducing performance), because the library
cannot cache as much metadata in memory.
Performance-wise, it would be better to hold the file open as long as possible and not to adjust the block size, but users will have to decide whether file size or I/O performance is their overall goal.
I'd like to access an HDF5 file without using the HDF5 library. Is this possible?
-
Although it is possible to parse through an HDF5 file using just the file
format documentation as a guide, it is strongly recommended that you use
the HDF5 library to access HDF5 files instead. The algorithms and data
structures stored in an HDF5 file can be complex and difficult to
understand well enough to parse correctly. Additionally, there are certain
requirements on the structure of the data structures (the B-trees, for
example) that may not be obvious from a static representation of them in
the file and may not be fulfilled by indiscriminate operations on them.
There are also some third-party data structures stored in the file that
are not documented in the HDF5 file format documentation, such as
the format of compressed data using the deflate algorithm.
Will data manipulation routines be added to HDF5?
-
No, this is left up to the user. There are many packages available,
including BLAS and LINPACK for this.
Given an HDF5 library, how can I determine which version of HDF5 is being used (linked)?
- Use one of the following to get the H5_VERS_INFO string as defined in
H5public.h:
% strings libhdf5.a | grep "HDF5 library version:"
% strings a.out | grep "HDF5 library version:"
This method works even if the a.out file is "stripped", and even if the binary file is not produced by the host machine.
Given a library that calls HDF5 functions, how can I determine which version of HDF5 was used to build it?
-
Add the following line to the calling library source, e.g., xyz.c.
/* C automatically merges two adjacent strings into one. */
/* Use non static char string so that it is included always. */
char XYZ_built_with_H5_lib_vers_info_g[] = "XYZ built with " H5_VERS_INFO;
Rebuild the library and then the following commands will show the information:
% strings libxyz.a | grep "HDF5 library version:"
XYZ built with HDF5 library version: 1.4.4
If you run an application twice on the same machine will it produce identical HDF5 files ?
-
Will netCDF4 make bit-for-bit (BFB) reproducible files?
In other words will running a deterministic model twice on the same
machine (without changing compiler, netCDF library, etc.) produce
identical output files?
I thought that netCDF4 would, like netCDF3, use deterministic algorithms without any date-stamps within the file, and thus produce BFB files. Apparently I am wrong.
To determine whether files were BFB I checked their SHA1 sums. netCDF3 produces BFB files and netCDF4 does not. BFB output files make models easier to debug, so it would be helpful if netCDF4 continued the netCDF3 BFB "tradition". Is this possible? Am I missing something? Is HDF5 the culprit?
Answer: If you turn off the create/modify/access time tracking for objects created (with the H5Pset_obj_track_times() routine), everything should be bit-for-bit reproducible. Coincidentally, it makes accessing those objects faster and the size of their metadata smaller also. You do lose the ability to know when the object was created/modified/accessed.
Can you guarantee that a zero will not be used as a valid hid_t ?
In the H5Fcreate documentation it says:
Returns:
Returns a file identifier if successful; otherwise returns a negative value.
But what about zero? According to this definition, zero would be a valid
file identifier, and might be returned by H5Fcreate. Is
this intentional? There are conveniences associated with zero in C, but
these cannot be used in the case of HDF5 identifiers, if zero is a valid value.
Answer: Yes, HDF5 will not return 0 for an hid_t from a function. However, a zero is used for the "default" value for hid_t's (H5P_DEFAULT is 0, for instance), so it is a "valid" value for an hid_t. Although HDF5 does not return the "default" value from any routines currently (only accepting it as an input parameter) and probably never will, it is possible to receive it from the user application.
Can you store a binary object ( Excel spreadsheet, Word document, Zip file, etc.. ) in an HDF5 file ?
Yes. You can store your file in a dataset with an opaque datatype, in which the opaque datatype's tag is set to the MIME type of the file, with a prefix of "Content-Type":
Content-Type: <mime-type>
Storing the file as a dataset with an opaque datatype will prevent datatype conversions from occurring when reading or writing. Putting the mime-type in the tag, with the "Content-Type:" prefix, gives users a standard place to look for the mime-type (and mimics the prefix that email headers use for storing the mime-type).
Another way to do this is to store the file in the user block. The user block is a user definable data block at the beginning of an HDF5 file, which HDF5 ignores. There are utilities (h5jam, h5unjam) for writing to and extracting information from the user block. Also see the H5Pset_userblock API for creating a user block from within an application.
IDENTIFIERS
How do you get the name of an object opened with H5Rdereference?
-
You can't. There is no way to get the name of an object opened with
H5Rdereference. The H5Iget_name function
cannot be used to get the name
from an object opened with H5Rdereference.
However, if you need to compare two object identifiers, there is a
workaround. You can determine if two object identifiers point to the
same object, by using H5Gget_objinfo:
hid_t obj1, obj2;
H5G_stat_t stat1, stat2;
H5Gget_objinfo(obj1, ".", flag, &stat1);
H5Gget_objinfo(obj2, ".", flag, &stat2);
and then compare stat1 and stat2.
Although there are issues with creating a function that will return the name of a dereferenced object, we are planning on adding this function to HDF5. For more information, see:
derefobjname.txt
How do you get the path and name of a committed datatype?
You can call H5Iget_name to get the path and name of a
committed datatype.
You may first want to check that the datatype is committed. You
can do that with the H5Tcommitted API.
IMAGES AND PALETTES
How do you store a true color image in HDF5 ?
- There are two ways to store true color images: pixel interlace and plane interlace.
With both you will have a 3D dataset. With pixel interlace mode,
it will be stored like this: [height][width][pixel_components]
For plane interlace the data will be stored as: [pixel_components][height][width]
Refer to the Image and Palette Specification for further details.
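The two layouts above differ only in where a given color component lands in the flat buffer. The following is a minimal sketch in plain C (no HDF5 calls) of that index arithmetic; the function names are hypothetical and are not part of the HDF5 API.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of one color component in a buffer laid out as
 * [height][width][pixel_components] (pixel interlace).
 * Hypothetical helper for illustration only. */
size_t pixel_interlace_offset(size_t width, size_t ncomp,
                              size_t row, size_t col, size_t comp)
{
    assert(col < width && comp < ncomp);
    return (row * width + col) * ncomp + comp;
}

/* Offset of the same component in a buffer laid out as
 * [pixel_components][height][width] (plane interlace).
 * Hypothetical helper for illustration only. */
size_t plane_interlace_offset(size_t height, size_t width,
                              size_t row, size_t col, size_t comp)
{
    assert(row < height && col < width);
    return (comp * height + row) * width + col;
}
```

With pixel interlace the three components of one pixel are adjacent in memory; with plane interlace all values of one component form a contiguous height-by-width plane.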
What kind of palettes are supported?
-
The Image and Palette
Specification covers the types of images and palettes supported in HDF5.
Currently the two supported types of palette are "STANDARD8" or "RANGEINDEX".
(We may add more types into the future.)
You can create your own image palette. An image palette in HDF5 is just another dataset. For example, if you were only interested in values between 50 and 100, you could create a palette such that all values over 100 were white and all values smaller than 50 were black. Something like this:
index   red   green   blue
0       0     0       0
...
49      0     0       0
50      whatever color you want
...
100     whatever color you want
101     255   255     255
...
255     255   255     255
Then add a palette attribute in the image dataset to point to the palette you created. An image can have more than one palette.
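As a rough illustration of the palette described above, the sketch below fills a 256 x 3 array in plain C. The function name and the linear grey ramp used between the two limits are assumptions made for this example (the text leaves those colors up to you); writing the array to the file as a palette dataset is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define PALETTE_ENTRIES 256

/* Fill a 256 x 3 (red, green, blue) table: indices below `low` are
 * black, indices above `high` are white.
 * Hypothetical helper for illustration only. */
void build_range_palette(unsigned char pal[PALETTE_ENTRIES][3],
                         int low, int high)
{
    assert(0 <= low && low < high && high < PALETTE_ENTRIES);
    for (int i = 0; i < PALETTE_ENTRIES; i++) {
        unsigned char v;
        if (i < low)
            v = 0;            /* below the range of interest: black */
        else if (i > high)
            v = 255;          /* above the range of interest: white */
        else
            /* Assumption: a linear grey ramp stands in for
             * "whatever color you want". */
            v = (unsigned char)(((i - low) * 255) / (high - low));
        pal[i][0] = pal[i][1] = pal[i][2] = v;
    }
}
```

Calling build_range_palette(pal, 50, 100) yields black for indices 0 through 49 and white for indices 101 through 255, matching the table above.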
If you have an X by Y image in Z bands how would you store that in HDF5 ?
-
We don't address using bands with the Image Specification. You would
have to store each band as a separate image.
How do you save an image made from (r,g,b) floating point values?
-
Basically, you would save the dataset with a floating point datatype.
See the implementation of the HDF5 High Level function, H5IMmake_image_24bit, in the HDF5 source code. To save an image made from (r,g,b) floating point values, just replace the following call in this function:
if ( H5LTmake_dataset( loc_id, dset_name, rank, dims, H5T_NATIVE_UCHAR, buffer ) < 0 )
return -1;
with:
if ( H5LTmake_dataset( loc_id, dset_name, rank, dims, H5T_NATIVE_FLOAT, buffer ) < 0 )
return -1;
Save the result in another function name (for example, H5IMmake_image_24bit_float).
INSTALLING/BUILDING HDF5
Having problems building an application with pre-built libraries.
-
Please note that the HDF5 pre-built binaries are built with the external SZIP and ZLIB libraries, and they include these libraries in the lib/ directory (when possible).
We recommend that you use the h5cc/h5fc/h5c++ scripts to build your application with the prebuilt HDF5 libraries.
Below is the list of steps that we hope will help you to successfully use the prebuilt HDF5 libraries:
- Download the prebuilt binaries for your desired platform from the Obtain Latest page. We will refer to the directory where the HDF5 pre-built binaries are installed as /HDF5_INSTALL.
- Run the h5redeploy script in the /HDF5_INSTALL/bin/ directory to modify the h5cc/h5fc/h5c++ scripts to point to the /HDF5_INSTALL/... directories.
- Edit the h5cc/h5fc/h5c++ scripts in the bin/ directory and check the paths for the SZIP and ZLIB libraries. Correct them if need be. (Many systems have ZLIB installed, but may not have SZIP. If you do not have one or both libraries, you can use those in the lib/ directory.) Also, note that the lib/libhdf5.settings file indicates which external filters are enabled in the binaries, if you are not sure.
To see the flags that are used by the h5cc/h5fc/h5c++ scripts use these commands:
h5cc -show
h5fc -show
h5c++ -show
How can you determine what compiler/flags are used by an HDF5 installation?
-
The
lib/libhdf5.settings file in the built HDF5 binaries
has this information.
The mtime test fails with the message 'Old modification time incorrect.', when building with an unsupported platform/compiler
-
This error is probably due to HDF5 being unable to get the current time
on your platform/compiler, and is not a serious problem. To verify that
the rest of the tests will run correctly, edit test/Makefile, remove
mtime from the TEST_PROGS variable, and re-run 'make check'.
This problem has occurred when using gcc on Solaris, a combination that is neither supported nor tested.
Can't open shared library: ../lib...s#.0
-
The software finds and is trying to open the specified
shared library. If you are on a Unix machine (but not Mac OS X), just add
the path to this library to LD_LIBRARY_PATH.
If you are on Mac OS X, you must either build the shared library from
source, or change the path with install_name_tool:
install_name_tool -change <oldpath> <newpath> <library>
You can use otool -L to see what the shared path is for a given library.
How to build HDF5 Fortran with Intel Compilers
-
If you have problems building the HDF5 Fortran APIs with the Intel
compilers on Unix systems, please do the following:
- Use the -fpp -DDEC$=DEC_ -DMS$=MS_ compiler flags to disable DEC and MS compiler directives in source files in the fortran/src, fortran/test, and fortran/examples directories. See section 5.7 of the release_docs/INSTALL file. For example:
setenv F9X 'ifc -fpp -DDEC$=DEC_ -DMS$=MS_'
./configure --enable-fortran ...
make ....
- If step 1 doesn't work, run the following script (courtesy of Hugh C. Pumphrey) to remove DEC and MS compiler directives from the source in each fortran directory (fortran/src, fortran/test, fortran/examples) before the configuration step:
#! /bin/bash
# script to forcibly disable directives like !DEC$Foo and !MS$Bar
# in a directoryload of Fortran 90 code.
for filename in `ls *.f90`
do
echo hacking $filename
mv $filename tmpbollox.txt
cat tmpbollox.txt | sed -e "s/\\!DEC\\$/\\!FooDECS/g" \
-e s/\\!MS\\$/\\!FooMSS/g > $filename
done
# end script
exit subroutine.
Comment out the line:
IF (total_error .ne. 0) CALL exit (total_error)
Building on MAC OSX, the symbols restFP and saveFP come up as undefined. Why?
-
These are defined in /usr/lib/libgcc.a. To fix this, add -lgcc to your link
line or link against /usr/lib/libgcc.a.
Float to Double Conversion Tests Fail on AMD with Intel compiler. Why?
-
HDF5 builds properly on the AMD Opteron. However, the Float to Double
Conversion Tests Fail. For example:
Testing random sw float -> double conversions *FAILED*
test 1, elmt 323
src = 80 32 39 ac -1.19983053689896439985e-32
dst = b7 f9 1c d6 00 00 00 00 -4.61246357842685217724e-39
ans = 80 00 00 00 00 00 00 00 -0.00000000000000000000e+00
Answer:
The Intel compiler on the AMD Opteron processor doesn't support denormalized floating values by default. For HDF5-1.6, the data conversion test uses random values. For HDF5-1.7, we have changed it to a warning if such a problem occurs.
To enable support for denormalized values, add the option -mp to the compiler. If you do not want to sacrifice speed for this feature, go to the test program and comment out the test.
In one sentence, this is a feature, not an error.
AIX Configure failure: "...config.sub: too many arguments"
Problem:
Configure failed on AIX systems with the following messages:
checking build system type... /usr/bin/oslevel[7]: /usr/bin/rm_mlcache_file: cannot execute config.sub: too many arguments
Answer:
Quick fix:
ask the AIX system administrator to "chmod 0 /usr/bin/oslevel".
Details:
Configure calls config.guess to find out what system it is on. Config.guess figures that it is on an AIX system and calls /usr/bin/oslevel to get more AIX specific information. /usr/bin/oslevel calls /usr/bin/rm_mlcache_file for some information, but /usr/bin/rm_mlcache_file has been made publicly inaccessible due to a security problem that AIX uncovered. This causes the error messages that cascade back to configure, which then calls config.sub, which does not like the parameters and aborts. That causes configure to fail.
% ls -lc /usr/bin/rm_mlcache_file
---------- 1 root system 12726 Apr 24 10:23 /usr/bin/rm_mlcache_file
The quick fix works because, if /usr/bin/oslevel is not executable, config.guess will not try to call it but will proceed, and that is okay. So, asking the system administrator to change oslevel to non-executable should fix it. The long term fix would be for AIX to make oslevel work properly.
AIX: HDF5 fails to build with gcc
The gcc compiler is not supported on AIX systems.
The Reason:
HDF5 uses __int64 if available. __int64,
though not a POSIX standard, is supported by xlc but not by gcc.
The HDF5 configure for AIX mistakenly assumes
__int64 is always supported. Therefore the
compiling error occurs when gcc is used.
To bypass it, hardset sizeof__int64 to zero to override
what HDF5 thinks it is. E.g.,
% env ac_cv_sizeof___int64=0 CC=gcc ./configure ...
However, we do not know if gcc will work properly on AIX systems, since HDF5 has not supported this. (You are on your own.)
Using Purify on HDF5 library, get uninitialized memory read error
-
This error message is spurious.
To use Purify on the HDF5 library, you must set the "-D H5_USING_PURIFY" flag for the CFLAGS environment variable. This is done during the configure step, as follows:
env CFLAGS="-D H5_USING_PURIFY" ./configure
Then the 'make' command doesn't need any extra flags, etc.
When Building HDF5, the Object Header test, "Testing message deletion", fails.
The "gmake check" fails as follows , when running the Object Header (ohdr) tests:
...
Testing object header overflow on disk PASSED
Testing message deletion *FAILED*
at ohdr.c:220 in main()...
HDF5-DIAG: Error detected in HDF5 (1.8.0-beta5) thread 0:
#000: H5Omessage.c line 948 in H5O_msg_remove(): unable to remove object header message
major: Object header
minor: Can't delete message
#001: H5Omessage.c line 1124 in H5O_msg_remove_real(): error iterating over messages
major: Object header
minor: Object not found
#002: H5Omessage.c line 1255 in H5O_msg_iterate_real(): unable to decode message
major: Object header
minor: Unable to decode value
#003: H5Omtime.c line 220 in H5O_mtime_decode(): badly formatted modification time message
major: Object header
minor: Unable to initialize object
*** TESTS FAILED ***
This problem is due to how HDF5 translates the timestamp to UTC time from mktime. It is reproducible if the TZ variable is set to "EET-2EEST", which is the timezone for Eastern Europe. The timestamp which causes this error is encoded as "19700101003321", which represents Jan. 1, 1970 00:33:21. When HDF5 tries to translate this to UTC time from mktime, it subtracts 2 hours, thus putting it in pre-Epoch time, which is known to cause problems on Windows and AIX.
If you encounter this error, you can ignore it by using the environment variable $HDF5_Make_Ignore to tell the hdf5 Makefile to ignore test errors and continue on. For example:
env HDF5_Make_Ignore=yes gmake check
If a test fails, make will print a message (echo "*** Error ignored") and continue. Therefore, you can search the output for the string "Error ignored" to see if any tests failed.
Building on AIX 64-bit, get: ERROR: No csects or exported symbols have been saved.
-
This error has to do with shared libraries. Try building just
the static libraries, by configuring with
--disable-shared --enable-static.
JAVA
HDFView: Unsupported fileformat error opening an HDF/HDF5 file on Mac OS X
If you are on Mac OS X and attempt to open an HDF/HDF5 file in HDFView 2.7, and it fails with this error,
HDF file :"Failed to open file xxx.hdf java.io.IOException: Unsupported fileformat - xxx.hdf"
then you may be running the wrong version of HDFView for your system.
If you get this error with the 64-bit version of HDFView, then try the 32-bit version. If you get this error for the 32-bit version, then try the 64-bit version. See the HDFView Home page to obtain the 32-bit or 64-bit version of HDFView for Mac OS X.
HDFView: Unsupported fileformat error opening an HDF/HDF5 file on Windows XP
HDFView works properly on most Windows XP machines. However, on others, HDF5 is greyed out under the Help menu, and HDF5 is not supported. This occurs if the following required package is missing on that machine.
http://download.microsoft.com/download/2/0/e/20e90413-712f-438c-988e-fdaa79a8ac3d/dotnetfx35.exe
If this package is installed it should solve the problem.
Java error: can't find HDF5 file format
-
The problem may be caused by:
- not finding the Java classes
- the dynamic link library is not linked correctly
- the file format is not registered
Put the following code in your test code:
-
import ncsa.hdf.object.h5.*;
If you cannot do that, it means that the hdf5 package is not in your classpath
-
H5File h5file = new H5File(filename);
h5file.open();
If that fails, it means the dynamic link library is not linked correctly. Check your environment variable setting for your application. Make sure the path which contains the dll is in your path.
- Make sure the following code is called in your server application:
try {
    Class fileclass = Class.forName("ncsa.hdf.object.h5.H5File");
    FileFormat fileformat = (FileFormat)fileclass.newInstance();
    if (fileformat != null)
        FileFormat.addFileFormat("HDF5", fileformat);
} catch (Throwable err ) {;}
Data values are reversed/swapped/transposed in HDFView. Why?
-
This is primarily a programming language issue:
- If your file was created with a C program, then the data is stored
with the assumption that the last dimension of the slab varies fastest
("row-major order").
- If your file was created with an F90 application, then the data is stored with the assumption that the first dimension varies fastest ("column-major order").
The data itself in HDF5 is exactly the same. Because HDF Java consists of wrappers around C, it would read the data in row-major order. If your data was written with a Fortran application, then the data would appear to be transposed in HDFView.
Since HDFView doesn't know how the file was created, it gives the user the choice of swapping the dimensions, if need be. You can change the order that the data is read and viewed as follows:
-
Select the dataset by clicking with the left mouse button.
Then click the right mouse button and select "Open As".
This will pop up a window. You can change what you want to see for the height, width, and depth from this page.
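The row-major versus column-major difference described above comes down to how a pair of indices maps to a position in the flat buffer. The sketch below shows that arithmetic in plain C for a 2D array; the function names are hypothetical and nothing here calls HDF5 or HDFView.

```c
#include <assert.h>
#include <stddef.h>

/* Position of element (i, j) of an nrows x ncols array when the LAST
 * index varies fastest (C, "row-major order").
 * Hypothetical helper for illustration only. */
size_t row_major_offset(size_t nrows, size_t ncols, size_t i, size_t j)
{
    assert(i < nrows && j < ncols);
    return i * ncols + j;
}

/* Position of the same element when the FIRST index varies fastest
 * (Fortran, "column-major order").
 * Hypothetical helper for illustration only. */
size_t column_major_offset(size_t nrows, size_t ncols, size_t i, size_t j)
{
    assert(i < nrows && j < ncols);
    return j * nrows + i;
}
```

For any valid indices, column_major_offset(nrows, ncols, i, j) equals row_major_offset(ncols, nrows, j, i). In other words, a buffer written by Fortran as an nrows x ncols array has the same layout as a C array with the dimensions reversed, which is why the data appears transposed until the dimensions are swapped in the viewer.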
The Order Returned by the Object Package for the selectedDims / selectedIndex Fields has Changed
The default order of the dimensions returned by the Object package for 3D datasets has changed a couple of times. Originally the natural order {0,1,2} was used, but at the request of a user, this was changed to {1,2,0} in HDF-Java 2.4 and 2.5. Due to other requests and careful consideration, this order was changed back to {0,1,2} in HDF-Java 2.6.
The Object package provides a simple way to create objects in HDF-Java. It has to have a default order for creating a dataset. Most users expect a dataset to be created with the natural order {0,1,2}. However, a user can set the dimensions in any order they choose.
Do you support Java 64-bit with HDF-JAVA?
-
We do support 64-bit HDF-Java on a few platforms (for example Linux x86_64).
In order to build HDF-Java on a 64-bit machine, you have to:
- build the external libraries (jpeg, zlib, szip) with 64-bit flag
- build 64-bit hdf4
- build 64-bit hdf5
- make changes to various configuration files
- build jni with 64-bit flag
- build Java classes with 64-bit Java
We are currently providing Java 64-bit support on a platform by platform basis, as we obtain funding to do so.
What are the limitations of the HDF Java interface?
In the User's Guide is a section, About this Release, which includes information about limitations and known problems in the release.
What kind of palettes are supported in HDFView?
-
The Image and Palette
Specification covers the types of images and palettes supported in HDF5.
Currently the two supported types of palette are "STANDARD8" or "RANGEINDEX".
(We may add more types into the future.)
HDFView only supports an indexed RGB color table with 256 colors (8-bit) or a 24-bit true color image. It does not support an image with any other color table. When an image has an unsupported color table, HDFView will display the image with one of the default color tables (grey, nature, wave and rainbow).
Also see the following question under "Images and Palettes": What kind of palettes are supported?
java.lang.UnsatisfiedLinkError: no jhdf5 in java.library.path
-
This error means that the HDF5 library cannot be found. Check that the paths
are set up properly (CLASSPATH, LD_LIBRARY_PATH). See the
NCSA HDF Object Package - How to use it page, for more information.
Will you be adding a pure Java interface to HDF5?
-
We have no plans on doing this.
Is parallel HDF5 supported with java?
-
No.
Can HDFView handle bit-field data?
-
No. You can store bit-field data in HDF5, but HDFView does not
know how to deal with bit-field data.
HDFView: HDF4 / HDF5 libraries are grayed out and unselectable
If you install HDFView but the HDF4 or HDF5 libraries are not highlighted and not selectable, then it indicates a problem with installing the library. This has been known to happen for a number of reasons, such as when there was an older HDFView library in /Library/java/Extensions that confused the install, the version of HDFView installed was for another platform, as well as when the required .NET module (.NET 3.5 Service Pack 1) was missing on Windows XP.
How do you increase the Java Virtual machine memory?
-
You can increase the heap size for the Java virtual machine by
the "-mx" option. For example, to increase the heap size to 512MB,
specify: java -mx512m
On Linux, type free to see how much memory you have on your machine.
JVMDG305: Java core not written, unable to allocate memory for print buffer.
Get EXCEPTION_ACCESS_VIOLATION when more than 1024MB is allocated to the Java Virtual machine.
When using the HDF Java package to read a dataset out of an HDF5 file, it works as expected when only 1024 MB are allocated to the java virtual machine. However, it encounters an unexpected error and crashes the VM at larger amounts. The basic process is opening the file, retrieving the file structure, getting and setting the start dimensions, and calling dataset.read. Then the file is closed.
This is a Java virtual machine problem. We have no solution for it at this time.
How to workaround error Exception - dataset too big ?
-
The size of the dataset to create or open is limited by the Java Virtual
machine, which is limited by the machine RAM.
If your machine has enough memory to hold the data and you still get an out of memory error, the Java virtual machine may run out of memory. You can increase the heap size for the Java virtual machine by the "-mx" option. For example: java -mx512m
PARALLEL HDF5
What do you need to run Parallel HDF5 ?
-
You need MPI and MPI I/O. (If you can't tell whether you have
MPI I/O working on your system, let us know.)
What performance can you expect from Parallel HDF5 ?
-
HDF5 cannot do better than MPI I/O on your system. Usually
HDF5 parallel applications have little overhead over MPI I/O
applications on the same system. If MPI I/O performs well, then
you should expect good performance from Parallel HDF5.
If you want to compare the performance of MPI I/O and Parallel HDF5 on your system, you can use the h5perf program that is built along with the parallel library. This is under the ./<HDF5 source code>/perform/ directory of the source code.
Does HDF5 support compression with parallel HDF5 ? If not, why ?
As of HDF5 1.6.3, you can read compressed data but cannot write in parallel.
Why do we not support writing of compressed data in parallel? Compression uses chunking. Since chunks are preallocated in the file before writing, chunks have to be of the same size. However, the size of the compressed chunk is not known in advance.
Chunks are preallocated in the file to avoid the following problem: we allow independent I/O on raw data (with H5Dwrite), but require collective operations to operate on metadata (like the B-tree that tracks the chunks in a chunked dataset or the "free space in the file" metadata (for allocating space when a compressed chunk changes size)). Therefore, in order to allow independent raw data I/O (and simplify the collective raw data I/O), we require the chunks to be preallocated (so we don't have to change the chunk B-tree) and disallow writing to compressed chunked data and variable-length datatypes (so we don't have to allocate/free space in the file) when performing parallel I/O.
Does Parallel HDF5 support chunking ?
-
Yes. It is not necessarily efficient, though.
Can you run parallel HDF5 and the thread-safe feature together ? What about Parallel HDF5 and C++?
-
No, the thread-safe and parallel (MPI-parallel) configurations are NOT
compatible. You would need to do separate builds for each.
This is also true of C++. You cannot configure Parallel HDF5 with the --enable-cxx option.
Does Parallel HDF5 support variable length datatypes ?
-
Currently, it does NOT.
Problems installing Parallel HDF5 on IBM Regatta (with AIX5)
-
When running configure and then make, errors similar to the following come up:
Macro name H5_PACKAGE_NAME cannot be redefined.
"H5_PACKAGE_NAME" is defined on line 312 of ../../src/H5pubconf.h.
Macro name H5_PACKAGE_STRING cannot be redefined.
... etc ...
This indicates that the HDF5 library was being built with the wrong C compiler that does not support MPI. Follow the instructions in the ./release_docs/INSTALL_parallel file, under the "IBM SP" section. Though it is geared towards a particular IBM installation, it does apply to the Regatta. The only exception is that you probably don't do the:
setenv LLNL_COMPILE_SINGLE_THREADED TRUE
On the other hand, IBM has hundreds of environment variables and various compilers. Each site has various individual settings. Therefore, you should first try to compile and run some simple MPI-IO programs, both C and Fortran90, with the compilers, mpcc_r and mpxlf_r, respectively. Once you get your MPI parallel environment set up, you can proceed with the instructions mentioned above.
How should you write attributes in Parallel HDF5?
You can write attributes collectively in Parallel HDF5.
How to write and NOT to write compound datasets using F90 in Parallel HDF5
-
Here are two examples of writing compound datatypes using F90 in Parallel
HDF5:
- compound_pall.f90 - Right way
- compound_p.f90 - Wrong way (for now)
Both examples write a one dimensional array of size 16 using 4 processes. Elements of the array are structures:
char*2
integer
double precision
float
The compound_p.f90 program tries to write the dataset by
having each process write one field:
process 0 writes character field (16 elements)
process 1 writes integer field (16 elements)
process 2 writes double field (16 elements)
process 3 writes real field (16 elements)
However, this is NOT currently supported by the HDF5 Library (i.e. the C example fails too). It will take a lot of work to implement this, but it could be done.
The compound_pall.f90 program writes the dataset by having each process
write all 4 fields for its own portion of the data array:
process 0 writes character, integer, double and real fields for elements 1 through 4
process 1 writes character, integer, double and real fields for elements 5 through 8
process 2 writes character, integer, double and real fields for elements 9 through 12
process 3 writes character, integer, double and real fields for elements 13 through 16
This works for both independent and collective writes, and unfortunately is currently the only way for Fortran to write a compound dataset in parallel. It is very cumbersome and inefficient.
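The "each process writes its own portion" layout comes down to giving every rank a contiguous block of the array. The following is a minimal sketch in C of that arithmetic only; it is not taken from compound_pall.f90, and the names block_t and block_for_rank are hypothetical. Offsets are 0-based, so offset 12 corresponds to Fortran element 13.

```c
#include <stddef.h>

/* Contiguous block of a 1-D array owned by one process (0-based offset).
 * Hypothetical helper type, used only for this illustration. */
typedef struct {
    size_t offset; /* index of the first element owned by the rank */
    size_t count;  /* number of elements owned by the rank */
} block_t;

/* Split n_elements as evenly as possible among n_procs ranks.
 * The first (n_elements % n_procs) ranks receive one extra element. */
block_t block_for_rank(size_t n_elements, int n_procs, int rank)
{
    size_t base  = n_elements / (size_t)n_procs;
    size_t extra = n_elements % (size_t)n_procs;
    size_t r     = (size_t)rank;
    block_t b;

    b.count  = base + (r < extra ? 1 : 0);
    b.offset = r * base + (r < extra ? r : extra);
    return b;
}
```

With 16 elements and 4 processes every rank gets 4 elements, which matches the split described above. Each rank would then use its offset and count as the start and count of its hyperslab selection.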
MPI ... failed: array services not available
-
This error indicates that some services needed for MPI are not running.
Contact your system administrator for help. Following is a sample program
for C and Fortran that uses MPI I/O, but does not use HDF5. If you can get
this to run, then you should be able to get HDF5 to run:
Sample_mpio.c
Sample_mpio.f90
One user reported that to turn on the array services he logged in as root, ran chkconfig array on, then restarted his machine.
How do you write data when one process doesn't have or need to write data ?
-
The following examples show how to write data collectively and
independently when one process doesn't have data or does not need to write
data.
- coll_test.c - Uses H5Sselect_none to tell the
H5Dwrite call that there will be no data. The fourth process HAS to participate
since we are in collective mode.
- ind_test.c - Specifies which process writes data. H5Dwrite is not called by the fourth process at all in this case; this approach will work only when independent mode is used.
Does Parallel HDF5 support shared libraries?
For Parallel HDF5, configure disables shared libraries unless the user explicitly enables them with the --enable-shared option.
Shared parallel libraries are not tested, so please use this option with caution. In other words, please make sure that all of the tests pass before using the library.
How can I read/write a dataset greater than 2GB?
If you use the default file access property list (serial) for HDF5, you can read or write a dataset greater than 2GB with one call.
However, if you set the FAPL or DXPL to use the MPI-I/O file driver, you will not be able to do this. The problem is that the MPI standard specifies that the 'count' parameter passed to MPI read and write operations be a 32-bit integer.
There are ways in HDF5 to get around this limitation in the standard by concatenating several derived datatypes, so that the count drops below what a 32-bit integer can hold. However, this also breaks ROMIO (the MPI-I/O implementation used by almost all MPI libraries). This is a known limitation of ROMIO: the most I/O it can do in a single operation is 2 GB. That is not the same problem as the 'count' parameter being 32 bits, but rather a limit in ROMIO itself. So unless a fix is implemented in the ROMIO library, the workaround for the MPI standard limit (mentioned above) will not work.
The solution for now is to do multiple reads/writes as necessary so that the total amount of data read or written per call is less than 2 GB. We have a Parallel HDF5 Tutorial here:
http://www.hdfgroup.org/HDF5/Tutor/parallel.html
See the hyperslab selection examples in the tutorial for how to select a subset of a dataset:
http://www.hdfgroup.org/HDF5/Tutor/phype.html
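To split a large transfer into several calls, you first need to know how many rows fit in one call. The sketch below shows that arithmetic only. It assumes the dataset is divided along its slowest-changing dimension into "rows" of row_bytes bytes each; the names MAX_BYTES_PER_CALL, rows_per_call and calls_needed are hypothetical and are not part of HDF5.

```c
#include <stdint.h>

/* Largest transfer we allow per call: just under 2 GB (2^31 - 1 bytes). */
#define MAX_BYTES_PER_CALL ((uint64_t)2147483647)

/* Largest number of rows that stays within the limit.
 * Returns 0 if a single row is already too large (or row_bytes is 0). */
uint64_t rows_per_call(uint64_t row_bytes)
{
    if (row_bytes == 0)
        return 0;
    return MAX_BYTES_PER_CALL / row_bytes;
}

/* Number of read/write calls needed to cover total_rows rows.
 * Returns 0 if the rows cannot be split this way. */
uint64_t calls_needed(uint64_t total_rows, uint64_t row_bytes)
{
    uint64_t per_call = rows_per_call(row_bytes);

    if (per_call == 0)
        return 0;
    return (total_rows + per_call - 1) / per_call; /* round up */
}
```

In a loop over the calls, call number i would select a hyperslab that starts at row i * rows_per_call(row_bytes) and covers at most rows_per_call(row_bytes) rows.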
PROPERTIES
What function do you use to get compression level information?
-
The function H5Pget_filter_by_id returns this information.
How do you work with a file created with the file family feature?
-
The file family feature is specified as a File Access Property List.
The Property List topic in the HDF5 Tutorial talks about this feature.
To access a file with the file family feature, when you open or create
the file, the name of it should include a printf integer
format specifier (which gets replaced with the family member number). For
example, junk%d.h5 would result in files, such as,
junk0.h5, junk1.h5, junk2.h5 ...
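To make the naming rule concrete, the sketch below expands such a pattern the way printf would. It only mimics the naming for illustration: the library performs this substitution itself, and you always pass the unexpanded pattern (for example junk%d.h5) to H5Fcreate or H5Fopen. The helper name family_member_name is hypothetical.

```c
#include <stdio.h>

/* Expand a family name pattern such as "junk%d.h5" for one member number.
 * Returns 0 on success, -1 if the result does not fit in `out`.
 * The pattern is assumed to contain exactly one integer format specifier. */
int family_member_name(char *out, size_t out_size, const char *pattern,
                       int member)
{
    int n = snprintf(out, out_size, pattern, member);

    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For the pattern junk%d.h5 and member number 2, the buffer holds junk2.h5.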
There are some problems with the file family feature in the HDF5 1.6.x (and earlier) releases.
Right now, if you are going to use the file family feature, we recommend that when you read the file, you know that it was created with the file family feature and what the file member size is. Also, you must have written enough data to the file when you created it, to fill up the first file member. Otherwise, HDF5 will re-set the file member size to the size of the data that was written.
Here is an example of how you would read a file that was created using the file family feature:
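...
#define FILE "junk%d.h5"
int main(void) {
    hid_t file_id, fapl;
    hsize_t msize;
    herr_t status;
    /* File access property list that selects the family driver */
    fapl = H5Pcreate (H5P_FILE_ACCESS);
    /* Member size in bytes; use the size the file was created with */
    msize = 1024*1024;
    status = H5Pset_fapl_family (fapl, msize, H5P_DEFAULT);
    /* Pass the name with %d in it; the library fills in the member number */
    file_id = H5Fopen(FILE, H5F_ACC_RDWR, fapl);
...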
As you can see above, it requires you to know ahead of time that the files
were created with the file family feature, and what the file member size
is. In actuality, for read-only, the size you specify doesn't matter.
That's why h5dump and h5ls can read the file. (You can change 'msize' to
1 in the code above and it works!)
However, if you open the file using the wrong file member size, and try to *write* data to the file, it may not work as expected.
The problems with the file family feature will be fixed in a future release.
Can you work with an HDF5 file in memory ?
Yes. You can create an HDF5 file in memory using H5Pset_fapl_core and use the backing_store parameter to write the data to disk on closing.
Can I subsequently open the file and put the data in memory ?
Yes, with HDF5 1.8.0, you can bring the file into memory, read it, modify it and write it back, using the core driver. With HDF5 1.6, you could only write it back, with no open.
When writing chunks and using Fletcher checksums are there any situations where the HDF5 API will do a read of a "chunk" under the covers when an application is writing a file ?
Yes, the library will perform a read when partially updating a chunk that is written to. This is required when chunks are filtered because the filter (checksum in this case) is performed on the entire chunk, not just a portion of the chunk. Additionally, if the chunk cache is enabled and used (likely), the chunk will be read from the file before being partially updated in memory, and then eventually written back out.
UTILITIES
How do you use the h5cc (h5fc, h5c++) utility?
-
(Below, h5cc is specified but the information applies to h5fc
and h5c++, as well.)
You can just type h5cc -o prog prog.c, where prog is the executable that gets created and prog.c is your application.
Use h5cc -show to see what libraries and compiler are used by h5cc.
If building the HDF5 library from source, then the compile scripts should be ready to use without changes.
If using the pre-compiled binaries that we provide, you will need to do a few things before you can use h5cc, as it has site-specific paths in it.
After you have copied the files to the final installation directory, do the following:
- cd to the bin/ directory in the pre-compiled binaries.
- Run ./h5redeploy and enter yes to the question. This will fix some of the paths used in h5cc.
- Edit h5cc and search for LDFLAGS and CPPFLAGS. Check, and if need be, update the paths for the external libraries (SZIP, ZLIB). If ZLIB is in a default location, then it will probably be okay, but the SZIP path will need to be updated.
The h5cc utility should then be ready for use in most cases. However, if you use a different compiler name or if some of the required libraries are in non-standard places, you may need to edit it and modify some other variables. Take a look at these variables in the script:
prefix - Path to the HDF5 top level installation directory
CCBASE - Name of the alternative C compiler
CLINKERBASE - Name of the alternative linker
LDFLAGS - Path to different libraries your application will link with
(this path should include the path to the zlib library)
LIBS - Libraries your application will link with
Why is h5dump slower than h5ls?
-
The h5dump utility creates a table and then displays it, whereas
h5ls displays the data as it reads it. With large files, for
example files greater than 2 GB, h5dump will be very slow, and
is not ideal to use. You may want to look at other tools or ways of
accessing your data if you are having problems due to the size of the file.
Using HDFView may be an alternative. You can also use h5ls -f -r to
get a list of objects and their absolute paths, and then just use
h5dump to view specific datasets.
Can you add an option to h5dump or h5ls to print the version of a file ?
-
No, we do not plan on adding this option. Users should use attributes to
specify the version of a file. There are many reasons why we shouldn't
add this. For example, different objects in the file could be created or
modified by different versions of the library.
Last modified: February 2nd, 2012
