There are a number of things that can affect performance:
- Chunk Size Too Small:
- Each chunk carries some overhead. If the chunk size is very small,
then your files will be larger and performance can suffer.
- Chunk Size Greater than Chunk Cache Size:
For best performance, the chunk cache size should be equal to or greater than the chunk size for a dataset. The default size of the chunk cache is 1MB. Be aware that if your dataset chunk size is greater than 1MB, then you need to increase the chunk cache size to hold at least one chunk.
This can be done with the H5Pset_chunk_cache call, which adjusts the chunk cache parameters on a per-dataset basis, as opposed to a global setting for the file (see H5Pset_cache).
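As a rough sketch of the sizing rule above, the following C helpers compute the size of one chunk in bytes and a cache size large enough to hold a given number of whole chunks. The helper names (chunk_bytes, needed_cache_bytes) and the constant are hypothetical and are not part of the HDF5 API. The code deliberately does not call HDF5, so it can be compiled on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Default chunk cache size quoted in the text above (1 MB). */
#define DEFAULT_CHUNK_CACHE_BYTES ((size_t)1024 * 1024)

/* Hypothetical helper: bytes in one chunk, i.e. the product of the
 * chunk dimensions multiplied by the size of one element. */
size_t chunk_bytes(const size_t *chunk_dims, int rank, size_t elem_size)
{
    assert(chunk_dims != NULL && rank > 0);
    size_t n = elem_size;
    for (int i = 0; i < rank; i++)
        n *= chunk_dims[i];
    return n;
}

/* Hypothetical helper: cache size in bytes that holds `nchunks` whole
 * chunks, never smaller than the default cache size. */
size_t needed_cache_bytes(size_t one_chunk_bytes, size_t nchunks)
{
    size_t want = one_chunk_bytes * nchunks;
    return want > DEFAULT_CHUNK_CACHE_BYTES ? want : DEFAULT_CHUNK_CACHE_BYTES;
}
```

In an HDF5 program, a value computed this way would typically be passed as the rdcc_nbytes argument of H5Pset_chunk_cache(dapl_id, rdcc_nslots, rdcc_nbytes, rdcc_w0), where dapl_id is a dataset access property list that is later used to open the dataset. For example, a 1024 x 512 chunk of 8-byte values occupies 4 MB, which does not fit in the default cache.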
- Memory Dataspace Has Different Shape Than File Dataspace:
Give the memory dataspace that describes your buffer the same number of dimensions as the dataset's dataspace in the file. With datasets that have several dimensions, the performance difference can be dramatic.
See the example: h5pmem.c
- Keeping the Dataspace ID Open When Writing in a Loop:
- If you write to a portion of a dataset in a loop, be sure
to close the dataspace in each iteration. Leaving the dataspaces
open can cause a large temporary 'memory leak'.
- Keeping Many Objects (Thousands) Open:
- To determine how many objects are open, call H5Fget_obj_count.
To get a list of open object identifiers, call H5Fget_obj_ids.
- Using a "Wrong" Access Pattern:
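- C stores arrays in row-major order and Fortran in column-major order.
Accessing data column by column in C, or row by row in Fortran, touches
elements that are not adjacent and can cause performance problems.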
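The C half of this point can be illustrated with a plain in-memory array, without HDF5. The sketch below uses hypothetical helper names. It shows that a row-by-row traversal visits adjacent elements, while a column-by-column traversal jumps ncols elements at every step. Both give the same result, but the second has poor locality.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, in elements, of a[i][j] in a row-major (C order) array
 * that has ncols columns. */
size_t row_major_offset(size_t i, size_t j, size_t ncols)
{
    return i * ncols + j;
}

/* Row-by-row traversal: consecutive accesses are adjacent in memory. */
double sum_by_rows(const double *a, size_t nrows, size_t ncols)
{
    assert(a != NULL);
    double s = 0.0;
    for (size_t i = 0; i < nrows; i++)
        for (size_t j = 0; j < ncols; j++)
            s += a[row_major_offset(i, j, ncols)];
    return s;
}

/* Column-by-column traversal: consecutive accesses are ncols elements
 * apart, which is the "wrong" pattern for C. */
double sum_by_cols(const double *a, size_t nrows, size_t ncols)
{
    assert(a != NULL);
    double s = 0.0;
    for (size_t j = 0; j < ncols; j++)
        for (size_t i = 0; i < nrows; i++)
            s += a[row_major_offset(i, j, ncols)];
    return s;
}
```

The same reasoning applies when selecting parts of a dataset: selections that follow the fastest-varying (last) dimension in C tend to map to contiguous data.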
- Numerous Issues Related to Memory Usage:
- Memory use can become a problem when there are many caches. For
example, this can happen if an application leaves many chunked
datasets open, because each open chunked dataset has its own chunk
cache. The application's memory grows, which can cause swapping to disk.
- Using Filters, the Checksum Property and Datatype Conversions.
- Using Variable Length Datatypes:
- Datasets with variable-length datatypes cannot be compressed.
Also, if you frequently edit datasets with variable-length datatypes and
close the file between edits, the edits can leave holes in the file. A
workaround is to keep the file open while editing the datasets.
- Using Compound Datatypes in Fortran 90 and Java:
- Compound datatypes work well with C, but they are slow when used with
Fortran or Java. They are also cumbersome, because in F90 and Java you can
only read and write data one field at a time. [It is not possible to pass
an array of Fortran structures to a C function in a portable manner. In any
case, the Fortran layer has to repack the Fortran array into an array of C
structures. The main problem is that Fortran enforces type checking at
compilation time, and it is impossible to overload the h5dread_f and
h5dwrite_f functions with a datatype that is defined by the user.]
Last modified: May 16th 2011
