There are a number of things that can affect performance:
- Chunk Size Too Small:
- Every chunk carries some overhead. If the chunk size is very small,
then your files will be larger and performance can suffer.
- Chunk Size Larger Than 1MB:
The default chunk cache size is 1MB, so a chunk larger than that cannot be held in the cache. If your chunk size is larger than 1MB, set the chunk cache size to be equal to or larger than your chunk size, using H5Pset_cache.
- Memory Dataspace Has Different Shape Than File Dataspace:
Make the buffer for your memory dataspace have the same number of dimensions as your dataset. With datasets that have several dimensions, the performance difference can be dramatic.
See the example: h5pmem.c
- Keeping the Dataspace ID Open When Writing in a Loop:
- If you write to a portion of a dataset in a loop, be sure
to close the dataspace at the end of each iteration. Otherwise the
dataspaces left open accumulate and can cause a large temporary
'memory leak'.
- Keeping Many Objects (Thousands) Open.
- Using a "Wrong" Access Pattern:
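- C stores arrays in row-major order and Fortran in column-major order,
so accessing data by column in C or by row in Fortran touches
non-contiguous elements and can cause performance problems.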
- Numerous Issues Related to Memory Usage:
- There are cache size issues when there are many caches. For
example, this can happen if an application leaves many chunked
datasets open. The application memory will grow, causing
swapping to disk.
- Using Filters, the Checksum Property and Datatype Conversions.
- Using Variable Length Datatypes:
- Datasets with variable-length datatypes cannot be compressed.
Also, if you frequently edit datasets with variable-length datatypes and
close the file between edits, holes can be left in the file. A workaround
is to leave the file open while editing the datasets.
- Using Compound Datatypes in Fortran 90 and Java:
- Compound datatypes work well with C, but they are slow when used with
Fortran or Java. They are also cumbersome, because in F90 and Java you can
only read/write data by field. [It is not possible to pass an array of
Fortran structures to a C function in a portable manner. In any case,
the Fortran layer has to repack the Fortran array into an array of C
structures. The main problem is that Fortran enforces type checking at
compilation time, and it is impossible to overload the h5dread_f/h5dwrite_f
functions with a datatype that is defined by the user.]
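For the two chunk-size items above, it can help to work out how large one chunk is and how many chunks a dataset will contain before choosing chunk dimensions. The following is a minimal sketch in plain C, not part of the HDF5 library: the helper names chunk_bytes and chunk_count are hypothetical, and the dataset and chunk shapes are supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes occupied by one chunk, i.e. the product of
 * the chunk dimensions multiplied by the size of one element. */
size_t chunk_bytes(int rank, const size_t *chunk_dims, size_t elem_size)
{
    assert(rank > 0 && chunk_dims != NULL && elem_size > 0);
    size_t bytes = elem_size;
    for (int i = 0; i < rank; i++)
        bytes *= chunk_dims[i];
    return bytes;
}

/* Hypothetical helper: number of chunks needed to cover a dataset.
 * Partial chunks at the edges count as whole chunks. */
size_t chunk_count(int rank, const size_t *dset_dims, const size_t *chunk_dims)
{
    assert(rank > 0 && dset_dims != NULL && chunk_dims != NULL);
    size_t count = 1;
    for (int i = 0; i < rank; i++) {
        assert(chunk_dims[i] > 0);
        /* ceiling division: chunks needed along dimension i */
        count *= (dset_dims[i] + chunk_dims[i] - 1) / chunk_dims[i];
    }
    return count;
}
```

With 4-byte elements, a 1000 x 1000 dataset chunked as 10 x 10 gives 10000 chunks of 400 bytes each, which illustrates the overhead of very small chunks. Chunked as 512 x 1024, it gives 2 chunks of 2097152 bytes each, which is above 1MB, so a value at least that large would be a candidate for the rdcc_nbytes argument of H5Pset_cache.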
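The "wrong" access pattern item above can be illustrated without HDF5 at all. The sketch below, in plain C with hypothetical function names, sums the same row-major array in two ways: once following the storage order and once against it. Both return the same result, but the second visits elements that are far apart in memory, which is the kind of pattern that tends to be slow for large arrays and for dataset selections alike.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols array stored in C (row-major) order, row by row.
 * Consecutive iterations touch adjacent elements in memory. */
double sum_by_rows(const double *a, size_t rows, size_t cols)
{
    assert(a != NULL);
    double total = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            total += a[i * cols + j];   /* stride of 1 element */
    return total;
}

/* Same sum, column by column. Consecutive iterations are `cols`
 * elements apart, so the access is non-contiguous in C. */
double sum_by_columns(const double *a, size_t rows, size_t cols)
{
    assert(a != NULL);
    double total = 0.0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            total += a[i * cols + j];   /* stride of `cols` elements */
    return total;
}
```

In Fortran the situation is reversed, because arrays are stored column by column there.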
Last modified: September 25th 2007
