Overview
In the 1.6.4 release, a new implementation of the metadata cache was added. In case you are not familiar with the metadata cache, it caches metadata for the entire file and exists for as long as the file is open.
From the user's perspective, the most striking change with the new cache should be a large reduction in cache memory requirements when working with complex HDF5 files.
Users working with such files may also notice a reduction in file close time.
Those working with HDF5 files with a simple structure should not notice any particular changes in most cases. In rare cases there may be a significant improvement in performance.
The version of the new metadata cache in the 1.6.4 release is a work in progress; the full implementation will not appear until the 1.8 release.
In particular, the mdc_nelmts parameter of H5Pget_cache() and H5Pset_cache() currently has no effect. Instead, in 1.6.4, the new metadata cache is hard-coded to a maximum size of 4 MB. Our tests indicate that this size is sufficient for the most complex file we have encountered in the wild. If it is too small for you, in this version you will have to increase the default cache size (H5C__DEFAULT_MAX_CACHE_SIZE in H5Cprivate.h) and recompile.
In future versions you will be able to change the cache size at run time.
The remainder of this document contains an architectural overview of the old and new metadata caches. It can be safely skipped by anyone who works only with HDF5 files with relatively simple structures (i.e., no huge groups, datasets with large numbers of chunks, or objects with large numbers of attributes).
The Old Metadata Cache
The old metadata cache indexed the cache with a hash table with no provision for collisions. Instead, collisions were handled by evicting the existing entry to make room for the new entry. Aside from flushes, there was no other mechanism for evicting entries, so the replacement policy could best be described as "Evict on collision". As a result, if two frequently used entries hashed to the same location, they would evict each other regularly. To decrease the likelihood of this situation, the default hash table size was slightly more than 10,000. However, since the size of metadata entries is not bounded, and since entries were only evicted on collision, this allowed the cache size to explode when working with HDF5 files with a complex structure.
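The two failure modes above can be seen in a minimal C sketch of "Evict on collision". It assumes a direct-mapped table of only 8 slots (the real default was slightly more than 10,000), and all names are hypothetical. It does not reproduce the library's code. Two entries that share a slot displace each other on every access. Because nothing leaves the cache except by collision, the total cached size is bounded only by the sizes of the entries themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the old cache: a direct-mapped hash table with
 * no collision chains. 8 slots for illustration only; the real default
 * was slightly more than 10,000. */
#define OLD_NSLOTS 8

typedef struct {
    int           valid;
    unsigned long addr;  /* file address of the metadata entry */
    size_t        size;  /* entry size in bytes; not bounded */
} old_slot_t;

typedef struct {
    old_slot_t slots[OLD_NSLOTS];
    size_t     total_size;  /* sum of the sizes of all resident entries */
    long       misses;
    long       evictions;
} old_cache_t;

static void old_cache_init(old_cache_t *c) { memset(c, 0, sizeof *c); }

/* Access the entry at addr, loading it on a miss. Apart from flushes
 * (not modeled), the only way an entry leaves the cache is by being
 * displaced by a colliding entry. */
static void old_cache_access(old_cache_t *c, unsigned long addr, size_t size)
{
    old_slot_t *s = &c->slots[addr % OLD_NSLOTS];
    if (s->valid && s->addr == addr)
        return;                          /* hit */
    c->misses++;
    if (s->valid) {                      /* collision: evict the occupant */
        assert(c->total_size >= s->size);
        c->total_size -= s->size;
        c->evictions++;
    }
    s->valid = 1;
    s->addr  = addr;
    s->size  = size;
    c->total_size += size;
}
```

For example, addresses 3 and 11 both map to slot 3. Alternately accessing them five times each therefore yields ten misses and nine evictions. By contrast, eight 1 MB entries in distinct slots all stay resident, so the cache holds 8 MB with no eviction ever triggered.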
The "Evict on collision" replacement policy also caused problems with the parallel version of the HDF5 library, as a collision with a dirty entry could force a write in response to a metadata read. Since all metadata writes must be collective in the parallel case, this caused the library to hang. Prior to the implementation of the new metadata cache, we dealt with this issue by maintaining a shadow cache for dirty entries evicted by a read.
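To see why a read can turn into a write under this policy, consider the following hypothetical C sketch. It uses the same direct-mapped layout as before, now with a dirty flag, and counts the flushes issued while servicing a read. It does not model the shadow cache, and the names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 8

typedef struct {
    int           valid;
    int           dirty;   /* modified since it was loaded */
    unsigned long addr;
} slot_t;

typedef struct {
    slot_t slots[NSLOTS];
    long   writes_forced_by_reads;  /* flushes issued while servicing a read */
} dcache_t;

/* Stand-in for writing the entry back to the file. In parallel HDF5,
 * metadata writes must be collective, so a write issued only by the
 * process whose read happened to collide would hang the others. */
static void flush_entry(slot_t *s) { s->dirty = 0; }

/* Read (load) the entry at addr under "evict on collision". */
static void dcache_read(dcache_t *c, unsigned long addr)
{
    slot_t *s = &c->slots[addr % NSLOTS];
    if (s->valid && s->addr == addr)
        return;                          /* hit: no I/O */
    if (s->valid && s->dirty) {          /* dirty victim must be written first */
        flush_entry(s);
        c->writes_forced_by_reads++;
    }
    assert(!s->valid || !s->dirty);
    s->valid = 1;
    s->dirty = 0;
    s->addr  = addr;
}

/* Load the entry at addr if needed, then mark it modified. */
static void dcache_modify(dcache_t *c, unsigned long addr)
{
    dcache_read(c, addr);
    c->slots[addr % NSLOTS].dirty = 1;
}
```

Suppose entry 2 is modified and entry 10 is then read. Since 10 % 8 == 2, the read must first write entry 2 back. A later read of entry 18 collides with the now-clean entry 10 and costs no write.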
The New Metadata Cache
The new metadata cache was designed to address the above issues. After implementation, it became evident that the working set size for HDF5 files varies widely depending on both structure and access pattern. Thus it was necessary to add facilities for cache size adjustment under either automatic or "manual" control. These latter features will appear in the 1.8 release.
Structurally, the new metadata cache can be thought of as a heavily modified version of the UNIX buffer cache as described in chapter three of M. J. Bach's "The Design of the UNIX Operating System". In essence, the UNIX buffer cache uses a hash table with chaining to index a pool of fixed-size buffers, and it uses the LRU replacement policy.
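The following simplified C sketch illustrates a buffer cache in the style Bach describes. It assumes a pool of 4 equally sized buffers and 2 hash chains (both arbitrary), and it omits buffer locking, delayed writes, and actual I/O. All names are hypothetical. Unlike in the old HDF5 cache, blocks that hash to the same chain coexist. A miss recycles the least recently used buffer rather than whatever happens to occupy a slot.

```c
#include <assert.h>
#include <stddef.h>

#define NBUF  4  /* fixed pool of equally sized buffers */
#define NHASH 2  /* deliberately small so chains form */

typedef struct buf {
    unsigned long blkno;
    int           valid;
    struct buf   *hnext;          /* next buffer on the same hash chain */
    struct buf   *lprev, *lnext;  /* LRU list; head = least recently used */
} buf_t;

typedef struct {
    buf_t  pool[NBUF];
    buf_t *hash[NHASH];
    buf_t *lru_head, *lru_tail;
    long   misses;
} bcache_t;

static void lru_unlink(bcache_t *c, buf_t *b) {
    if (b->lprev) b->lprev->lnext = b->lnext; else c->lru_head = b->lnext;
    if (b->lnext) b->lnext->lprev = b->lprev; else c->lru_tail = b->lprev;
    b->lprev = b->lnext = NULL;
}

static void lru_append(bcache_t *c, buf_t *b) {  /* mark most recently used */
    b->lprev = c->lru_tail;
    b->lnext = NULL;
    if (c->lru_tail) c->lru_tail->lnext = b; else c->lru_head = b;
    c->lru_tail = b;
}

static void hash_remove(bcache_t *c, buf_t *b) {
    buf_t **pp = &c->hash[b->blkno % NHASH];
    while (*pp && *pp != b) pp = &(*pp)->hnext;
    if (*pp) *pp = b->hnext;
    b->hnext = NULL;
}

static void bcache_init(bcache_t *c) {
    *c = (bcache_t){0};
    for (int i = 0; i < NBUF; i++) lru_append(c, &c->pool[i]);
}

/* Return the buffer holding blkno. Colliding blocks share a chain
 * instead of evicting each other; a miss recycles the LRU buffer. */
static buf_t *bcache_get(bcache_t *c, unsigned long blkno) {
    for (buf_t *b = c->hash[blkno % NHASH]; b; b = b->hnext)
        if (b->valid && b->blkno == blkno) {
            lru_unlink(c, b);
            lru_append(c, b);
            return b;
        }
    c->misses++;
    buf_t *victim = c->lru_head;  /* no locking here, so always available */
    assert(victim != NULL);
    lru_unlink(c, victim);
    if (victim->valid) hash_remove(c, victim);
    victim->blkno = blkno;
    victim->valid = 1;
    victim->hnext = c->hash[blkno % NHASH];
    c->hash[blkno % NHASH] = victim;
    lru_append(c, victim);
    return victim;
}
```

Blocks 0, 2, 4, and 6 all land on chain 0, yet repeated accesses to them miss only the first time. Fetching a fifth block recycles the buffer holding block 0, the least recently used one. Block 2 remains resident.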
Since HDF5 metadata entries are not of fixed size and may grow arbitrarily large, the size of the new metadata cache cannot be controlled by setting a maximum number of entries. Instead, the new cache keeps a running sum of the sizes of all entries, and attempts to evict entries as necessary to stay within the specified maximum size. Candidates for eviction are chosen by the LRU replacement policy, and an LRU list is maintained for this purpose.
The cache cannot evict entries that are locked, and thus it will temporarily grow beyond its maximum size if there are insufficient unlocked entries to evict.
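The size-based policy and the locked-entry exception can be sketched as follows. This is a minimal C sketch that keeps a running byte total and compares it with max_size. For brevity, LRU order is recorded as a per-entry access stamp and found by linear scan rather than with the LRU list the text describes. That choice changes the cost of finding a victim but not which victim is chosen. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 16  /* fixed table size, a limitation of this sketch */

typedef struct {
    int           valid;
    int           locked;  /* locked entries may not be evicted */
    unsigned long addr;
    size_t        size;
    unsigned long stamp;   /* larger value = more recently used */
} entry_t;

typedef struct {
    entry_t       e[MAX_ENTRIES];
    size_t        max_size;  /* size the cache tries to stay within */
    size_t        cur_size;  /* running sum of resident entry sizes */
    unsigned long clock;     /* source of LRU stamps */
} scache_t;

static entry_t *scache_find(scache_t *c, unsigned long addr)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (c->e[i].valid && c->e[i].addr == addr)
            return &c->e[i];
    return NULL;
}

/* Record a use of the entry at addr, making it most recently used. */
static void scache_touch(scache_t *c, unsigned long addr)
{
    entry_t *x = scache_find(c, addr);
    if (x)
        x->stamp = ++c->clock;
}

/* Evict unlocked entries in LRU order until `incoming` more bytes fit
 * within max_size, or until no unlocked entry is left. */
static void make_space(scache_t *c, size_t incoming)
{
    while (c->cur_size + incoming > c->max_size) {
        entry_t *victim = NULL;
        for (int i = 0; i < MAX_ENTRIES; i++) {
            entry_t *x = &c->e[i];
            if (x->valid && !x->locked && (!victim || x->stamp < victim->stamp))
                victim = x;
        }
        if (!victim)
            return;  /* only locked entries remain: grow past max_size */
        assert(c->cur_size >= victim->size);
        c->cur_size -= victim->size;
        victim->valid = 0;
    }
}

/* Insert a new entry; returns 0 on success, -1 if the table is full. */
static int scache_insert(scache_t *c, unsigned long addr, size_t size, int locked)
{
    make_space(c, size);
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!c->e[i].valid) {
            c->e[i] = (entry_t){.valid = 1, .locked = locked, .addr = addr,
                                .size = size, .stamp = ++c->clock};
            c->cur_size += size;
            return 0;
        }
    }
    return -1;
}
```

With max_size set to 4096 bytes and a full cache, inserting a 1 KB entry evicts the least recently used unlocked entry. When the only resident entry is locked, the same insertion leaves the cache 1 KB over its maximum.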
To help avoid generating writes in response to a read while running in parallel, the cache also maintains a clean LRU list. This list contains only clean entries, and is used as a source of candidates for eviction when servicing a read request in parallel mode. If the clean LRU list is exhausted, the cache will temporarily exceed its specified maximum size.
To increase the likelihood that this will not happen, the new cache allows the user to specify a minimum clean size: the minimum total size of all the entries on the clean LRU list. In 1.6.4, this value is hard-coded to 2 MB. If you need to change it, change the value of H5C__DEFAULT_MIN_CLEAN_SIZE in H5Cprivate.h and recompile. In future versions, this parameter will be accessible from the API. Note that the clean LRU list is maintained only in the parallel version of the HDF5 library, so the minimum clean size is relevant only when running the parallel version of the library.
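The following hypothetical C sketch illustrates read-side eviction in parallel mode. When servicing a read, only clean entries are eviction candidates, so a read never triggers a write. Clean entries are identified by a flag and ordered by access stamp, standing in for the separate clean LRU list. The byte sizes are small stand-ins for the 2 MB default. This document does not describe how the library restores the minimum clean size, so the sketch only reports when the clean total falls below it (`below_min_clean` is a made-up helper).

```c
#include <assert.h>
#include <stddef.h>

#define NENT 16  /* fixed table size, a limitation of this sketch */

typedef struct {
    int           valid, dirty;
    unsigned long addr, stamp;  /* larger stamp = more recently used */
    size_t        size;
} pentry_t;

typedef struct {
    pentry_t      e[NENT];
    size_t        max_size;        /* size the cache tries to stay within */
    size_t        min_clean_size;  /* desired total size of clean entries */
    size_t        cur_size;        /* running sum of all entry sizes */
    size_t        clean_size;      /* running sum of clean entry sizes */
    unsigned long clock;
} pcache_t;

/* Make room for a read of `incoming` bytes by evicting clean entries in
 * LRU order. Dirty entries are skipped, since flushing one here would be
 * a non-collective write. Returns 0 if it gave up (cache overshoots). */
static int make_space_for_read(pcache_t *c, size_t incoming)
{
    while (c->cur_size + incoming > c->max_size) {
        pentry_t *victim = NULL;
        for (int i = 0; i < NENT; i++) {
            pentry_t *x = &c->e[i];
            if (x->valid && !x->dirty && (!victim || x->stamp < victim->stamp))
                victim = x;
        }
        if (!victim)
            return 0;  /* clean entries exhausted: exceed max_size */
        assert(c->clean_size >= victim->size);
        c->cur_size   -= victim->size;
        c->clean_size -= victim->size;
        victim->valid = 0;
    }
    return 1;
}

/* Load an entry for a read; it arrives clean. Overshoot is tolerated. */
static void pcache_read(pcache_t *c, unsigned long addr, size_t size)
{
    (void)make_space_for_read(c, size);
    for (int i = 0; i < NENT; i++)
        if (!c->e[i].valid) {
            c->e[i] = (pentry_t){.valid = 1, .dirty = 0, .addr = addr,
                                 .stamp = ++c->clock, .size = size};
            c->cur_size   += size;
            c->clean_size += size;
            return;
        }
}

/* Mark a resident entry dirty; it stops being a read-eviction candidate. */
static void pcache_dirty(pcache_t *c, unsigned long addr)
{
    for (int i = 0; i < NENT; i++) {
        pentry_t *x = &c->e[i];
        if (x->valid && !x->dirty && x->addr == addr) {
            x->dirty = 1;
            x->stamp = ++c->clock;
            c->clean_size -= x->size;
            return;
        }
    }
}

/* True when clean entries total less than the configured minimum. */
static int below_min_clean(const pcache_t *c)
{
    return c->clean_size < c->min_clean_size;
}
```

Consider a 4096-byte cache holding one dirty and one clean 2048-byte entry. A read evicts the clean entry and leaves the dirty one in place. Once every resident entry is dirty, the clean total drops below the minimum, and the next read simply pushes the cache past max_size.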
Last modified: June 25th 2007
