Electron microscopy, functional MRI, biomedical simulations, particle detectors, and other instruments generate increasingly high-resolution imagery at unprecedented rates. The metadata associated with this imagery can be very complex, and it is usually as important to understanding the data as the images themselves. More often than not, the processes that work with these images need to search them quickly and access specific regions, ideally without having to decompress large amounts of data or perform exhaustive searches.
Traditional formats for storing this data, such as TIFF and DICOM, can be difficult to scale and often fall short in access speed and in their ability to accommodate large, complex metadata. In contrast, the MINC (Medical Imaging NetCDF) format is based on HDF5 and provides the medical-imaging research community with a modality-neutral way to store medical images along with a rich, flexible set of supporting data.
For over 25 years, projects have found HDF to provide the combination of scalability and flexibility needed to store everything from X-rays to high-dimensional tomography. A 100 TB data set is not unheard of for HDF5. At the same time, the flexibility of HDF5 has made it possible for some projects to access and analyze these enormous data sets in a matter of seconds.
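The fast access described above comes from HDF5's chunked storage: a compressed dataset is split into independently compressed chunks, so reading a small region decompresses only the chunks that region touches. A minimal sketch using the h5py binding for Python (the file name "scan.h5" and dataset name "tomography" are illustrative, not from the text):

```python
import h5py
import numpy as np

# Write a 3-D volume as a chunked, gzip-compressed HDF5 dataset.
# Each chunk holds one 256x256 slice, so slices compress independently.
volume = np.random.rand(64, 256, 256).astype(np.float32)
with h5py.File("scan.h5", "w") as f:
    f.create_dataset("tomography", data=volume,
                     chunks=(1, 256, 256), compression="gzip")

# Read a single slice; HDF5 decompresses only the chunks that
# intersect the requested region, not the entire 64-slice volume.
with h5py.File("scan.h5", "r") as f:
    middle_slice = f["tomography"][32, :, :]

print(middle_slice.shape)  # (256, 256)
```

Choosing the chunk shape to match the expected access pattern (here, whole slices) is what keeps partial reads cheap.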
Modern CT, PET, and MRI scanners generate unprecedented amounts of data at very high speeds. FDA quality requirements, as well as the medical community’s need to make the best use of this data, call for flexible storage that can accommodate complex data. At the same time, the data needs to be easy to access quickly.
Using HDF5, many organizations and communities meet their I/O performance, storage, quality, and reliability requirements without sacrificing the ability to accommodate virtually any kind of data. With a software ecosystem that includes bindings for every popular programming language and analysis tools such as MATLAB, HDF5 provides easy access to medical data at every stage of the data lifecycle.
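Keeping metadata as close to the images as the text recommends is straightforward in HDF5: attributes travel with the dataset in the same self-describing file. A hedged sketch in Python with h5py (the file name "study.h5", dataset name "mri", and attribute names are hypothetical choices for illustration):

```python
import h5py
import numpy as np

# Store an image volume together with its descriptive metadata.
with h5py.File("study.h5", "w") as f:
    dset = f.create_dataset("mri", data=np.zeros((16, 128, 128),
                                                 dtype=np.int16))
    # Attributes are stored alongside the dataset itself, so the
    # metadata can never be separated from the image it describes.
    dset.attrs["modality"] = "MRI"
    dset.attrs["voxel_size_mm"] = [1.0, 0.5, 0.5]

# Any HDF5-aware tool (MATLAB, h5dump, etc.) can read the same
# attributes back without prior knowledge of the file layout.
with h5py.File("study.h5", "r") as f:
    modality = f["mri"].attrs["modality"]
    voxels = list(f["mri"].attrs["voxel_size_mm"])

print(modality, voxels)
```

Because the file is self-describing, downstream tools can discover both the image and its metadata by inspection rather than by out-of-band convention.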