Each new generation of sequencing instruments generates orders of magnitude more data than the previous generation, at a fraction of the cost. Those who must acquire, process, analyze, and archive that data are continually scrambling to adapt their data management tools to this explosive growth. Downstream processing requires that the data be available in the right form to a variety of analytical methods, which historically has meant reformatting the data many times, as well as generating new kinds of data, as it passes through the workflow.
The ability of HDF5 to accommodate virtually every kind of data in a single file has been recognized by many researchers as providing the ideal container for sequencing data. Sequences, images, SNP matrices, and every other type of data and metadata associated with an experiment can be stored in a single HDF file that can be passed from module to module in the data processing workflow, greatly reducing the headaches associated with data management. At the same time, HDF5 performs extremely well, indeed, much better than most common formats used in the industry. For example, Oxford Nanopore Technologies uses HDF5 to help manage large volumes of gene sequencing data.