In addition to proposing a standard data format, BioHDF provides a set of tools that make it easier to work with data stored in HDF.
HDFView: A Java-based viewer that lets users view and edit HDF files.
BioHDF Command-line Tools: Command-line tools for handling genome sequence data in HDF5 files (provided by Geospiza under the GPLv2 license).
Sample Perl Wrappers and Documentation
Sample Perl wrappers to HDF5 are provided to illustrate how one might store genomic sequence data in HDF5, and to engage the bioinformatics community in these investigations.
The software distribution contains two modules: HDFPerl (wrappers) and BioHDF_Perl (high-level APIs).
HDFPerl. Wrappers for a subset of the HDF5 functions have been developed to provide a simple Perl interface to HDF5.
BioHDF_Perl. A second Perl API has been implemented to illustrate how one might import genomic sequence data from FASTA format files into the HDF5 format. This API also creates indexes in HDF5 that allow limited search operations on data.
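The import-and-index idea can be illustrated with a minimal sketch. It is written in Python using only the standard library, it is not the BioHDF_Perl API, and it does not write HDF5. All function names (parse_fasta, build_store, fetch) are hypothetical and exist only for this illustration. The sketch reads FASTA records, concatenates the sequences into one store, and keeps an index of (offset, length) per identifier so that a sequence can be looked up without rescanning the input.

```python
import io


def parse_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    name, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            # Assumption: the identifier is the first whitespace-delimited
            # token of the header line.
            tokens = line[1:].split()
            name, parts = (tokens[0] if tokens else ""), []
        elif name is not None:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def build_store(handle):
    """Concatenate all sequences and record (offset, length) per identifier."""
    pieces, index, offset = [], {}, 0
    for name, seq in parse_fasta(handle):
        index[name] = (offset, len(seq))
        pieces.append(seq)
        offset += len(seq)
    return "".join(pieces), index


def fetch(store, index, name, start=0, end=None):
    """Return a sequence, or the slice [start:end) of it, via the index."""
    offset, length = index[name]
    end = length if end is None else min(end, length)
    return store[offset + start:offset + end]


# Hypothetical two-record input used only to exercise the sketch.
example = io.StringIO(">seq1 demo\nACGT\nACGT\n>seq2\nGGCC\n")
store, index = build_store(example)
```

With this layout, fetch(store, index, "seq2") looks up the offset in the index and slices the store directly. An HDF5-based implementation would keep the store and the index as datasets in the same file instead of in memory.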
Performance Study: HDF5 vs. FASTA
We conducted a performance study comparing HDF5 with the FASTA format in terms of (a) storage use and (b) time to access genomic sequence data, using traditional text-management tools for FASTA and BioHDF_Perl for HDF5. The results show that HDF5 can provide storage efficiency through compression while still allowing fast random access, because it can store indexes alongside compressed, chunked data.
The performance study document is available at:
ftp://ftp.hdfgroup.uiuc.edu/pub/outgoing/BioHDF/Perl/BioHDF_performance.pdf
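Why compressed, chunked storage can still support fast random access may be easier to see in a small sketch. The code below is an illustration in Python using the standard zlib module. It is not taken from the performance study and it does not use HDF5, which handles chunking and compression internally. The names compress_chunks and read_range and the tiny chunk size are assumptions made for the example. The key point is that a read of a sub-range only decompresses the chunks that overlap it.

```python
import zlib

CHUNK_SIZE = 16  # bases per chunk; deliberately tiny, for illustration only


def compress_chunks(sequence, chunk_size=CHUNK_SIZE):
    """Split a sequence into fixed-size chunks and compress each one separately."""
    data = sequence.encode("ascii")
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def read_range(chunks, start, end, chunk_size=CHUNK_SIZE):
    """Return sequence[start:end], decompressing only the overlapping chunks."""
    if start >= end:
        return ""
    first = start // chunk_size       # index of the first chunk touched
    last = (end - 1) // chunk_size    # index of the last chunk touched
    raw = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    local = start - first * chunk_size  # offset of `start` within `raw`
    return raw[local:local + (end - start)].decode("ascii")


# Hypothetical 40-base sequence used only to exercise the sketch.
sequence = "ACGT" * 10
chunks = compress_chunks(sequence)
```

Here read_range(chunks, 14, 20) touches only the first two of the three chunks. Because the chunk size is fixed, the chunk position acts as the index. A real store would also need an index from sequence identifier to position, as the study describes.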
NOTE: The software available from this page is experimental. It has not been fully tested and is currently NOT SUPPORTED.
Last modified: June 24th 2009
