HDF NOW File Structure
======================

The HDF NOW file structure supports "distributed HDF files". A distributed
HDF file set (DHDF) is a set of files that collectively contain one or more
distributed HDF objects. The set consists of one MHF and one or more AHF's.

The MHF (main HDF file) stores the file information for the whole set and,
for each distributed SDS, its attributes and chunking information. An AHF
(associated HDF file) stores the file information of the MHF and of itself,
together with the attributes and values of its sub-SDS's.

When a DHDF is created, three items are stored in each file: a string,
DHDF_SetId; an integer, DHDF_FileId; and a vdata, File_Info.

DHDF_SetId is a concatenated string of the time stamp, the name of the MHF
and the MHP rank. The library uses it to detect whether a given set of HDF
files forms a DHDF. When a DHDF is created for the first time, the library
generates DHDF_SetId and stores it in each of the files, so every file in
the same DHDF carries a matching DHDF_SetId.

When a file is created, an integer (DHDF_FileId) is assigned to it. The
DHDF_FileId of the MHF is always 0.

File_Info is a table with the fields DHDF_FileId, FileName, and HostName.
The File_Info table in the MHF stores the DHDF_FileId, file name, and host
name of every file in the DHDF. The File_Info table in an AHF stores the
same information for the MHF and for the AHF itself only.

A multidimensional array together with its attributes is called a
scientific data set, or SDS. (See the HDF user reference manual for
details.) When an SDS is partitioned and distributed among AHF's, the
attributes and the chunking information of the SDS are stored in the MHF.
The chunking information, represented by a vdata, records for each sub-SDS
the DHDF_FileId of the file in which it is stored, its name, and its
dimension sizes and origins. The sub-SDS's themselves (attributes and array
data) are stored in the AHF's.
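The DHDF_SetId rule can be sketched in a few lines of Python. This is a minimal illustration, not the HDF NOW library's code: make_set_id and is_same_dhdf are hypothetical names, the "%b%d%Y" time-stamp format is only inferred from the example value May211997mhf.hdf100 given later in this document (reading its trailing 100 as the MHP rank is likewise an assumption), and the 254-character limit assumes DHDF_SetId is a NUL-terminated char[255].

```python
from datetime import date
from typing import Sequence

# Assumption: DHDF_SetId lives in a NUL-terminated char[255] buffer,
# so at most 254 characters are usable.
MAX_SET_ID_LEN = 254


def make_set_id(created: date, mhf_name: str, rank: int) -> str:
    """Build a DHDF_SetId by concatenating time stamp, MHF name and rank.

    The "%b%d%Y" time-stamp format (e.g. "May211997") is inferred from the
    example in this document; the real library may format it differently.
    """
    set_id = f"{created.strftime('%b%d%Y')}{mhf_name}{rank}"
    if len(set_id) > MAX_SET_ID_LEN:
        raise ValueError("DHDF_SetId does not fit in char[255]")
    return set_id


def is_same_dhdf(set_ids: Sequence[str]) -> bool:
    """Treat files as one DHDF only if all their DHDF_SetId values match."""
    return len(set_ids) > 0 and all(s == set_ids[0] for s in set_ids)


if __name__ == "__main__":
    set_id = make_set_id(date(1997, 5, 21), "mhf.hdf", 100)
    print(set_id)  # May211997mhf.hdf100
    print(is_same_dhdf([set_id, set_id, set_id]))  # True
```

The membership check only compares strings; it says nothing about whether the files are complete, which in this design is what File_Info is for.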
Structure of MHF
================

The MHF contains DHDF_SetId, DHDF_FileId, File_Info, and the attributes and
chunking info of the distributed SDS's. The values of DHDF_SetId and
DHDF_FileId are assigned by the library. The File_Info table lists the file
names and the hosts on which the files are located. For each SDS stored in
AHF's, the MHF contains the attributes of the SDS (dimension sizes, data
type, rank, origins, etc.) and its chunking info. The chunking info stores,
for each sub-SDS, its dimension sizes and origins and the DHDF_FileId of the
file where the sub-SDS is stored.

The following is the specification of the MHF structure.

  o char DHDF_SetId[255]
      - an attribute of the file.
      - a concatenated string of the time stamp, the name of the MHF and
        the MHP rank.

  o int32 DHDF_FileId
      - an attribute of the file.
      - a value that uniquely identifies each file within the DHDF.

  o vdata File_Info
      - The class name of File_Info is DHDF_FileInfo.
      - a table with the fields DHDF_FileId, FileName, and HostName:
          - int32 DHDF_FileId
          - char  FileName[255]
          - char  HostName[255]

  o For each SDS stored among AHF's, the attributes are represented by an
    SDS and the chunking info is represented by a vdata.
      - attributes of the SDS
          - char  SDSName[255]
          - int32 datatype
          - int32 rank
          - int32 dimension_sizes[32]
          - other user-defined attributes
        (Note: no array data are stored here.)
      - vdata (chunking information)
          - The name of the vdata is the same as the name of the
            corresponding SDS.
          - The class name of the vdata is Distributed_SDS_Chunk_Table.
          - a table with the fields DHDF_FileId, ChunkName, DimSizes, and
            Origins:
              - int32 DHDF_FileId
              - char  ChunkName[255]
              - int32 DimSizes[32]
              - int32 Origins[32]

Structure of AHF
================

An AHF contains DHDF_SetId, DHDF_FileId, File_Info, and sub-SDS's. The
values of DHDF_SetId and DHDF_FileId are assigned by the library. The
File_Info table contains the DHDF_FileId, FileName, and HostName of the MHF
and of the AHF itself. An AHF stores the attributes and values of its
sub-SDS's.
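Because an AHF's File_Info holds only two rows and a chunk-table row records only a DHDF_FileId, one natural way to find the file that stores a sub-SDS is to look the id up in the MHF's File_Info. The sketch below models both tables with plain Python dataclasses; it does not call the HDF API, and FileInfoRow, ChunkRow, ahf_file_info and chunk_file are hypothetical names. The sample rows are the File_Info contents of mhf.hdf from the example in this document.

```python
from dataclasses import dataclass
from typing import List, Tuple

MHF_FILE_ID = 0  # the DHDF_FileId of the MHF is always 0


@dataclass(frozen=True)
class FileInfoRow:
    """One row of the File_Info vdata (class DHDF_FileInfo)."""
    dhdf_file_id: int
    file_name: str
    host_name: str


@dataclass(frozen=True)
class ChunkRow:
    """One row of a Distributed_SDS_Chunk_Table vdata."""
    dhdf_file_id: int
    chunk_name: str
    dim_sizes: Tuple[int, ...]
    origins: Tuple[int, ...]


def ahf_file_info(mhf_table: List[FileInfoRow], ahf_id: int) -> List[FileInfoRow]:
    """Derive an AHF's File_Info: the MHF's row followed by the AHF's own row."""
    by_id = {row.dhdf_file_id: row for row in mhf_table}
    if ahf_id == MHF_FILE_ID or ahf_id not in by_id:
        raise KeyError(f"no AHF with DHDF_FileId {ahf_id}")
    return [by_id[MHF_FILE_ID], by_id[ahf_id]]


def chunk_file(chunk: ChunkRow, mhf_table: List[FileInfoRow]) -> FileInfoRow:
    """Look up the File_Info row of the file that stores a sub-SDS."""
    for row in mhf_table:
        if row.dhdf_file_id == chunk.dhdf_file_id:
            return row
    raise KeyError(f"DHDF_FileId {chunk.dhdf_file_id} not in File_Info")


# File_Info of mhf.hdf, copied from the example in this document.
MHF_FILE_INFO = [
    FileInfoRow(0, "mhf.hdf", "host0"),
    FileInfoRow(1, "ahf1.hdf", "host1"),
    FileInfoRow(2, "ahf2.hdf", "host2"),
]
```

For instance, ahf_file_info(MHF_FILE_INFO, 2) yields the rows for mhf.hdf and ahf2.hdf, and the chunk row for sds1_chunk2 (DHDF_FileId 2) resolves to ahf2.hdf on host2.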
Specification

  o char DHDF_SetId[255]
      - same value as the DHDF_SetId in the MHF.

  o int32 DHDF_FileId
      - same format as the DHDF_FileId in the MHF; the value identifies
        this AHF within the DHDF.

  o vdata File_Info
      - same format as the File_Info in the MHF, but it contains only the
        rows for the MHF and for this AHF.

  o sub-SDS's
      - attributes and array data of the sub-SDS's.
      - the SDS's that constitute the chunks of the distributed SDS's.

An Example
==========

  o Assume there are three files: mhf.hdf, ahf1.hdf and ahf2.hdf.

  o Two SDS's are stored among the three files.
      - The first SDS, sds1, is divided into two sub-SDS's, sds1_chunk1 and
        sds1_chunk2, which are stored in ahf1.hdf and ahf2.hdf,
        respectively.
      - The second SDS, sds2, is stored in ahf1.hdf as a single sub-SDS
        named sds2_chunk1.

Contents of mhf.hdf

--------------------------------------------------------------------------------------
| DHDF_SetId = May211997mhf.hdf100                                                   |
| DHDF_FileId = 0                                                                    |
|                                                                                    |
| File_Info                                                                          |
| -------------------------------------                                              |
| | DHDF_FileId  FileName   HostName  |                                              |
| |      0       mhf.hdf    host0     |                                              |
| |      1       ahf1.hdf   host1     |                                              |
| |      2       ahf2.hdf   host2     |                                              |
| -------------------------------------                                              |
|                                                                                    |
| the attributes of the 1st SDS     chunking info of the 1st SDS                     |
| ------------------------------    ------------------------------------------------ |
| | SDSName = sds1             |    | VdataName = sds1                             | |
| | rank = 2                   |    | Class = Distributed_SDS_Chunk_Table          | |
| | dimension sizes = 500 x 200|    |                                              | |
| | datatype = int32           |    | DHDF_FileId  ChunkName    DimSizes   Origins | |
| ------------------------------    |      1       sds1_chunk1  300x200    0, 0    | |
|                                   |      2       sds1_chunk2  200x200    300, 0  | |
|                                   ------------------------------------------------ |
|                                                                                    |
| the attributes of the 2nd SDS     chunking info of the 2nd SDS                     |
| ------------------------------    ------------------------------------------------ |
| | SDSName = sds2             |    | VdataName = sds2                             | |
| | rank = 2                   |    | Class = Distributed_SDS_Chunk_Table          | |
| | dimension sizes = 1000x1000|    |                                              | |
| | datatype = float32         |    | DHDF_FileId  ChunkName    DimSizes   Origins | |
| ------------------------------    |      1       sds2_chunk1  1000x1000  0, 0    | |
|                                   ------------------------------------------------ |
|                                                                                    |
--------------------------------------------------------------------------------------

Contents of ahf1.hdf                        Contents of ahf2.hdf

----------------------------------------    ----------------------------------------
| DHDF_SetId = May211997mhf.hdf100     |    | DHDF_SetId = May211997mhf.hdf100     |
| DHDF_FileId = 1                      |    | DHDF_FileId = 2                      |
|                                      |    |                                      |
| File_Info                            |    | File_Info                            |
| -----------------------------------  |    | -----------------------------------  |
| | DHDF_FileId  FileName  HostName |  |    | | DHDF_FileId  FileName  HostName |  |
| |      0       mhf.hdf   host0    |  |    | |      0       mhf.hdf   host0    |  |
| |      1       ahf1.hdf  host1    |  |    | |      2       ahf2.hdf  host2    |  |
| -----------------------------------  |    | -----------------------------------  |
|                                      |    |                                      |
| attrs & array data of sds1_chunk1    |    | attrs & array data of sds1_chunk2    |
| --------------------------------     |    | --------------------------------     |
| | SDSName = sds1_chunk1        |     |    | | SDSName = sds1_chunk2        |     |
| | datatype = int32             |     |    | | datatype = int32             |     |
| | rank = 2                     |     |    | | rank = 2                     |     |
| | dimension sizes = 300x200    |     |    | | dimension sizes = 200x200    |     |
| |                              |     |    | |                              |     |
| | array data:                  |     |    | | array data:                  |     |
| | 1 2 3 ...                    |     |    | | 60001 60002 ...              |     |
| | .                            |     |    | | .                            |     |
| | .                            |     |    | | .                            |     |
| --------------------------------     |    | --------------------------------     |
|                                      |    ----------------------------------------
| attrs & array data of sds2_chunk1    |
| --------------------------------     |
| | SDSName = sds2_chunk1        |     |
| | datatype = float32           |     |
| | rank = 2                     |     |
| | dimension sizes = 1000x1000  |     |
| |                              |     |
| | array data:                  |     |
| | 1.000 1.001 ... 1.999        |     |
| | 2.000 2.001 ... 2.999        |     |
| | .                            |     |
| | .                            |     |
| | 1000.000 ... 1000.999        |     |
| --------------------------------     |
----------------------------------------
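The chunk table in the example is enough to locate any element of sds1: subtract each row's Origins from the global index and check the result against that row's DimSizes. The sketch below is plain Python with no HDF calls; locate, covers_exactly and row_major_value are hypothetical helpers, and the chunk rows are copied from the example. It also checks that the chunks tile the SDS with no gaps or overlaps. row_major_value assumes, as the array data in the example suggests, that sds1 holds 1, 2, 3, ... in row-major order, which is why sds1_chunk2 starts at 60001 = 300 x 200 + 1.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (ChunkName, DimSizes, Origins) rows of sds1's chunk table, from the example.
Chunk = Tuple[str, Tuple[int, ...], Tuple[int, ...]]
SDS1_CHUNKS: List[Chunk] = [
    ("sds1_chunk1", (300, 200), (0, 0)),
    ("sds1_chunk2", (200, 200), (300, 0)),
]


def locate(index: Sequence[int],
           chunks: List[Chunk]) -> Optional[Tuple[str, Tuple[int, ...]]]:
    """Map a global index to (sub-SDS name, local index), or None if uncovered."""
    for name, dims, origin in chunks:
        local = tuple(i - o for i, o in zip(index, origin))
        if all(0 <= loc < d for loc, d in zip(local, dims)):
            return name, local
    return None


def covers_exactly(shape: Sequence[int], chunks: List[Chunk]) -> bool:
    """True if the chunks tile the whole SDS with no gaps or overlaps."""
    boxes = [(dims, origin) for _, dims, origin in chunks]
    for dims, origin in boxes:  # each chunk must lie inside the SDS
        if any(o < 0 or o + d > n for o, d, n in zip(origin, dims, shape)):
            return False
    for a in range(len(boxes)):  # no two chunks may share an element
        for b in range(a + 1, len(boxes)):
            (da, oa), (db, ob) = boxes[a], boxes[b]
            if all(x < y + dy and y < x + dx
                   for x, dx, y, dy in zip(oa, da, ob, db)):
                return False
    # Disjoint chunks inside the SDS cover it iff their volumes add up.
    return sum(math.prod(d) for d, _ in boxes) == math.prod(shape)


def row_major_value(index: Sequence[int], shape: Sequence[int]) -> int:
    """1-based row-major position; assumes sds1 is filled with 1, 2, 3, ..."""
    pos = 0
    for i, n in zip(index, shape):
        pos = pos * n + i
    return pos + 1
```

With these rows, locate((300, 0), SDS1_CHUNKS) returns ("sds1_chunk2", (0, 0)), and row_major_value((300, 0), (500, 200)) is 60001, matching the first value shown in sds1_chunk2. Checking volumes only after ruling out overlaps keeps the tiling test to simple arithmetic.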