This patch provides code for two read-only HDF5 virtual file driver (vfd) prototypes: an S3 vfd and an HDFS vfd. These drivers allow HDF5 to open, read, seek, and close 1.) HDF5 files stored in AWS S3 buckets and 2.) HDF5 files stored in HDFS.
Applying the patch:
The vfds_hdf5-1.10.4.patch file should be applied to a fresh HDF5 1.10.4 source tree as follows:
1.) Put the patch file in the top level of the HDF5 source code or another accessible location.
2.) cd to the top level of the HDF5 source code and run
"patch -p1 < <path to file>/vfds_hdf5-1.10.4.patch"
The operation should complete without errors or questions.
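As a self-contained illustration of the -p1 path-stripping behavior, the toy file and patch below stand in for the real HDF5 tree and vfds_hdf5-1.10.4.patch (all names here are made up for the demonstration):

```shell
# Demonstration only: create a scratch tree and a one-line unified diff,
# then apply it with -p1, which strips the leading "a/"/"b/" components.
WORK="$(mktemp -d)"
cd "$WORK"
mkdir src
printf 'hello\n' > src/file.txt
cat > fix.patch <<'EOF'
--- a/src/file.txt
+++ b/src/file.txt
@@ -1 +1 @@
-hello
+hello, patched
EOF
patch -p1 < fix.patch
cat src/file.txt
```

For the real patch, the same invocation is run from the top level of the HDF5 source tree, with fix.patch replaced by the path to vfds_hdf5-1.10.4.patch.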
3.) Configure and build HDF5. The patch adds two configure options:
--enable-ros3-vfd
--with-libhdfs=<location of Hadoop install>
Either, both, or neither of these options may be specified, although the --with-libhdfs option requires a valid path to a Hadoop install directory containing lib/native/libhdfs*.
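The configure-and-build step might look like the following sketch; the S3 option name (--enable-ros3-vfd) and the Hadoop path (/opt/hadoop) are assumptions to adapt to your installation, and the script only runs the commands when the source tree is actually present:

```shell
# Sketch of the configure/build/test sequence after the patch is applied.
# SRC, the --enable-ros3-vfd option name, and /opt/hadoop are placeholders.
SRC=hdf5-1.10.4
if [ -x "$SRC/configure" ]; then
    cd "$SRC"
    ./configure --enable-ros3-vfd --with-libhdfs=/opt/hadoop
    make
    make check
else
    # Source tree not present here; the commands above show the intent.
    echo "HDF5 source tree not found at $SRC"
fi
```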
Setup for vfd options:
No extra setup is required to configure and build HDF5 with the S3 option, but testing HDF5 with the S3 vfd enabled using "make check" requires:
1.) Set the environment variable HDF5_ROS3_TEST_BUCKET_URL to the URL of an accessible AWS S3 bucket containing HDF5 files.
2.) Provide the proper credentials in the standard AWS location for access to that bucket. Check the Amazon AWS website for S3 bucket access information.
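The two requirements above might be satisfied as sketched below. The bucket URL is a placeholder, and the credentials file is written to a temporary directory here purely to illustrate the standard ~/.aws/credentials layout; a real setup writes it to $HOME/.aws with real key values:

```shell
# Placeholder bucket URL; replace with a bucket you can actually read.
export HDF5_ROS3_TEST_BUCKET_URL="https://s3.us-east-1.amazonaws.com/example-hdf5-bucket"

# Illustrate the expected credentials layout without touching ~/.aws.
AWS_DIR="$(mktemp -d)"   # stand-in for "$HOME/.aws"
cat > "$AWS_DIR/credentials" <<'EOF'
[default]
aws_access_key_id = <your access key id>
aws_secret_access_key = <your secret access key>
EOF
```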
If Hadoop is installed, enter the path to the Hadoop install directory as described above. For users who wish to experiment with Hadoop but have not previously installed it, the hadoop-testing-bundle is included with this patch. The bundle includes a README file with details and scripts for installing and setting up Hadoop for connections from HDF5. Installing Hadoop and providing the path should be sufficient to configure and build HDF5 with the option, provided Java is enabled.
Using the HDFS vfd requires the following in HADOOP_HOME/etc/hadoop:
1.) The environment variables JAVA_HOME and HADOOP_HOME are set correctly in hadoop-env.sh.
2.) The configurations in core-site.xml and hdfs-site.xml in the hadoop-testing-bundle are present in the files with those names in HADOOP_HOME/etc/hadoop.
3.) The Hadoop service processes are listening on the expected ports.
See the README file in the hadoop-testing-bundle for details.
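For orientation, a minimal core-site.xml typically names the HDFS endpoint with the standard fs.defaultFS property, as sketched below; the host and port are placeholders, and the core-site.xml shipped in the hadoop-testing-bundle is the authoritative version:

```xml
<!-- Sketch of a minimal core-site.xml; host and port are placeholders. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```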