This guide describes how to install and start using the Read-Only S3 (ROS3) virtual file driver (VFD) for HDF5. The Read-Only S3 VFD provides transparent, read-only access to HDF5-format files hosted remotely on Amazon’s Simple Storage Service (S3), supplying bytes to the HDF5 library through the AWS REST API.
Supported Operating Systems
This VFD is currently supported only on Linux. Support for Windows and macOS is planned for a later release.
Software Prerequisites
The Read-Only S3 VFD depends on OpenSSL and libcurl (see Step 2), which must be installed before the VFD. Building from a git clone additionally requires Autotools (see Step 1).
Installation Instructions
The following describes the process to install the Read-Only S3 VFD and the associated unit and regression tests.
Step 1. Extract source tarball or download the source
tar -zxf hdf5-1.11.0-of20180219.tar.gz
cd hdf5-1.11.0-of20180219/
or
If you were not provided with a source tarball, download the source from Bitbucket. A git checkout must be prepared for the build with Autotools.
git clone https://bitbucket.hdfgroup.org/scm/hdffv/hdf5.git hdf5_ros3
cd hdf5_ros3
./autogen.sh
Step 2. Modify environment variables to include new libraries
Depending on where OpenSSL and libcurl are installed, it may be necessary to set the CPPFLAGS and LDFLAGS environment variables manually, for example:
export CPPFLAGS="-I/usr/local/opt/openssl/include -I/usr/local/opt/curl/include"
export LDFLAGS="-L/usr/local/opt/openssl/lib -L/usr/local/opt/curl/lib"
Step 3. (optional) Set up S3 test bucket and credentials
Pull down test files from somewhere
Put files on S3
Set environment variable for bucket URL
Set credentials
If this step is skipped or done incorrectly, some S3 tests will be skipped during make check.
Step 4. Configure and build HDF5 with the Read-Only S3 VFD
Modify the HDF5 build flags as appropriate. The --enable-ros3-vfd flag is required to enable Read-Only S3 VFD features.
./configure --enable-ros3-vfd --enable-shared --enable-java
make
make check
make install
HDF5 APIs for use with the Read-Only S3 VFD
Two new APIs were added to HDF5 for use with the Read-Only S3 VFD: H5Pget_fapl_ros3() and H5Pset_fapl_ros3(). Man pages for these can be found in ros3_api_reference.txt. Further usage information can be found in usage.txt.
H5Pget_fapl_ros3()
Retrieves the Read-Only S3 VFD configuration from a file access property list. The information from the fapl fapl_id is copied to the H5FD_ros3_fapl_t structure pointed to by fa. Returns a non-negative value if successful and a negative value otherwise.
herr_t H5Pget_fapl_ros3(hid_t fapl_id, H5FD_ros3_fapl_t *fa)
Parameters:
hid_t fapl_id IN: File access property list identifier.
H5FD_ros3_fapl_t *fa OUT: H5FD_ros3_fapl_t structure destination
Example:
/* assumes fapl_id has been created and set with H5Pset_fapl_ros3() */
H5FD_ros3_fapl_t fa;
fa.authenticate = 16; /* sentinel: neither TRUE (1) nor FALSE (0) */
/* success is a non-negative return value */
assert( H5Pget_fapl_ros3(fapl_id, &fa) >= 0 );
/* the sentinel must have been overwritten with a valid boolean */
assert( fa.authenticate == 0 || fa.authenticate == 1 );
H5Pset_fapl_ros3()
Sets the file access property list fapl_id to use the Read-Only S3 VFD. Files on S3 may have restricted access, in which case each attempt to access and read them must provide “credentials” that authenticate the requester and protect message integrity. The structure H5FD_ros3_fapl_t contains a flag indicating whether this authentication is to take place, along with the credentials to supply to the virtual file driver.
If the configuration structure is set to _not_ authenticate, e.g., fa.authenticate == (hbool_t)FALSE, then the credential fields aws_region, secret_id, and secret_key are ignored.
If the configuration structure is set to authenticate, e.g., fa.authenticate == (hbool_t)TRUE, then the credential fields must be populated with null-terminated strings. Each field is an array of characters whose size is determined by a constant in H5FDros3.c, e.g., H5FD__ROS3_MAX_REGION_LEN. If a string exceeds the defined length, an error has likely occurred and behavior is undefined.
herr_t H5Pset_fapl_ros3(hid_t fapl_id, H5FD_ros3_fapl_t *fa)
Parameters:
hid_t fapl_id IN: File access property list identifier.
H5FD_ros3_fapl_t *fa IN: Structure containing fapl configuration information.
Example:
hid_t fapl_id = -1;
/* default, non-authenticating, “anonymous” fapl info */
H5FD_ros3_fapl_t fa = { 1, 0, "", "", "" };
#if AUTHENTICATE_STATIC_VARS
/* fapl info with authentication credentials provided statically */
/* a brace list cannot be assigned directly; use a C99 compound literal */
fa = (H5FD_ros3_fapl_t){
1, /* version */
1, /* authenticate */
"us-east-2", /* aws_region */
"AKIAIMC3D3XLYXLN5COA", /* access_key_id */
"ugs5aVVnLFCErO/8uW14iWE3K5AgXMpsMlWneO/+" /* secret_access_key */
};
#elif AUTHENTICATE_DYNAMIC_VARS
/* fapl info populated dynamically
* Assumes variables `should_authenticate`, `the_region`,
* `the_access_key_id`, and `the_secret_access_key` have been set somewhere
*/
fa.authenticate = should_authenticate; /* 0 (FALSE) or 1 (TRUE) */
strncpy(fa.aws_region, the_region, H5FD__ROS3_MAX_REGION_LEN);
strncpy(fa.secret_id, the_access_key_id, H5FD__ROS3_MAX_SECRET_ID_LEN);
strncpy(fa.secret_key, the_secret_access_key,
H5FD__ROS3_MAX_SECRET_KEY_LEN);
#endif /* set authenticating fapl info statically or dynamically */
/* create and set fapl entry */
fapl_id = H5Pcreate(H5P_FILE_ACCESS);
assert( fapl_id >= 0 ); /* a valid identifier is non-negative */
assert( H5Pset_fapl_ros3(fapl_id, &fa) >= 0 ); /* success is non-negative */
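The dynamic branch of the example above relies on strncpy(), which silently truncates and does not null-terminate when the source string is as long as or longer than the limit. Since the credential fields must hold null-terminated strings, a checked copy may be safer. The following is a minimal sketch only; copy_credential() is a hypothetical helper, not part of HDF5, and it assumes the caller passes the full capacity of the destination array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an HDF5 API).
 * Copies src into dst, whose capacity is dst_size bytes including the
 * terminating null byte.
 * Returns 0 on success, or -1 if src is NULL or does not fit.
 * On failure dst is left as the empty string.
 */
static int copy_credential(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && dst_size > 0);

    dst[0] = '\0';
    if (src == NULL)
        return -1;

    len = strlen(src);
    if (len >= dst_size)
        return -1; /* too long: refuse rather than truncate */

    memcpy(dst, src, len + 1); /* includes the terminating null byte */
    return 0;
}
```

Assuming the credential fields are plain character arrays as described above, a call might look like copy_credential(fa.aws_region, sizeof fa.aws_region, the_region), with a non-zero return treated as a configuration error before H5Pset_fapl_ros3() is called.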
Using HDF5 Tools with the Read-Only S3 VFD
The following tools have been modified to use the Read-Only S3 VFD. See demo_tools.txt for case examples.
h5dump
h5ls
h5stat
A new command-line argument for supplying AWS credentials has been added to h5dump, h5ls, and h5stat:
--s3-cred=(<aws_region>,<access_key_id>,<secret_key>)
Escape the parentheses as appropriate for your shell. In Bash, for example, wrap the entire tuple in quotation marks:
--s3-cred="(…)"
Please read the tools’ provided help messages for further details. The new options work alongside the existing flags for each command. Credentials, via --s3-cred, may be omitted for anonymous access.
h5dump
h5dump [ -f ros3 | --filedriver=ros3 ] [ --s3-cred="(…)" ]
h5ls
h5ls [ --vfd=ros3 ] [ --s3-cred="(…)" ]
h5stat
h5stat [ --s3-cred="(…)" ]
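To illustrate the shape of the credential tuple, the sketch below splits a string of the form (<aws_region>,<access_key_id>,<secret_key>) into its three fields. It is an illustration only and is not the parser used by h5dump, h5ls, or h5stat; parse_s3_cred() and its buffer-size parameter are hypothetical names, and the sketch assumes that none of the fields contains a comma.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the tools' own parser).
 * Splits "(region,key_id,secret)" into three null-terminated strings.
 * Each output buffer must have room for cap bytes.
 * Empty fields are accepted.
 * Returns 0 on success, or -1 if the tuple is malformed, has the wrong
 * number of fields, or a field does not fit in cap bytes.
 */
static int parse_s3_cred(const char *tuple, char *region, char *key_id,
                         char *secret, size_t cap)
{
    char       *out[3];
    const char *p;
    const char *end;
    size_t      n;
    int         i;

    assert(tuple != NULL && region != NULL && key_id != NULL &&
           secret != NULL && cap > 0);

    out[0] = region;
    out[1] = key_id;
    out[2] = secret;

    n = strlen(tuple);
    if (n < 2 || tuple[0] != '(' || tuple[n - 1] != ')')
        return -1;

    p   = tuple + 1;     /* first character after '(' */
    end = tuple + n - 1; /* points at the closing ')' */

    for (i = 0; i < 3; i++) {
        const char *comma = memchr(p, ',', (size_t)(end - p));
        const char *stop;
        size_t      len;

        if (i < 2) {
            if (comma == NULL)
                return -1; /* too few fields */
            stop = comma;
        } else {
            if (comma != NULL)
                return -1; /* too many fields */
            stop = end;
        }

        len = (size_t)(stop - p);
        if (len >= cap)
            return -1; /* field would overflow its buffer */

        memcpy(out[i], p, len);
        out[i][len] = '\0';
        p = stop + 1;
    }
    return 0;
}
```

Note that the shell strips the quotation marks before the program sees the argument, so the string handed to a parser like this one begins with the opening parenthesis.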
Known Issues
Anonymous access with authenticating FAPL. An authenticating fapl can be used to open an anonymously-accessible file, but incurs some overhead in the application – authentication is performed to create requests to S3, but the authentication information is ignored by the server.
API subject to change. The API calls and tool command-line interfaces for the Read-Only S3 VFD may change when this VFD is made available in a future release as a plug-in VFD module.
Technical Support
For assistance with this product, please contact The HDF Group’s customer support team at help@hdfgroup.org.