Workarounds for OpenMPI bug exposed by make check in HDF5 1.13.3

Neil Fortner, Chief HDF5 Software Architect, The HDF Group

During the development of HDF5 1.13.3, a bug was discovered in OpenMPI’s default I/O layer (OMPIO), affecting OpenMPI versions 4.1.0 through 4.1.4. It will be fixed in future releases. This bug can cause incorrect results from MPI I/O requests, unless one of the following parameters is passed to mpirun:

  • --mca io ^ompio
  • --mca fbtl_posix_read_datasieving 0

The first switches the I/O layer from OMPIO to ROMIO, and the second disables the data sieving feature in OMPIO which contains the bug. We recommend using one of these workarounds when running any application that uses MPI I/O with an affected version of OpenMPI.
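As a minimal sketch of applying the recommendation above only where it is needed, the following shell functions check a version string against the affected range and emit the first workaround. The function names and the ./my_app program are hypothetical, and the version is hard-coded here; in practice it would come from your MPI installation.

```shell
#!/bin/sh
# Sketch: choose extra mpirun flags based on an OpenMPI version string.
# The affected range (4.1.0 through 4.1.4) is taken from the text above.

# Succeeds (exit status 0) if the version is in the affected range.
ompi_is_affected() {
    case "$1" in
        4.1.[0-4]) return 0 ;;
        *)         return 1 ;;
    esac
}

# Prints the workaround flags for an affected version, nothing otherwise.
# This uses the first workaround (switch the I/O layer from OMPIO to ROMIO).
ompi_workaround_flags() {
    if ompi_is_affected "$1"; then
        printf '%s\n' '--mca io ^ompio'
    fi
}

# Example with a hard-coded version; ./my_app is a placeholder program.
flags=$(ompi_workaround_flags "4.1.4")
echo "mpirun $flags -n 4 ./my_app"
```

The script only prints the command it would run, so it can be tried without an MPI installation. Pre-release or vendor-suffixed version strings are not handled by this simple pattern match.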

While this bug potentially impacts all applications that use MPI I/O, including older versions of HDF5, no test in the HDF5 test suite triggered the fault until HDF5 1.13.3. This means that make check will fail with HDF5 1.13.3 and the affected versions of OpenMPI, unless one of the above workarounds is used.

For autotools builds, you can have make check use one of these workarounds by setting the RUNPARALLEL environment variable before running configure, using one of these commands (depending on which workaround you want to use):

  • export RUNPARALLEL="mpirun --mca io ^ompio -n 6"
  • export RUNPARALLEL="mpirun --mca fbtl_posix_read_datasieving 0 -n 6"
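Put together, the autotools sequence might look like the sketch below. The NPROCS and WORKAROUND variable names are illustrative only, and the configure line is an assumption about a typical parallel HDF5 build, so it is left commented out; adjust it to your own setup.

```shell
#!/bin/sh
# Sketch: set RUNPARALLEL before configuring an autotools build of HDF5.

# Number of MPI ranks used for the parallel tests (6 matches the commands above).
NPROCS=6

# Pick one workaround; the alternative is "--mca fbtl_posix_read_datasieving 0".
WORKAROUND="--mca io ^ompio"

RUNPARALLEL="mpirun $WORKAROUND -n $NPROCS"
export RUNPARALLEL
echo "RUNPARALLEL=$RUNPARALLEL"

# Then, from the HDF5 source tree (assumed invocation, shown for context only):
# ./configure --enable-parallel CC=mpicc
# make
# make check
```

The variable must be exported in the same shell session that runs configure, since the text above specifies setting it before that step.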

For cmake builds you can enable one of these workarounds by adding it to the MPIEXEC_PREFLAGS:STRING option:

  • -DMPIEXEC_PREFLAGS:STRING=--mca;io;^ompio
  • -DMPIEXEC_PREFLAGS:STRING=--mca;fbtl_posix_read_datasieving;0

These can be added either as a command line parameter for the cmake configure command (quoted, so that the shell does not treat the semicolons as command separators), or to the HDF5options.cmake file as explained in release_docs/INSTALL_CMAKE. If this option is already present, make sure to append the workaround to the existing value for the option instead of adding the option a second time:

  • -DMPIEXEC_PREFLAGS:STRING=<...>;--mca;io;^ompio
  • -DMPIEXEC_PREFLAGS:STRING=<...>;--mca;fbtl_posix_read_datasieving;0
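The append rule above can be sketched as a small shell helper that joins an existing semicolon-separated value with the workaround. The helper name append_preflags is hypothetical, and "--bind-to;none" is only a stand-in for whatever value your build already uses.

```shell
#!/bin/sh
# Sketch: append a workaround to an existing MPIEXEC_PREFLAGS value
# instead of passing the option a second time.

# Joins two semicolon-separated lists; handles an empty existing value.
append_preflags() {
    existing=$1
    extra=$2
    if [ -z "$existing" ]; then
        printf '%s\n' "$extra"
    else
        printf '%s\n' "$existing;$extra"
    fi
}

# Hypothetical existing value; replace with the one from your own build.
EXISTING="--bind-to;none"
PREFLAGS=$(append_preflags "$EXISTING" "--mca;io;^ompio")

# Print the resulting cmake argument. Quote it when passing it to cmake,
# because an unquoted ";" would end the command in most shells.
echo "-DMPIEXEC_PREFLAGS:STRING=$PREFLAGS"
```

With the values shown, the printed argument is -DMPIEXEC_PREFLAGS:STRING=--bind-to;none;--mca;io;^ompio.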

There may be cases where one or both of these workarounds will cause other issues. If these workarounds do not work for you we suggest trying a different version of OpenMPI (possibly a recent development snapshot) or a different MPI implementation altogether. Alternatively you can simply ignore the make check failure and hope the problem doesn’t occur in your app. Things will most likely be fine as long as your app uses simple I/O patterns.

For more information on this problem you can view the OpenMPI issue on GitHub at https://github.com/open-mpi/ompi/issues/10546.
