Adopting HDF5 for Simulation Data in EDEM Software
We have been developing engineering simulation software for over 10 years and we are the market leader in Discrete Element Method (DEM) technology for bulk and granular material simulation. Our EDEM software accurately simulates and analyzes the behavior of coal, mined ores, soil, grains, tablets and powders, and provides engineers with crucial insight into how those materials will interact with their equipment across a range of operating and process conditions. Companies worldwide in the mining, heavy equipment and process industries use our software to optimize equipment design, increase productivity, reduce operating costs, shorten product development cycles and drive product innovation.
Why do we use HDF5?
We moved our simulation data from our own proprietary file format to HDF5 in 2016. HDF5 had been on our radar for some time, and we spent a couple of years investigating it and other file formats before deciding which one to switch to. HDF5 met all the criteria we had at the time, among them: performance in speed and file size, acceptance as a standard for scientific data, being open source, and the availability of additional tools.
Performance. HDF5 was as fast as, or faster than, our previous format for reading and writing files, and the files were only negligibly larger, both with and without compression.
Standard for scientific data. Given the plethora of scientific institutions and software packages that use HDF5, it was clear to us that it was widely used, well tested and well supported, and would have longevity.
Open source. With a proprietary file format, our users couldn’t interpret their simulation data outside of EDEM, and we wanted to change that. After all, the data generated belongs to the user, so it should be as accessible to them as possible. An open format also makes it possible to build additional tools that perform different analyses on the data. More on that later.
Additional tools. Because HDF5 is open source, there are additional tools for working with HDF5 files that make life easier for developers. API extensions and wrappers in different languages also provide great flexibility. This opened up a new world of possibilities for EDEM users, which they have been able to exploit.
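Because the format is open, any HDF5 library can look inside a file without EDEM. As a minimal sketch, the function below uses the third-party h5py Python binding to list every dataset in an HDF5 file together with its shape and element type. The function name `list_datasets` is hypothetical, and nothing here assumes EDEM's actual group or dataset layout.

```python
import h5py
import numpy as np  # not needed by list_datasets itself; handy when working with the datasets it finds


def list_datasets(path):
    """Return sorted (name, shape, dtype) tuples for every dataset in an HDF5 file."""
    found = []

    def visit(name, obj):
        # visititems calls this once for every group and dataset below the root;
        # returning None tells h5py to keep walking the file.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return sorted(found)
```

Calling `list_datasets("some_file.h5")` on any HDF5 file returns entries such as `("particles/position", (4, 3), "float64")`; the `h5ls -r` command-line tool that ships with HDF5 gives a similar recursive listing without writing any code.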
How do we use HDF5?
We use HDF5 to store what we call our simulation data. First, there is the information users enter to set up a simulation: imported CAD meshes, bulk material information based on real-life calibration, factories that determine which particles to create, and more. Second, there is the “timestep” information: the data calculated by our DEM solver, such as forces, torques, velocities and positions for the particles, meshes and contacts. We store simulation timestep data in a separate file for each timestep. This lets users get to the data they need easily, whether they are interested in a specific point in time or in the general behavior over the lifetime of the simulation.
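The one-file-per-timestep layout can be sketched with h5py as follows. This is an illustration under assumptions, not EDEM's file format: the file naming scheme (`timestep_000007.h5`), the `particles` group, the `position`/`velocity` dataset names and the `time` attribute are all hypothetical.

```python
import os

import h5py
import numpy as np


def timestep_path(directory, index):
    # Hypothetical naming scheme: one zero-padded file per saved timestep.
    return os.path.join(directory, f"timestep_{index:06d}.h5")


def write_timestep(directory, index, time, positions, velocities, compression=None):
    """Write one timestep's particle state to its own HDF5 file.

    compression=None stores the data uncompressed; pass "gzip" to enable
    HDF5's DEFLATE filter instead.
    """
    path = timestep_path(directory, index)
    with h5py.File(path, "w") as f:
        f.attrs["time"] = time  # simulation time of this snapshot, in seconds
        particles = f.create_group("particles")
        particles.create_dataset(
            "position", data=np.asarray(positions, dtype=np.float64), compression=compression
        )
        particles.create_dataset(
            "velocity", data=np.asarray(velocities, dtype=np.float64), compression=compression
        )
    return path


def read_positions(directory, index):
    """Read back a single timestep without opening any other file."""
    with h5py.File(timestep_path(directory, index), "r") as f:
        return float(f.attrs["time"]), f["particles/position"][...]
```

The design choice this illustrates is that a reader interested in one moment opens exactly one small file, while a reader interested in the whole run iterates over the files in order; the cost is a large number of files on disk.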
Image: User interface of EDEM software
Dealing with any amount of data can bring challenges. The simulation data created by EDEM varies greatly in size according to the needs of our users. Some EDEM users work with a small number of particles (fewer than 1,000) and coarse CAD meshes, which can result in files of a few kilobytes. Other users simulate up to 10 million particles with highly detailed CAD meshes; depending on the simulation, this may create files of 6 to 10 GB. That is not very big compared to the HDF5 files generated by some applications. But EDEM users may be looking at that data across hundreds of thousands of timesteps, which we save as individual HDF5 files. Multiply 10 GB by thousands of files and the amount of data to handle becomes far larger.
The advent of our GPU solver brought a new perspective to how we manage the data it generates. Because the GPU solver is so much faster than the CPU alone, file IO performance came back into focus: as solver performance improves, writing to file takes up a greater percentage of total simulation time. The data generated by the solver is typically difficult to compress effectively, so time spent compressing it bought little reduction in file size. Therefore, with our latest release, we have taken the step of not compressing data by default. We will be investigating other ways to improve how we read and write data, to give EDEM users the best possible experience.
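Why noisy floating-point output compresses poorly can be shown with a small standard-library experiment. HDF5's gzip filter uses DEFLATE, the same algorithm as Python's `zlib` module, so the sketch below compresses packed 64-bit floats with `zlib` and reports compressed size over raw size. The data is synthetic (Gaussian noise standing in for solver fields such as velocity components), not EDEM output, and the exact ratios will differ on real simulation data.

```python
import random
import struct
import zlib


def compression_ratio(values, level=6):
    """Compressed size / raw size for a list of float64 values (lower is better)."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return len(zlib.compress(raw, level)) / len(raw)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in for noisy solver output: the low mantissa bytes are
    # essentially random, which leaves DEFLATE very little redundancy to exploit.
    noisy = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    # Highly regular data, e.g. a field that is identical for every particle.
    constant = [1.0] * 100_000
    print(f"noisy:    {compression_ratio(noisy):.3f}")
    print(f"constant: {compression_ratio(constant):.3f}")
```

The noisy array stays close to its original size while the constant array shrinks to almost nothing, which is the trade-off described above: CPU time is spent on compression for very little saving when the data looks like the former.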
Post-processing data with ‘EDEMpy’
As we have seen, DEM simulations can generate a large amount of data. They are also used to simulate a huge variety of scenarios for different industries, all working at different scales. This means that the type of analysis performed after a simulation also varies greatly. Pharmaceutical simulations require different analyses from simulations for agriculture. Simulations coupled with Multi-Body Dynamics (MBD) software are likely to focus on different criteria than those coupled with Computational Fluid Dynamics (CFD) software.
Using HDF5 allows us and our customers to tailor the analysis to the application. With the release of our ‘EDEMpy’ tool, we have provided a Python library that lets users access the data in simulation files and perform exactly the analysis they require. It was motivated by the need to perform more complex analysis of EDEM data on ever-larger simulation decks, and to make simple tasks such as comparing two or more EDEM simulations easier for users: you no longer need to open multiple decks and export data from each one individually, as all of that data is now accessible from a Python script or prompt.
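EDEMpy has its own API, described in the blog post linked below, and this sketch does not reproduce it. It only illustrates the general idea of comparing two simulations from a script, using plain h5py against a hypothetical layout: each deck is a directory of per-timestep files named `timestep_*.h5`, each holding an (N, 3) dataset at `particles/velocity`. The function names `mean_speeds` and `compare_decks` are likewise hypothetical.

```python
import glob
import os

import h5py
import numpy as np


def mean_speeds(directory, pattern="timestep_*.h5"):
    """Mean particle speed for every timestep file in a directory, in file-name order.

    Assumes a hypothetical layout with an (N, 3) dataset at "particles/velocity".
    """
    speeds = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with h5py.File(path, "r") as f:
            velocity = f["particles/velocity"][...]
        # Speed of each particle is the Euclidean norm of its velocity vector.
        speeds.append(float(np.linalg.norm(velocity, axis=1).mean()))
    return speeds


def compare_decks(dir_a, dir_b):
    """Per-timestep difference (b minus a) in mean particle speed between two decks."""
    a, b = mean_speeds(dir_a), mean_speeds(dir_b)
    n = min(len(a), len(b))  # compare only the timesteps both decks have
    return [b[i] - a[i] for i in range(n)]
```

Because each deck is just a set of HDF5 files, the same loop scales from two decks to a parameter sweep of many, with no manual export step in between.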
Image: Visualizing Relative Wear custom properties using EDEMpy and mayavi
You can get more detail on EDEMpy by reading our blog post on it: https://www.edemsimulation.com/blog/resources/post-process-edem-data-in-python-using-edempy/
At EDEM we have certainly reaped the rewards of using HDF5. The number of tools available has made developing our software easier, and the release of EDEMpy has opened up new avenues of analysis for our users. We have exciting ideas for how to use the technology better in the future, so keep an eye out.
For more information about EDEM please visit our website: https://www.edemsimulation.com/
Rich Rowan is the Technical Lead at EDEM. Rich specializes in C++, focusing on contributing to the performance and growth of products through technical excellence. He is interested in clean code and CI.