Kita™: Advancing energy innovation

The following is an excerpt from a National Renewable Energy Laboratory (NREL) press release.

NREL Releases Major Update to Wind Energy Dataset

May 8, 2018

A massive amount of wind data was recently made accessible online, greatly expanding the amount of information available on wind flow across the continental United States.

The data from the Energy Department’s National Renewable Energy Laboratory (NREL) enables anyone considering building a wind plant, or even erecting a single turbine, to understand how strong breezes tend to blow across a particular area and how energy from the wind can be integrated into the electrical grid.

[Photo: Wind turbines stretch to the horizon on a property in Iowa. NREL's data can help determine where to install turbines like these. Photo by Dennis Schroeder / NREL]

Originally released in 2015, the Wind Integration National Dataset—also known as the WIND Toolkit—made 2 terabytes (TB) of information available, covering about 120,000 locations identified using technical and economic considerations. The newly released subset holds 50 TB, or 10 percent of the entire database, covers 4,767,552 locations, and extends 50 nautical miles offshore. Small sections of Canada and Mexico are included as well.

“The entire dataset is 500 terabytes,” said Caleb Phillips, a data scientist at NREL. “This is far and above the largest dataset we work with here at NREL.”

The data was always available, just not in an easily accessible, usable form. To make the information readily accessible, NREL drew on its ongoing relationships with Amazon Web Services (AWS) and The HDF Group. Hosting the dataset on AWS removes previous limitations on the amount of information that can be accessed readily online.

“What we’ve tried to do is make this really easy, so folks can play with the data and use it to better understand the potential for wind resources at a greater number of locations,” said Phillips. “They can download only the data they want.” An online visualization tool lets users explore the data interactively.

The HDF Group developed the Highly Scalable Data Service (HSDS) using the AWS cloud to provide users with easy access to the data, which is stored as a series of HDF5 files. The information can be narrowed to a specific site or time and analyzed using either a custom software solution or the Amazon Elastic Compute Cloud (Amazon EC2).
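For readers who want to try this from Python, here is a minimal sketch using h5pyd, The HDF Group’s h5py-compatible client for HSDS. The endpoint URL, domain path, dataset name, and API-key requirement are assumptions based on NREL’s published examples and may change, so check NREL’s documentation for current values:

    # Minimal sketch: read a slice of the WIND Toolkit through HSDS.
    # Endpoint, domain path, dataset name, and API key are assumptions
    # based on NREL's published examples.
    import h5pyd

    f = h5pyd.File(
        "/nrel/wtk-us.h5",                               # assumed domain path
        "r",
        endpoint="https://developer.nrel.gov/api/hsds",  # assumed public endpoint
        api_key="YOUR_NREL_API_KEY",                     # free key from developer.nrel.gov
    )

    ws = f["windspeed_100m"]          # assumed dataset: wind speed at 100 m hub height
    print(ws.shape)                   # (time, y, x) over the national grid

    # Narrow to a specific site and time: one week of hourly data at one grid cell.
    week_at_site = ws[0:168, 800, 1500]
    print(week_at_site.mean())

Because HSDS serves only the bytes a selection touches, a slice like the one above transfers a few kilobytes rather than the whole multi-terabyte dataset.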

“We are very excited to work with both NREL and AWS to make their large, technical data sets more accessible through our new scientific data platform, HDF Cloud,” said David Pearah, CEO of The HDF Group. “Our work aims to pave the way for large repositories of scientific data to be moved to the web without compromising query performance or resources.”

The WIND Toolkit provides barometric pressure, wind speed and direction, relative humidity, temperature, and air density data from 2007 to 2013. These seven years of data provide a detailed view of the U.S. wind resource and how it varies minute to minute, month to month, and year to year. These historical trends are essential for understanding the variability and quality of wind for power production. The simulated results were computed by 3Tier under contract for NREL using the Weather Research and Forecasting (WRF) model.
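As a hedged illustration of that month-to-month variability, the sketch below (reusing the assumed endpoint, domain, and dataset name from the earlier example) pulls the seven-year series at a single grid point and averages it by month; the hourly time step is likewise an assumption:

    # Sketch: monthly mean wind speed at one grid point, 2007-2013.
    # Endpoint, domain, dataset name, and hourly time step are assumptions.
    import h5pyd
    import pandas as pd

    f = h5pyd.File("/nrel/wtk-us.h5", "r",
                   endpoint="https://developer.nrel.gov/api/hsds",
                   api_key="YOUR_NREL_API_KEY")

    ws = f["windspeed_100m"][:, 800, 1500]       # full hourly series at one cell
    idx = pd.date_range("2007-01-01", periods=len(ws), freq="h")
    monthly = pd.Series(ws, index=idx).resample("MS").mean()
    print(monthly.head(12))                      # one mean per month of 2007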

“Now that we have a data platform that supports the release of large datasets, we hope to use this capability to release other big datasets that were previously considered too large to make publicly available,” Phillips said. Coming online next are solar irradiance data and wind data for Mexico, Canada, and potentially other countries. “We are thrilled to make these datasets available, allowing researchers to more easily find and use the data, as well as reducing costs for the national laboratory.”

While measurements across the rotor-swept area are the best way to determine wind conditions at a site, that’s not always possible. The WIND Toolkit provides an estimate, which can be validated against on-site measurements as required.

The first release of data prompted regular calls from people in academia, industry, and government wanting additional information. The federal Bureau of Ocean Energy Management contracted with NREL to provide additional information for offshore areas. The WIND Toolkit Offshore Summary Dataset was made publicly available last year.

The original work to develop and release the WIND Toolkit was funded by the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy, Wind Energy Technologies Office.

NREL is the U.S. Department of Energy’s primary national laboratory for renewable energy and energy efficiency research and development. NREL is operated for the Energy Department by The Alliance for Sustainable Energy, LLC.

What’s possible with Kita™?

The Highly Scalable Data Service (HSDS) is an open source project developed by The HDF Group. In addition to the open source offering, The HDF Group has designed several paid product services to make implementing and using HSDS easier for enterprise customers. These services are offered under the Kita umbrella:

  • Kita Lab, a JupyterLab-enabled SaaS data exploration tool
  • Kita Server for Amazon Marketplace, a turnkey marketplace product for those storing data on S3
  • Kita Custom Server, consulting and support services to create bespoke solutions for data stored on premises, in the cloud, or a combination of the two

Learn more about The HDF Group’s product behind NREL’s public data release in this interview with John Readey, the principal architect behind the new service.

Q: NREL’s data release is 50 TB. That’s big, but not that big. What do you think is possible for HSDS?

A: Correct. NREL’s project was for this 50 TB wind dataset, but that’s really pretty modest in terms of what’s out there and what HSDS can handle. The point of HSDS is making the data in these big files accessible, whether they are stored on the cloud or on premises somewhere. The advantage of using HSDS is that you can work with the data “in place” – that is, without having to move or transform the data before doing analytics. Also worth noting is that via HSDS, the 50 TB is visible as one “file.” Typically for collections of this size, the data would be set up as many POSIX files, and the application would be required to juggle data from different files to fetch the results it needed.
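To make the “one file” point concrete, here is a short sketch under the same assumptions as the earlier examples: the entire multi-terabyte, seven-year collection presents as a single HSDS domain, with one dataset per variable, rather than a directory tree of POSIX files the application has to stitch together.

    # Sketch: the whole collection appears as one logical "file" (an HSDS
    # domain); no juggling of per-year or per-region POSIX files.
    import h5pyd

    f = h5pyd.File("/nrel/wtk-us.h5", "r",                # assumed domain path
                   endpoint="https://developer.nrel.gov/api/hsds",
                   api_key="YOUR_NREL_API_KEY")
    print(list(f))                        # every variable in one place
    print(f["windspeed_100m"].shape)      # seven years, national grid, one dataset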

Q: It can handle big files, but what about speed?  

A: So that’s the thing that makes HSDS special. Internally, HDF5 datasets are organized as “chunks” – equal-sized tiles that subdivide the dataset space. Traditionally with the HDF5 library, when you need to select a slice of data from a dataset, each chunk that is accessed has to be processed sequentially. With HSDS, these operations can be handled in parallel, greatly speeding up read and write operations. (A short sketch of this chunked layout follows this answer.)

Further, if you need to look at data in that same slice again, it’s cached. It’s there for you to access without pulling it across the network again.  

You’re getting your data faster, and if you’re in an environment where you’re paying for that transfer, as with cloud egress charges, you’re saving money.
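To ground the chunking idea, here is a minimal local sketch using plain h5py rather than HSDS itself. It writes a dataset with an explicit chunk shape – the same equal-sized tiles that HSDS can fetch in parallel and cache for repeat reads. The shapes are made-up illustration values, not the WIND Toolkit’s real layout.

    # Sketch: HDF5 chunking with plain h5py. Each chunk is an equal-sized
    # tile of the dataset; a slice that crosses several tiles touches
    # several chunks. The HDF5 library processes those chunks sequentially,
    # whereas HSDS can fetch them in parallel (and cache them for repeat
    # reads). Shapes below are illustrative only.
    import h5py
    import numpy as np

    with h5py.File("demo.h5", "w") as f:
        dset = f.create_dataset(
            "windspeed",
            shape=(8760, 1000, 1000),   # (time, y, x): one year hourly, made-up grid
            chunks=(168, 100, 100),     # one week x 100 x 100 cells per chunk
            dtype="f4",
        )
        dset[0:168, 0:100, 0:100] = np.random.rand(168, 100, 100)

    with h5py.File("demo.h5", "r") as f:
        dset = f["windspeed"]
        print(dset.chunks)              # (168, 100, 100)
        # This slice spans 2 x 2 = 4 chunks in the spatial plane.
        print(dset[0:24, 50:150, 50:150].shape)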

Q: You mention HSDS being used in an on-premises installation or installed on a commercial cloud where there would be egress charges. Where can people use HSDS?

A: HSDS can be used on prem, or it can be added to an AWS account, either through the Marketplace with an hourly charge or installed to work with data you have stored on S3. We’re also working to implement this on Azure.

Kita comes into play as managed services from The HDF Group around HSDS. In our consulting experience, we’ve found that one of the biggest ways we can improve the use of HDF5 is to be there at the beginning, helping with the setup. For organizations that want to expedite this process and make sure the decisions made at the beginning give them the fastest and most efficient system, consulting via a Kita Server purchase may be a good fit.
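On the client side, switching between those deployments is just a matter of which endpoint h5pyd talks to. A hedged sketch, with placeholder hostnames, domain paths, and credentials throughout:

    # Sketch: the same client code can target an on-prem HSDS install or one
    # running in your cloud account -- only the endpoint and credentials
    # change. All values below are placeholders.
    import h5pyd

    # On-premises HSDS instance (hypothetical host and domain)
    f = h5pyd.File("/shared/mydata.h5", "r",
                   endpoint="http://hsds.internal.example.com",
                   username="analyst", password="secret")

    # The same call against an HSDS deployment in an AWS account backed by S3
    # (hypothetical endpoint):
    # f = h5pyd.File("/shared/mydata.h5", "r",
    #                endpoint="https://hsds.example-cloud.com")

    print(list(f))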

Q: What does this product mean for The HDF Group? 

A: Kita product services are an extension of what The HDF Group has always done and are part of its mission. It’s about continuing the accessibility of HDF-stored data as that data moves into cloud storage or wherever else. Kita Server on Amazon Marketplace can be a quick install for a researcher or a small group to grab and use for quick access to their data. And finally, Kita Lab is a convenient exploration and collaboration tool that might have big benefits for users at minimal cost – and that’s after the free trial. We’re also looking at how we might offer Kita Lab free to certain populations, like students, or, with sponsorship, make it free for everyone.

Free trial offer

Everyone can experience the power of Kita with a free 30-day trial of Kita Lab, the JupyterLab-enabled data exploration tool.

If you’re ready to start a conversation with a sales engineer about how Kita Server might work for you, please contact us.