Google Cloud is now hosting several climate-related datasets in formats easily accessible to AI researchers and ML engineers.
The first dataset is the Coupled Model Intercomparison Project Phase 6 (CMIP6) data archive by the World Climate Research Programme, “aggregating the climate models created across approximately 30 working groups and 1,000 researchers investigating the urgent environmental problem of climate change.” CMIP6 includes historical data, models, high-resolution simulations of rare events covering…
[…] everything from forest transpiration in the Amazon rainforest and thunderstorms in the U.S. Midwest to the formation of meltwater ponds on Arctic sea ice. […] On Google Cloud, this dataset will be continuously updated and available to researchers around the globe to use for their own projects—without the constraints of downloading terabytes or even petabytes of data.
The second dataset comes from the U.S. National Oceanic and Atmospheric Administration (NOAA), in the form of 5 petabytes of data including…
[…] real-time satellite imagery, more than 20 years’ worth of the National Water Model, historic storm event data, aggregated lightning strike data, precipitation data back to the 1700s, and data on shipping patterns dating back to the 1600s [and more].
The data will be available across Google products such as Cloud Storage and Kaggle. That makes it easy for ML/AI researchers and engineers to use, because it fits into our existing workflows, from competing in Kaggle competitions to pulling data from Google Cloud into our models with tools such as TensorFlow Datasets. For examples of how the data could be used in ML models, including early wildfire detection and real-time disaster information services, check out this second post on the Google Cloud blog by Shane Glass: “Big data, big world: new NOAA datasets available on Google Cloud”.
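
To make the "pulling data from Google Cloud" step a bit more concrete, here is a minimal sketch of how one might pick dataset locations out of a catalog file and turn them into downloadable URLs. It rests on assumptions the posts above don't confirm: that a dataset ships with a CSV catalog, that the catalog has columns named `variable_id`, `experiment_id`, and `zstore`, and that the objects are publicly readable. The bucket name, paths, and both helper functions are hypothetical placeholders, not part of any Google or NOAA API.

```python
import csv
import io


def gcs_to_https(uri: str) -> str:
    """Map a gs://bucket/object URI to the HTTPS URL used for public objects."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return "https://storage.googleapis.com/" + uri[len(prefix):]


def select_stores(catalog_csv, **filters):
    """Return the `zstore` value of every catalog row matching all filters.

    `catalog_csv` is any text file object; the column names are assumptions.
    """
    reader = csv.DictReader(catalog_csv)
    return [
        row["zstore"]
        for row in reader
        if all(row.get(column) == value for column, value in filters.items())
    ]


# Hypothetical two-row catalog: bucket and paths are made up for illustration.
SAMPLE_CATALOG = """variable_id,experiment_id,zstore
tas,historical,gs://example-bucket/model-a/historical/tas/
pr,historical,gs://example-bucket/model-a/historical/pr/
"""

# Keep only near-surface air temperature ("tas") from the historical runs.
stores = select_stores(
    io.StringIO(SAMPLE_CATALOG),
    variable_id="tas",
    experiment_id="historical",
)
urls = [gcs_to_https(store) for store in stores]
print(urls)
```

In practice the in-memory sample would be replaced by the real catalog for the dataset in question, and the resulting locations handed to whatever loader your framework expects. Filtering the catalog first is the point: you only touch the slices you need instead of downloading terabytes.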