Hazard Emulator#

Given a database of hazard events, the module climada.hazard.emulator provides tools to subsample events (or time series of events) from that event database. The module provides functionality to guard the subsampling, e.g., using bias-corrected statistics according to historical records in a specific georegion, or using calibrated statistics according to a climate scenario.

In more complex cases, the given event database is divided into a (smaller) set of observed hazard events and a (much larger) set of simulated hazard events. The database of observed events is used to statistically fit the frequency and intensity of events in a fixed georegion to (observed) climate indices. Then, given a hypothetical (future) time series of these climate indices (a “climate scenario”), a “hazard emulator” can draw random samples from the larger database of simulated hazard events that mimic the expected occurrence of events under the given climate scenario in the specified georegion.

The concept and algorithm as applied to tropical cyclones is originally due to Tobias Geiger (unpublished as of now) and has been generalized within this package by Thomas Vogt.

This notebook illustrates the functionality through the example of tropical cyclones in the eastern pacific under the RCP 2.6 climate scenario according to the MIROC5 global circulation model (GCM). However, the algorithm can be applied to arbitrary Hazard types given a suitable database of synthetic events.

Load hazard data#

The database of hazard events for this tutorial is too large to be provided with it. Instructions on how to obtain or generate the data are provided in the last section of this tutorial. At this point, we load a precomputed Hazard object from a file that contains simulated tropical cyclone tracks for an RCP 2.6 climate scenario according to MIROC5. Note, however, that this module will work with any Hazard object, preferably containing a lot of events.

from climada.util.config import CONFIG
DEMO_DIR = CONFIG.local_data.demo.dir(create=False)
EMULATOR_DATA_DIR = DEMO_DIR.joinpath("emulator")

from climada.hazard import TropCyclone
hazard = TropCyclone.from_hdf5(EMULATOR_DATA_DIR.joinpath("hazard_360as_miroc_rcp26.hdf5"))

2021-03-24 17:46:31,484 - climada.hazard.base - INFO - Reading /home/tovogt/.climada/demo/data/emulator/hazard_360as_miroc_rcp26.hdf5

We quickly try to give an impression of what this particular Hazard object looks like. It contains hazard.size = 45300 tropical cyclone wind fields that are computed on a global set of centroids with 360 arc-seconds (0.1 degree) resolution onland and 1 degree resolution offshore. The physical effects onland are considered more important, the low-resolution information offshore is mostly relevant for plotting purposes:

import warnings; warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
plt.rcParams['figure.dpi'] = 120
hazard.centroids.to_crs("EPSG:4326", inplace=True)
hazard.centroids.plot(c=hazard.intensity[16637,:].toarray().ravel(), s=0.1);

<cartopy.mpl.geoaxes.GeoAxesSubplot at 0x7f74c8bb0a60>

../_images/a131b99ef624b41dde0640c4aa0f39f45859c475458df59e7d49a9aabd7d759e.png

Define the region of interest#

One basic feature of the climada.hazard.emulator module is the HazRegion class that not only defines the geographical region of interest for the statistics, but might also contain climatic information about the selected region, such as the cyclone (hurricane) season. For tropical cyclone hazards there already exists a derived class TCRegion that defines ocean basins and cyclone seasons.

import cartopy.crs as ccrs
import cartopy.feature as cfeature
from climada_petals.hazard.emulator.geo import TCRegion
# load Eastern Pacific basin, print season (months of year) and plot geometry
region = TCRegion(tc_basin="EP")
print(f"\nThe cyclone season in basin {region.tc_basin} spans "
      f"from month {region.season[0]} until month {region.season[1]}.\n")

ax = plt.gcf().add_axes([0, 0, 1, 1], projection=ccrs.PlateCarree())
ax.add_feature(cfeature.COASTLINE.with_scale('110m'), linewidth=0.75)
ax.add_geometries([region.shape], crs=ccrs.PlateCarree(), alpha=0.8);

The cyclone season in basin EP spans from month 7 until month 12.

<cartopy.mpl.feature_artist.FeatureArtist at 0x7f74985581f0>

../_images/bfed0a0f68ba29be55aeafd1a6c36a91af32f10082aa63d11f1308d3bb7718aa.png

For our analysis, we restrict to the hazard events that affect the land area (defined by the on_land property) in this region:

import shapely
hazard.centroids.gdf['region_id'] = shapely.vectorized.contains(
    region.shape, hazard.centroids.lon, hazard.centroids.lat) & hazard.centroids.gdf.on_land.astype(bool)
hazard_EP = hazard.select(reg_id=1)
# Plot one event as an example:
hazard_EP.centroids.plot(c=hazard_EP.intensity[31577,:].toarray().ravel(), s=3);

<cartopy.mpl.geoaxes.GeoAxesSubplot at 0x7f7498568ee0>

../_images/e041839a9342d2a0325c8fb9047269febc8cdd7e7466507d7506088eb5a51fd9.png

Extract events that affect the region of interest#

The emulator’s statistics and sampling functionality doesn’t use the Hazard object directly, but an aggregated version of it (a pandas DataFrame). From the Hazard objects, we extract those events that actually “affect” the georegion of interest and store for each the maximum intensity observed within the region (as well as date and location of occurrence):

from climada_petals.hazard.emulator.stats import haz_max_events

# for this example, we regard grid cells as `affected` if they face at least 34 knots wind speeds
KNOTS_2_MS = 0.514444
MIN_WIND_MS = 34 * KNOTS_2_MS

max_events_base = haz_max_events(hazard_EP, min_thresh=MIN_WIND_MS)
max_events_base

2021-03-24 17:46:52,405 - climada.hazard.emulator.stats - INFO - Condensing 45300 hazards to 7401 max events ...

	id	name	year	month	day	lat	lon	intensity
0	1	1	1950	8	17	25.1	-112.3	42.654243
1	5	5	1950	9	3	18.7	-104.0	36.720880
2	8	8	1950	9	17	27.2	-114.0	32.730017
3	13	13	1950	11	24	17.2	-100.6	42.861889
4	17	17	1950	10	26	22.9	-106.3	47.887202
...	...	...	...	...	...	...	...	...
7396	45260	28460	2100	11	21	24.1	-107.0	29.222734
7397	45261	28461	2100	10	25	27.4	-114.2	34.721709
7398	45274	28474	2100	10	20	18.1	-102.1	89.034839
7399	45284	28484	2100	10	9	17.8	-101.5	59.966329
7400	45298	28498	2100	10	13	24.2	-107.5	31.401268

7401 rows × 8 columns

Making draws from the event pool#

The most basic functionality of the emulator module is then to subsample from this event pool according to a desired frequency and intensity without taking the date of events into account at all:

from climada_petals.hazard.emulator.emulator import EventPool
event_pool = EventPool(max_events_base)
draws = event_pool.draw_realizations(nrealizations=10, freq_poisson=10, intensity_mean=30, intensity_std=5)

The result draws is a list of nrealizations subsamples from tc_events_pool. The number of events in each sample DataFrame is driven by a Poisson distribution (lambda = 10) and each DataFrame’s mean intensity is 30±5:

assert len(draws) == 10
display(draws[0])
print("Number of events in each sample:", [d.shape[0] for d in draws])
print("Mean intensity of each sample:", [d['intensity'].mean() for d in draws])

	id	name	year	month	day	lat	lon	intensity
754	5285	5285	1967	6	23	18.3	-103.3	30.302091
3458	22096	5296	2023	10	8	15.7	-93.7	47.669055
2357	15416	15416	2001	8	19	22.8	-110.1	30.646163
1329	8892	8892	1979	5	6	19.2	-104.8	39.077807
4153	26512	9712	2038	8	17	31.3	-116.4	18.318382
2131	13949	13949	1996	11	6	18.6	-103.4	34.297652
89	635	635	1952	6	14	21.3	-106.5	51.698398
5106	32250	15450	2057	9	8	21.4	-106.6	37.932184
4703	29808	13008	2049	11	1	59.8	-162.9	20.645396
4844	30510	13710	2051	9	22	26.2	-112.6	26.817070

Number of events in each sample: [10, 7, 11, 7, 10, 13, 11, 16, 12, 6]
Mean intensity of each sample: [33.74041981432531, 32.36753130917463, 34.59815875000344, 31.5348526999235, 33.67298356212818, 27.126272794400833, 34.588544482580225, 32.567385754309875, 33.97376593673842, 33.712914894777875]

Bias-corrected statistics#

Now, as noted above, our database actually attaches date information to the events:

import datetime as dt
minyear, maxyear = [dt.datetime.fromordinal(d) for d in [hazard.date.min(), hazard.date.max()]]
print(f"The hazard event database covers the period between {minyear} and {maxyear}.")

The hazard event database covers the period between 1950-01-03 00:00:00 and 2100-12-29 00:00:00.

However, the database contains much more events per year than the underlying physics would suggest. That’s why the creator of the global database provides information about frequency, from which we created the following CSV-data (see last section for more information about the data source and pre-processing steps taken):

import pandas as pd
frequency = pd.read_csv(EMULATOR_DATA_DIR.joinpath("freq_miroc_rcp26.csv"))
with pd.option_context("display.max_rows", 5):
    display(frequency)

	year	freq
0	1950	0.299428
1	1951	0.289091
...	...	...
149	2099	0.348597
150	2100	0.354156

151 rows × 2 columns

The freq column encodes the relative surplus of events for each year: A freq value of 0.29 means that the expected number of events for a particular year is 29, while the actual number of events in the database for that year is 100. Depending on your data provider, it might be more or less simple to derive this kind of information for your event database.

The database might be subject to regional biases: While the whole global event database might represent global statistics of certain physical properties very well, it can still systematically under- or overestimating regionally aggregated properties. One way to “bias-correct” this kind of effect is by comparing with historical records. That’s where a second event database comes in, a database of observed hazard events:

observed = TropCyclone.from_hdf5(EMULATOR_DATA_DIR.joinpath("hazard_360as_ibtracs_1950-2019.hdf5"))
observed.centroids.gdf['region_id'] = shapely.vectorized.contains(
    region.shape, observed.centroids.lon, observed.centroids.lat) & observed.centroids.gdf.on_land.astype(bool)
observed_EP = observed.select(reg_id=1)

2021-03-24 17:46:52,543 - climada.hazard.base - INFO - Reading /home/tovogt/.climada/demo/data/emulator/hazard_360as_ibtracs_1950-2019.hdf5

Since the quality of observed data is known to vary a lot in different world regions, we restrict our dataset to an appropriate norm period. Again we condense the hazard object to a DataFrame.

from climada_petals.hazard.emulator.const import TC_BASIN_NORM_PERIOD
norm_period = TC_BASIN_NORM_PERIOD[region.tc_basin[:2]]
observed_EP = observed_EP.select(date=(f"{norm_period[0]}-01-01", f"{norm_period[1]}-12-31"))
max_events_observed = haz_max_events(observed_EP, min_thresh=MIN_WIND_MS)

2021-03-24 17:46:53,013 - climada.hazard.emulator.stats - INFO - Condensing 7335 hazards to 377 max events ...

We would now have all the data for bias correction, but we don’t have to do this manually. The emulator module takes care of this.

Initialize and calibrate the hazard emulator#

The most complex part of the emulator module is the HazardEmulator class that automatically applies bias-correction and provides functionality to make draws according to corrected statistics. In the next section, we will see that it can even make draws according to a climate scenario.

from climada_petals.hazard.emulator.emulator import HazardEmulator
em = HazardEmulator(max_events_base, max_events_observed, region, frequency, pool=event_pool)

2021-03-24 17:46:53,188 - climada.hazard.emulator.random - INFO - Results of intensity normalization by subsampling:
2021-03-24 17:46:53,189 - climada.hazard.emulator.random - INFO - - drop 35% of entries satisfying 'intensity > 37.60902402823995'
2021-03-24 17:46:53,189 - climada.hazard.emulator.random - INFO - - mean intensity of simulated events before dropping is 37.6090
2021-03-24 17:46:53,190 - climada.hazard.emulator.random - INFO - - mean intensity of simulated events after dropping is 35.4003
2021-03-24 17:46:53,190 - climada.hazard.emulator.random - INFO - - mean intensity of observed events is 34.6826

All corrections discussed above have already been applied upon initialization and aggregated in the stats attribute (it’s mainly for internal use, so don’t worry if you don’t understand the meaning of all the columns):

em.stats

	year	eventcount	intensity_mean	intensity_std	intensity_max	eventcount_obs	intensity_mean_obs	intensity_std_obs	intensity_max_obs
0	1950	5.041561	37.560036	10.921193	56.148643	3.0	32.444994	4.193251	38.302107
1	1951	4.056249	32.092965	12.225921	61.743472	6.0	21.487378	2.879905	25.614798
2	1952	5.830183	37.488452	15.000097	60.163892	NaN	NaN	NaN	NaN
3	1953	2.185263	33.317693	9.302551	41.871618	3.0	26.853119	2.943347	30.412018
4	1954	3.123722	30.061603	9.982461	37.396814	8.0	29.232035	8.669075	47.533993
...	...	...	...	...	...	...	...	...	...
146	2096	5.641057	39.148745	12.819278	50.586874	NaN	NaN	NaN	NaN
147	2097	7.649071	35.221123	12.889677	55.557459	NaN	NaN	NaN	NaN
148	2098	4.050403	33.088812	12.990415	55.963069	NaN	NaN	NaN	NaN
149	2099	5.217263	34.029102	14.014108	52.617260	NaN	NaN	NaN	NaN
150	2100	7.288148	33.401478	17.953031	68.139491	NaN	NaN	NaN	NaN

151 rows × 9 columns

We can now draw ensembles of events that adhere to the bias-corrected statistics:

# the meaning of `predict_statistics` will be explained in the next section
em.predict_statistics()
draws = em.draw_realizations(10, (2030, 2050))

2021-03-24 17:46:53,222 - climada.hazard.emulator.emulator - INFO - Predicting statistics without climate index predictor...
2021-03-24 17:46:53,226 - climada.hazard.emulator.emulator - INFO - Drawing 10 realizations for period  (2030, 2050)
2030 ... 2050 ... 2050

The returned object draws is a DataFrame with each row corresponding to a storm event from the hazard pool hazard_EP (see above): The column real_id assigns one of 100 realizations to each of the events while the columns id and name are the unique ID and name used in hazard_EP to identify this hazard event. The column year indicates the year in which the event would occur under the hypothetical corrected statistics.

display(draws[:25:2])

	id	name	year	real_id
0	5002	5002	2030	0
2	28485	11685	2030	0
4	8724	8724	2030	2
6	17148	348	2030	2
8	10339	10339	2030	3
10	7026	7026	2030	3
12	27333	10533	2030	5
14	1523	1523	2030	6
16	12183	12183	2030	7
18	20406	3606	2030	8
20	12394	12394	2030	9
22	37101	20301	2031	0
24	17358	558	2031	0

Draw samples according to climate scenario#

The emulator can also be used to sample hypothetical events within an arbitrary time period covered by one or several climate index time series:

climate_indices = [pd.read_csv(EMULATOR_DATA_DIR.joinpath("gmt_miroc_rcp26.csv")),
                   pd.read_csv(EMULATOR_DATA_DIR.joinpath("esoi_miroc_rcp26.csv"))]
with pd.option_context("display.max_rows", 5):
    display(*[df for df in climate_indices])

	gmt	year	month
0	0.040892	1861	7
1	0.053463	1862	7
...	...	...	...
437	1.397416	2298	7
438	1.375225	2299	7

439 rows × 3 columns

	esoi	year	month
0	-0.316829	1861	1
1	-0.479121	1861	2
...	...	...	...
5266	-0.569960	2299	11
5267	-0.523571	2299	12

5268 rows × 3 columns

An arbitrary number of climate indices can be provided in a list, each as a DataFrame with a year column and an additional column (in this example, gmt or esoi, respectively) containing the actual climate index data. Optionally, each climate index DataFrame can be at monthly resolution (like in the esoi example), as indicated by an additional month column. A constant month column (like in the gmt example above) will be discarded automatically by the emulator.

Using those climate index time series, we calibrate the emulator, i.e., we determine a statistical connection between climate indices (GMT and ENSO in this example) and tc_events_pool:

em.calibrate_statistics(climate_indices)

Now that the emulator is calibrated, we use GMT and ENSO time series to predict TC statistics under the chosen climate scenario:

em.predict_statistics(climate_indices)

2021-03-24 17:46:54,572 - climada.hazard.emulator.emulator - INFO - Predicting statistics with new climate index time series...

The predicted statistics are stored in the stats_pred attribute of the emulator:

em.stats_pred

	year	gmt	esoi	intensity_mean	intensity_mean_residuals	eventcount	eventcount_residuals
0	1861	0.040892	0.170012	34.803000	2.637305	4.176576	1.272837
1	1862	0.053463	-0.813848	34.038959	2.637305	4.406238	1.272837
2	1863	0.063511	1.040317	35.478856	2.637305	4.018993	1.272837
3	1864	0.061416	1.441388	35.790317	2.637305	3.935384	1.272837
4	1865	0.059797	1.129038	35.547754	2.637305	3.996921	1.272837
...	...	...	...	...	...	...	...
434	2295	1.395940	-0.726180	34.107040	2.637305	6.248324	1.272837
435	2296	1.407416	0.648489	35.174572	2.637305	5.842586	1.272837
436	2297	1.410368	0.769933	35.268883	2.637305	5.811002	1.272837
437	2298	1.397416	0.870595	35.347054	2.637305	5.761524	1.272837
438	2299	1.375225	-0.526354	34.262219	2.637305	6.151251	1.272837

439 rows × 7 columns

Since the climate index time series covers a larger range than our hazard database (1861-2299 vs. 1950-2100), we can even predict the statistics in years that have not been covered by the hazard database.

draws = em.draw_realizations(10, (2110, 2130))
display(draws[:25:2])

2021-03-24 17:46:54,603 - climada.hazard.emulator.emulator - INFO - Drawing 10 realizations for period  (2110, 2130)
2110 ... 2130 ... 2130

	id	name	year	real_id
0	13973	13973	2110	0
2	24253	7453	2110	0
4	14891	14891	2110	0
6	21909	5109	2110	0
8	309	309	2110	1
10	27897	11097	2110	1
12	38751	21951	2110	1
14	31890	15090	2110	1
16	13222	13222	2110	2
18	14662	14662	2110	2
20	9915	9915	2110	2
22	26466	9666	2110	3
24	1523	1523	2110	3

Create sample Hazard object from draws#

Using the DataFrame draws, we can produce a hazard object that contains the sampled events from hazard_EP:

hazard_sample = hazard_EP.select(event_names=draws['name'].tolist())

For this event to represent the sample, we need to adjust the date (and, optionally, the frequency according to the number of realizations):

years = [dt.datetime.fromordinal(d).year for d in hazard_sample.date]
hazard_sample.date += [dt.datetime(y_dst, 1, 1).toordinal() - dt.datetime(y, 1, 1).toordinal()
                       for y_dst, y in zip(draws['year'].values, years)]

dates = [dt.datetime.fromordinal(d) for d in hazard_sample.date]
display(dates[:5])
display(dates[-5:])

[datetime.datetime(2110, 7, 26, 0, 0),
 datetime.datetime(2110, 12, 11, 0, 0),
 datetime.datetime(2110, 1, 14, 0, 0),
 datetime.datetime(2110, 7, 26, 0, 0),
 datetime.datetime(2110, 8, 29, 0, 0)]

[datetime.datetime(2130, 7, 29, 0, 0),
 datetime.datetime(2130, 5, 28, 0, 0),
 datetime.datetime(2130, 6, 2, 0, 0),
 datetime.datetime(2130, 9, 13, 0, 0),
 datetime.datetime(2130, 4, 3, 0, 0)]

For the resulting TropCyclone object to be valid, the event names have to be unique. Since our subsampler makes draws for each year and realization independent of any other year and realization, events might occur in more than one year or realization. One way of dealing with this, is the following renaming step:

hazard_EP.event_names = [f"{row.name}-{row.year}-{row.real_id}"
                         for index, row in draws.iterrows()]
hazard_EP.check()

About the input data used for this notebook#

Since a crucial ingredient for this module is a (sufficiently large) database of hazard events, executing this tutorial notebook requires external data that is not provided with the official CLIMADA repository. The data is expected to be located in the subdirectory {CONFIG.local_data.demo}/emulator/ according to the CLIMADA config (called EMULATOR_DATA_DIR above). Since parts of the data are under a proprietary license, the data is only available upon request from Thomas Vogt (PIK). In the following, you find a list of all the files used in this notebook with information about sources and pre-processing steps taken:

hazard_360as_miroc_rcp26.hdf5: A TropCyclone object (stored as 700 MB hdf5-file) that represents the simulated hazard event database. It has been generated from simulated TC tracks provided by Kerry Emanuel for ISIMIP (version 2b). However, since the track data comes with a proprietary license, they are not listed in the official ISIMIP repositories. Tracks for the period 1950-2100 according to the MIROC5 GCM simulations of the RCP 2.6 scenario have been loaded into CLIMADA using the TCTracks.from_simulations_emanuel constructor. The wind fields have been computed from the tracks using CLIMADA’s TropCyclone.from_tracks function and a global set of centroids with 360 arc-seconds (0.1 degree) resolution onland and 1 degree resolution offshore (plotted above).
hazard_360as_ibtracs_1950-2019.hdf5: A TropCyclone object (stored as 95 MB hdf5-file) that represents the observed hazard event database. It has been generated from records in the IBTrACS database for the years 1950-2019. The tracks have been loaded into CLIMADA using the TCTracks.from_ibtracs_netcdf constructor. Then, wind fields have been computed as for the simulated tracks (see hazard_360as_miroc_rcp26.hdf5).
freq_miroc_rcp26.csv: A table with year and freq column covering the years 1950-2100. The freq values have been obtained by dividing the freqyear field contained in the simulated TC tracks (see hazard_360as_miroc_rcp26.hdf5) by 300 (the total number of global tracks per year in the simulated TC track files).
gmt_miroc_rcp26.csv: A table with gmt, year and month field (the month is constantly set to 7). GMT stands for “Global Mean (surface) Temperature”. The CMIP5 monthly mean atmopsheric (Amon) tas temperature field of MIROC5 RCP 2.6 simulations covering the years 1861-2299 has been averaged globally and annually using the data processing tool CDO. The data is relative to the mean over the 500 year pre-industrial control run of the GCM and the time series has been smoothed by applying a 21-year running mean.
esoi_miroc_rcp26.csv: A table with esoi, year and month field. ESOI stands for “Equatorial Southern Oscillation Index”. Instructions on how to compute this index from air pressure can be found on the ENSO Monitoring website, hosted by Columbia University. For the air pressure input, we extracted the CMIP5 monthly mean atmopsheric (Amon) psl field of MIROC5 RCP 2.6 simulations covering the years 1861-2299.

Hazard Emulator

Contents