NOTE: RainyDay is currently undergoing major refactoring. Much of the information in this User’s Guide is not up to date. If you wish to use the code, please contact Daniel Wright (danielb.wright@wisc.edu) for current guidance.
Table of contents¶
- Introduction to the RainyDay User’s Guide
- Getting Started with RainyDay
- How SST works in RainyDay
- Structure of the ‘.sst’ File
- RainyDay Features
- Case Study 1: Single Gridcell Rainfall Frequency Analysis in Madison, Wisconsin
- Case Study 2: Watershed-Scale Rainfall Frequency Analysis for Big Thompson Watershed, Colorado
- Appendix A: A General Description of SST
- Appendix B: Precipitation Data Sources
- Appendix C: Creating an ‘Irregular’ Transposition Domain Shapefile
- Appendix D: RainyDay Input and Output NetCDF4 Files
- Appendix E: References
Introduction to the RainyDay User’s Guide¶
What is RainyDay? ¶
RainyDay is an open-source Python-based framework for generating large numbers of realistic extreme rainfall scenarios using gridded precipitation data combined with a technique known as stochastic storm transposition (SST). These rainfall scenarios can be used to examine the extreme rainfall statistics for a user-specified region such as a watershed or a single rainfall “pixel,” or to drive a rainfall-runoff model to analyze flood frequency. RainyDay was created to make SST–and more generally, rainfall and flood frequency analysis using gridded precipitation data–more accessible.
All the necessary Python libraries are supported within recent versions of the Anaconda Python distribution from Continuum Analytics. The code is currently not parallelized, though certain parts are “accelerated” using just-in-time compilation with Numba. Computational time is determined mainly by the size of the input dataset (record length, input resolution, and the geographic size of the analysis), though other factors can also impact runtime. Computational speed, even without parallelization, is not prohibitive on a modern personal computer, and can range from several seconds to several hours for typical configurations and input datasets.
Once Python and all necessary libraries are installed, no further knowledge of Python is required. Specification of geographic areas, input precipitation data, number of storms to be analyzed, etc., are all defined within an ‘.sst’ file, which is a text file consisting of a large number of user-defined fields.
References and Learning Tools ¶
This User’s Guide ¶
This guide is the most comprehensive place for information on how to perform SST analyses using RainyDay. It includes instructions for installing RainyDay, an explanation of how SST works in RainyDay, explanations of the user-defined fields in the ‘.sst’ file, and further details on key RainyDay features. Two detailed case studies highlight the features of RainyDay and SST, as well as some of their pitfalls. The first of these case studies demonstrates “pixel-scale” rainfall frequency estimation up to the 500-year return period for Madison, Wisconsin using a radar-based precipitation dataset. The second case study demonstrates watershed-scale rainfall frequency estimation up to the 10,000-year return period for the Big Thompson watershed in the front range of the Rocky Mountains in Colorado using a primarily raingage-based precipitation dataset.
This guide also includes several appendices:
- Appendix A: A General Description of SST: This provides an overview of the SST methodology (not specific to RainyDay). This is recommended reading for those not already familiar with basic SST principles.
- Appendix B: Precipitation Data Sources: This provides a description of available datasets that have been formatted for usage in RainyDay and are available for download.
- Appendix C: Creating ‘Irregular’ Shapefiles: This explains how to create shapefiles for transposition domains and analysis control volumes (e.g. watersheds, arbitrary shapes). This explanation does not require GIS software or expertise.
- Appendix D: RainyDay-compatible NetCDF4 File Structure: Intended for advanced users who wish to understand details of the input and output file structures needed for RainyDay.
- Appendix E: References: This provides a reference list for all studies cited in this guide, as well as others that are relevant.
Once you have learned the basics of how SST is implemented in RainyDay, you should take time to carefully read the ‘.sst’ file format section and look through the Madison, Wisconsin and Big Thompson, Colorado case studies, so that you understand how to choose the right options for your analysis.
Core Papers¶
Given the degree of flexibility that RainyDay offers, it is very easy to make conceptual mistakes that will produce unrealistic results. To avoid this, it is essential that the user understand basic principles of SST in order to properly use RainyDay. The best way to learn these principles is to read the following two papers, though large parts of these papers have been adapted into this guide:
- Wright, D.B., R. Mantilla, and C.D. Peters-Lidard. A remote sensing-based tool for assessing rainfall-driven hazards, Environmental Modelling & Software, 2017.
- Wright, Daniel B., Guo Yu, and John F. England. Six Decades of Rainfall and Flood Frequency Analysis Using Stochastic Storm Transposition: Review, Progress, and Prospects. Journal of Hydrology, 2020.
Wright et al. (2017) provides a detailed description of RainyDay and shows a variety of results. Wright et al. (2020) provides a broader review of SST, including a more general explanation of the method, a review of different approaches and applications it has been used for, and perspectives on limitations and future directions.
Other Relevant Papers ¶
Additional useful and/or interesting applications of SST include:
- Yu, G., D.B. Wright, C. Smith, Z. Zhu, K. Holman, Process-Based Flood Frequency Analysis in a Changing Agricultural Watershed, Hydrology and Earth System Sciences, 2019.
- Yu, Guo, Daniel B. Wright, and Zhe Li. The Upper Tail of Precipitation in Convection-Permitting Regional Climate Models and Their Utility in Nonstationary Rainfall and Flood Frequency Analysis. Earth’s Future, 2020.
- Yu, Guo, Daniel B. Wright, and Kathleen D. Holman. Connecting Hydrometeorological Processes to Low-Probability Floods in the Mountainous Colorado Front Range. Water Resources Research, 2021.
See Appendix E: References for a more complete list of papers that have used RainyDay.
RainyDay License Agreement ¶
Recent versions of RainyDay are distributed under the MIT Open Source License.
About the Author and Contact Information ¶
RainyDay was written by Dr. Daniel Wright at University of Wisconsin-Madison. If you have questions not answered by this tutorial, or if you have an interesting application with a specific requirement, you may contact Dr. Wright at danielb.wright@wisc.edu.
Getting Started with RainyDay ¶
Download RainyDay Code ¶
You can download the RainyDay source code from the RainyDay GitHub repository. There are multiple ways to do this. Assuming you have Git installed on your machine, you can use the terminal command line (at least in Linux, macOS, and Windows Cygwin):
git clone https://github.com/danielbwright/RainyDay2.git
Alternatively, you can visit the RainyDay GitHub repository via a web browser and download a zip file of the source code and examples by clicking on the green “Code” button and then choosing “Download ZIP”:
Install Anaconda ¶
It is recommended that you use the Anaconda Python distribution and environments from Continuum Analytics. The current version of RainyDay requires Python 3. If you are new to Python and Anaconda, and if you are using a personal computer, the easiest way to proceed is to install Anaconda using the graphical installer. If you are using a remote server, you can download and install Anaconda from the command line.
This tutorial will not explain what Anaconda is or how to do things with it. If you have never used Anaconda before, there are many resources online, such as this one.
Create a ‘clean’ Anaconda environment ¶
It is recommended that you create a new, dedicated Anaconda environment specifically for running RainyDay. Detailed information on managing Anaconda environments can be found here. However, this tutorial tries to convey everything needed to set up your RainyDay environment. Specifically, there are two options for doing this: 1) creating an environment from the .yml file included in the RainyDay GitHub repository, or 2) creating your own environment.
Option 1: Create environment from .yml file ¶
The RainyDay code on GitHub should include a file called “RainyDay_Env.yml”. From the command line, you can use the following commands to attempt to create a new environment from this .yml file:
conda env create -f RainyDay_Env.yml
This doesn’t always work, however, particularly when attempting it on an operating system other than macOS, which was used to create the .yml file. If this doesn’t work, move on to Option 2.
Option 2: Create new environment ¶
If Option 1 didn’t work, move on to option 2. From the command line, use the following commands:
conda create --name RainyDay_Env
conda activate RainyDay_Env
conda install geopandas # very slow. Like, veeeerrrryyy sllloooowww!
conda install pyshp
conda install cartopy
conda install netCDF4
conda install rasterio
conda install numba
conda install xarray
conda install shapely
Hopefully you don’t encounter any problems!
Update ‘PYTHONPATH’ ¶
RainyDay introduces a range of functions, all contained within the file ‘RainyDay_functions.py’. Python needs to know where to “find” these functions, and the way it does this is via a system-level environment variable known as PYTHONPATH. How you set PYTHONPATH varies a bit depending on your system. Instructions can be found here.
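For example, on Linux or macOS with a bash-like shell, a line such as the following (typically added to a startup file like ‘~/.bashrc’) would work. The ‘/path/to/RainyDay2’ location is a placeholder for wherever you cloned the repository; the ‘Source’ subdirectory is where ‘RainyDay_functions.py’ resides:

```shell
# Make RainyDay_functions.py importable by adding its directory to PYTHONPATH.
# Replace /path/to/RainyDay2 with the actual location of your clone.
export PYTHONPATH="${PYTHONPATH}:/path/to/RainyDay2/Source"
```

On Windows, the equivalent is done through the “Environment Variables” system settings dialog.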
Check to see if RainyDay works! ¶
Now, try to run RainyDay from the command line. Note that you will need to either 1) navigate to the directory in which the RainyDay code resides or 2) provide the full path to the RainyDay code in order for this to work. An example of the second option could look like this:
python {some path}/Source/RainyDay_Py3.py
If you receive a message with some RainyDay license information followed by ‘You didn’t specify an input (‘.sst’) file!’, then RainyDay is properly configured and ready to use. If something else happens, try to repeat the steps above. If all else fails, contact the author.
How SST works in RainyDay ¶
Though the way that RainyDay performs SST is consistent with the more general description in Appendix A, more specific explanations are needed to truly understand what the software does. The explanation below, adapted from Wright et al. (2017), describes what RainyDay’s “computational core” does; detailed options and features are discussed later in the descriptions of the .sst file and additional high-level features. The following diagram shows how the RainyDay code functions:
Figure: Flow chart of the RainyDay workflow. The topmost box (“User launches …”) pertains to every instance of RainyDay. By modifying the ‘.sst’ file, the user can choose to perform the storm catalog creation and SST frequency analysis steps either in a single run or in separate runs of RainyDay. This is advantageous in several ways. For example, a user can generate and evaluate a storm catalog before moving on to frequency analysis, or an existing storm catalog can be re-used, avoiding the time-consuming process of creating and evaluating a new one. To reuse a storm catalog, the new analysis must share the same transposition domain as the existing storm catalog and have a duration less than or equal to the duration of the storms in the catalog.
The following five steps describe how SST is implemented in RainyDay:
- Identify a geographic transposition domain AD that encompasses the area of interest Aw. Note that in RainyDay, Aw can be a single precipitation grid cell, a rectangular area containing multiple grid cells, or a contiguous area defined by a user-supplied polygon in GIS shapefile format, typically the outline of a watershed boundary. One could confine AD to regions with homogeneous extreme rainfall properties (e.g. flat areas far from large water bodies and topographic features). RainyDay offers several diagnostic aids that help the user to understand rainfall heterogeneity over the region AD and to improve the performance of the SST procedure in cases where rainfall heterogeneities do exist. It should be noted, however, that selection of AD is probably the single biggest “question mark” in most or all SST studies, and thus far, rigorous guidance on best practices for doing so is lacking. In RainyDay, AD is defined via the DOMAINTYPE and DOMAINSHP fields within the ‘.sst’ file. The reader is directed to the description of the .sst file for more details.
- Identify the largest m temporally non-overlapping storms in AD from an n-year rainfall remote sensing dataset, in terms of rainfall accumulation of duration t over an area with the same size, shape, and orientation as Aw. For example, the principal axis of the Turkey River watershed in northeastern Iowa in the central United States is oriented roughly northwest-southeast and has an area of 4,400 km2. In this case, the m storms are those associated with the m highest t-hour rainfall accumulations over an area of 4,400 km2 with the same size, shape, and orientation as the Turkey River watershed. We refer to this set of storms henceforth as a “storm catalog,” with the same geographic extent as AD and the same spatial and temporal resolution as the input rainfall data. We refer to the m storms in the storm catalog henceforth as “parent storms.” In RainyDay, the user can specify whether to exclude certain months (such as wintertime) from the storm catalog. Previous studies have shown that a low bias can be introduced in high-exceedance probability (i.e. frequent, low-intensity) events if m is small (e.g. Foufoula-Georgiou, 1989; Franchini et al., 1996; Wilson and Foufoula-Georgiou, 1990; see Wright et al., 2013 for a discussion). The sensitivity of SST results to the choice of m and AD is explored in Wright et al. (2017), but an m of 10 to 20 times n generally minimizes the low bias for frequent events and would likely be a good starting point for new analyses. Low exceedance probability (i.e. rare) events are less sensitive to the choice of m. Examples of this sensitivity are shown in this guide’s Case Studies. In RainyDay, duration t is a user-defined input, and if t is neither very short nor very long relative to the time scale of hazard response in Aw, subsequent hazard modeling results will be relatively insensitive to the chosen value.
In this respect, the duration t in RainyDay differs conceptually from design storm methods, in which hazard response is intrinsically sensitive to the user-specified duration; this feature is indeed one of the chief advantages of SST over design storm methods for multi-scale flood hazard estimation (see Wright et al., 2020 for analysis and discussion). Choice of duration is discussed in more detail here.
- Randomly generate an integer k, which represents a “number of storms per year.” In previous SST literature, the assumption was made that k follows a Poisson distribution with a rate parameter equal to the average number of storms per year. The m parent storms are selected such that an average of m/n storms per year are included in the storm catalog. For example, if m = 100 storms are selected from a ten-year remote sensing dataset, then the rate parameter will be 100/10 = 10.0 storms per year. RainyDay will generate k using either the Poisson distribution or an empirical distribution (see here for more details). If the Poisson distribution is selected, RainyDay will automatically calculate the rate parameter based on the user-specified m and the length of the input dataset.
- Randomly select k parent storms from the storm catalog. For each selected parent storm, transpose all rainfall fields associated with that storm by an east-west distance Δx and a north-south distance Δy, where Δx and Δy are drawn from the distributions DX(x) and DY(y), which are bounded by the east-west and north-south extents of AD, respectively. The motion and structure of the parent storm are unaltered during transposition; only the location is changed. The distributions DX(x) and DY(y) can be uniform, but RainyDay offers additional options, described here. For each of the k transposed storms, compute the resulting t-hour rainfall accumulation averaged over Aw. Step 4 can be understood as temporal resampling and spatial transposition of observed storm events within a probabilistic framework to synthesize one year of heavy rainfall events over AD and thus, by extension, over Aw. RainyDay (and previous SST efforts) typically retains the largest (in terms of rainfall intensity) of the k events for subsequent steps and discards the k-1 remaining events. The single retained storm can be understood as a “synthetic” annual rainfall maximum, analogous to the annual rainfall maxima that are extracted from rain gage records for rainfall frequency analysis. It should be noted that these rainfall events do not form a continuous series, meaning that neither inter-storm periods nor the temporal sequencing of the k storms are considered. Aside from generating annual maxima, RainyDay has several other options; see the descriptions of the CALCTYPE and NPERYEAR fields in the .sst file.
- Repeat steps 3 and 4 a user-specified number of times, Tmax, in order to create Tmax years of t-hour synthetic annual rainfall maxima for Aw. RainyDay then assigns each annual maximum a rank i according to its rainfall intensity relative to all others. Each of these ranked maxima can then be assigned an annual exceedance probability AEPi, where AEPi is defined as i/Tmax. AEP is the probability in a given year that an event of equal or greater intensity will occur. The “return period” or “average recurrence interval” (ARI) Ti, commonly used in rainfall and flood frequency analysis, is simply Ti = 1/AEPi, so if Tmax = 10^3, it is possible to directly infer exceedance probabilities of 1.0 > AEP > 10^-3 (ARI of 1.0 < Ti < 10^3). Each of these rainfall events can then serve as one datum of an empirical rainfall frequency estimate or as a rainfall scenario for rainfall-runoff modeling.
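The resampling loop in steps 3-5 can be sketched in a few lines of Python. This is an illustrative toy version only, not RainyDay’s actual code, and it omits the spatial transposition in step 4 (each catalog storm is represented here simply by its basin-average intensity):

```python
import numpy as np

def sst_annual_maxima(catalog_intensities, m, n_years, t_max, seed=0):
    """Toy sketch of SST steps 3-5: resample storms from the catalog,
    retain each synthetic year's maximum, and rank the results."""
    rng = np.random.default_rng(seed)
    rate = m / n_years                      # Poisson rate: avg. storms per year
    maxima = np.zeros(t_max)
    for year in range(t_max):
        k = rng.poisson(rate)               # step 3: number of storms this year
        if k > 0:                           # step 4 (transposition omitted here)
            maxima[year] = rng.choice(catalog_intensities, size=k).max()
    ranked = np.sort(maxima)[::-1]          # step 5: rank largest-first
    aep = np.arange(1, t_max + 1) / t_max   # AEP_i = i / Tmax
    ari = 1.0 / aep                         # return period T_i = 1 / AEP_i
    return ranked, aep, ari
```

With Tmax = 1000, for instance, the largest ranked value is assigned AEP = 0.001, i.e. a 1000-year return period.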
Structure of the ‘.sst’ File ¶
There are three main things needed to run a RainyDay analysis:
- A gridded precipitation dataset. See Appendix B for currently available datasets.
- GIS shapefiles to define Aw and AD. These are technically optional but useful for most analyses. See Appendix C for instructions on one way to create these.
- The ‘.sst’ file. The elements of a ‘.sst’ file are described here.
Be warned, there is a lot of detailed information in this section. Most fields in the .sst file are technically optional, since RainyDay will use default values if they are omitted. To produce meaningful results, however, thorough understanding of many of the fields is necessary. The case studies for Madison, Wisconsin and the Big Thompson watershed, Colorado are useful for seeing which fields are most relevant to typical analyses.
NOTE: comments can be entered in the .sst file by including a preceding “#” symbol. A comment can occupy an entire line, or can follow a non-comment entry on the same line.
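To illustrate the format (this is a hypothetical sketch, not RainyDay’s actual parser): each non-comment line is a field name followed by its value, and anything after a “#” is ignored:

```python
def parse_sst(text):
    """Hypothetical sketch of the '.sst' format: strip '#' comments,
    then read each remaining line as a FIELD VALUE pair."""
    fields = {}
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()   # drop whole-line or trailing comments
        if not line:
            continue
        key, _, value = line.partition(' ')
        fields[key] = value.strip()
    return fields
```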
Example ‘.sst’ File ¶
#============================================================
# WELCOME TO A RAINYDAY .SST FILE!
# THIS IS WHERE THE USER DEFINES INFORMATION FOR STOCHASTIC STORM TRANSPOSITION
#============================================================
MAINPATH /RainyDay/BigThompson # this is a comment
# this is a comment
SCENARIONAME BigThompsonExample
RAINPATH /DATA/StageIV/NEWREPROJ/ST4.*.newregridded.nc
CATALOGNAME StageIV_72hour_BigThompson_Example.nc
CREATECATALOG false
DURATION 72
DURATIONCORRECTION false
NYEARS 500
NSTORMS 300
NREALIZATIONS 10
TIMESEPARATION 12
DOMAINTYPE irregular
DOMAINSHP /RainyDay/BigThompsonExample/BigThompsonSSTdomain-polygon.shp
LATITUDE_MIN 35.0
LATITUDE_MAX 44.0
LONGITUDE_MIN -106.5
LONGITUDE_MAX -104.5
DIAGNOSTICPLOTS true
FREQANALYSIS true
SCENARIOS true
SPINPERIOD false
EXCLUDESTORMS 45,130,132,168,235,300
EXCLUDEMONTHS 1,2,3,12
INCLUDEYEARS 2005-2020
TRANSPOSITION nonuniform
RESAMPLING poisson
UNCERTAINTY ensemble
RETURNLEVELS 2,5,10,25,50,100,200,500
RETURNTHRESHOLD 1
ROTATIONANGLE false
ENHANCEDSST false
NPERYEAR false
CALCTYPE ams
POINTAREA basin
WATERSHEDSHP /RainyDay/BigThompsonExample/BigThompson.shp
POINTLAT 40.35
POINTLON -105.65
BOX_YMIN 43.0
BOX_YMAX 43.25
BOX_XMIN -105.8
BOX_XMAX -105.45
MAXTRANSPO false
SENS_INTENSITY false
SENS_FREQUENCY false
Description of ‘.sst’ File Fields ¶
- MAINPATH (text string): Directory where the ‘.sst’ file is located and in which subdirectories will be created or modified.
- SCENARIONAME (required; text string): Name for the scenario. This will be the name of the subdirectory and the prefix for various outputs, including the frequency analysis file. If that subdirectory doesn’t already exist, it will be created.
- RAINPATH (required; text string): The location of the rainfall input NetCDF4 files. Only needed if you are creating a new storm catalog. Wildcards (‘*’) can and should be used in such a way to ‘find’ all the files in the input dataset. If, for example, you are using a dataset that includes a file named ‘/DATA/StageIV/NEWREPROJ/ST4.20020101.newregridded.nc’, then ‘/DATA/StageIV/NEWREPROJ/ST4.*.newregridded.nc’ is valid.
- CATALOGNAME (optional if not creating a new storm catalog, otherwise required; text string): The name of the storm catalog. If the user specifies that a new storm catalog should be created (using the CREATECATALOG field, which is explained next), a new file will be created (and any existing one with the same name will be overwritten). If the user does not create a new catalog, the CATALOGNAME should refer to an existing storm catalog. This catalog resides in: MAINPATH/{CATALOGNAME}.nc. The storm catalog will be a netCDF4 file.
- CREATECATALOG (required; ‘true’ or ‘false’): Do you want to create a new storm catalog (‘true’) or use an existing one (‘false’)? If ‘true’, RainyDay will create a new one called MAINPATH/{CATALOGNAME}.nc (and any existing one with the same name will be overwritten). If ‘false’, it will look for an existing storm catalog by the same name.
- DURATION (integer value): Duration of the rainfall accumulation period in hours. This corresponds to the variable t in the description of RainyDay above. If you are not creating a new storm catalog but rather using one that has already been generated, this duration is compared against the duration of the storm catalog. If the specified duration is shorter than that of the storm catalog, the rainfall duration is adjusted accordingly. If the specified duration is longer than that of the storm catalog, an error message is generated. If you are only intending to estimate rainfall IDF, this selection is straightforward. It is more complicated if you are planning on using RainyDay outputs for rainfall-runoff modeling to obtain flood frequency analyses. This issue is discussed further below. If you have an existing storm catalog that has a duration of, for example, 48 hours, you can specify a shorter duration (for example, 24 hours) and RainyDay will “trim” the catalog’s duration prior to frequency analysis. If the DURATIONCORRECTION option described next is used (and it is recommended to use it!) things behave somewhat differently, as explained below.
- DURATIONCORRECTION (optional; ‘true’ or ‘false’): Due to some issues intrinsic to how stochastic storm transposition works and how it is implemented in RainyDay, if you are attempting to produce short-duration IDF estimates (i.e. a relatively short value given for DURATION), your results will be more accurate if they are generated from a longer-duration storm catalog with a large number of storms. For example, 6-hour IDF estimates will be more accurate if they are produced using a 48-hour storm catalog. The reasons for this are complicated and not described here. The DURATIONCORRECTION option has been added to address this issue. This new addition to RainyDay is highly recommended if you are interested in generating IDF curves, particularly if they are of relatively short duration (e.g. 24 hours or less). This option will generate a longer-duration catalog (either 72 hours or 3 times the length specified in DURATION, whichever of the two is longer). Then, during transposition and resampling, RainyDay will identify the heaviest rainfall periods within these longer rainfall periods. This, combined with a large number for NSTORMS (such as the default), can help reduce, though not fully eliminate, underestimation biases for more frequent recurrence intervals. It comes at a computational cost, however: the resampling and transposition step can take significantly longer. This is most recommended if you are conducting pixel-scale or small-area rainfall frequency analyses. This option has no effect on the NetCDF scenarios that RainyDay produces for subsequent hydrologic modeling, since conceptually that idea doesn’t make a lot of sense. However, it won’t cause these scenarios to be “wrong” in any way.
- NSTORMS (optional; positive integer): How many storms to include in the storm catalog (if creating one) or in the SST rainfall frequency analysis and scenario generation (if performing one). If you are creating a new storm catalog and omit this line, a default of 20 storms per year will be used. This corresponds to the variable m described above. Note that the resampling rate variable is not specified directly by the user in RainyDay, but rather is calculated by dividing NSTORMS by the number of years of input precipitation data. If you are not creating a new storm catalog, NSTORMS will be inherited from the existing storm catalog, or can be specified as a number less than or equal to the number of storms in the existing catalog. This number will be reduced accordingly if you are not creating a new catalog and you wish to exclude certain storms (EXCLUDESTORMS), exclude certain months that were not previously excluded when the storm catalog was created (EXCLUDEMONTHS), or only include certain years (INCLUDEYEARS). More frequent return periods (for example, <50 years), particularly for short durations, are highly sensitive to this choice. If you care about common return periods, then NSTORMS should be relatively large. There is no simple guidance on what this means, but I recommend something like at least 10-20 storms per year of available rainfall data (the latter being the default number if you omit this line). Also, see below for further discussion of this option/issue.
- NYEARS (optional; positive integer): How many years of synthetic annual rainfall maxima to generate. This corresponds to the variable Tmax described elsewhere in this document. Note that this is not the number of years of input rainfall data, which RainyDay will calculate automatically. If omitted from the .sst file, this will default to 100, meaning the maximum return period that can be estimated is 100 years.
- NREALIZATIONS (optional; positive integer): How many NYEARS-long sequences to generate. Multiple sequences let you examine some aspects of uncertainty. Technically this is optional (the default value is 1), but the default is not recommended, since a single realization will not return uncertainty bounds of any kind. See additional discussion below.
- UNCERTAINTY (optional; ‘ensemble’ or a positive integer less than or equal to 100): Defines the type of uncertainty you want to calculate for inclusion in the .FreqAnalysis file, IDF figure, and rainfall scenarios. For example, if NREALIZATIONS is 1000, RainyDay’s SST will generate an ensemble of 1000 estimates for each return period, which can be assessed for variability/uncertainty. If you want the full ensemble spread for each exceedance probability, put “ensemble” (the default). If you want only the central X% of the ensemble, put the integer value X; the range between the (100-X)/2 and 100-(100-X)/2 percentiles will be given (for example, if NREALIZATIONS is 1000 and UNCERTAINTY is 95, the range between the 25th and 975th ranked realizations will be given). “ensemble” is recommended. See additional discussion below.
- TIMESEPARATION (optional; positive integer): The minimum separation time in hours between two storms in the storm catalog. During storm catalog creation, when a new rainfall event is being evaluated for inclusion, its timestamp is compared against those already in the catalog. If the difference in timestamps between the new storm and an existing one is less than TIMESEPARATION, the larger event (in terms of rainfall accumulation) is retained and the smaller event is discarded. If this value is omitted from the .sst file, it will default to zero, meaning storms can be consecutive (but will not temporally overlap).
- DOMAINTYPE (required; text string): There are two options: ‘rectangular’ (the default) or ‘irregular’. ‘rectangular’ is the simplest, but may only be appropriate in regions with few topographic or coastal features. It is not recommended to change these values between runs unless a new storm catalog is created. The following fields correspond to the choice of domain type (RainyDay will be unpredictable and probably lead to some sort of error message if these fields are changed from those used to create the storm catalog):
DOMAINSHP (text string): file path and filename for a RainyDay-compliant shapefile defining the boundary of the transposition domain AD. This shapefile must be in the EPSG:4326 geographic projection (i.e. regular latitude/longitude grid) and should ideally contain only a single polygon feature (behavior with additional features is untested). Instructions for “drawing” the shapefile needed to use the DOMAINSHP option, using Google Earth, are given in Appendix C.
DOMAINFILE (text string): More advanced feature not described here.
LATITUDE_MIN (decimal latitude): Southern boundary of transposition domain, in degrees latitude. Only needed if DOMAINTYPE is set to ‘rectangular’.
LATITUDE_MAX (decimal latitude): Northern boundary of transposition domain, in degrees latitude. Only needed if DOMAINTYPE is set to ‘rectangular’.
LONGITUDE_MIN (decimal longitude between -180.0 and 180.0): Western boundary of transposition domain, in degrees longitude. Only needed if DOMAINTYPE is set to ‘rectangular’.
LONGITUDE_MAX (decimal longitude between -180.0 and 180.0): Eastern boundary of transposition domain, in degrees longitude. Only needed if DOMAINTYPE is set to ‘rectangular’.
- DIAGNOSTICPLOTS (optional; ‘true’ or ‘false’): Set to ‘true’ (the default) to produce diagnostic plots. This can take a few minutes, and can typically be set to ‘false’ unless a new catalog is being created. There are five types of plots produced. The two figures shown below are examples of diagnostic plots produced by RainyDay. These plots are described in more detail in the Madison, Wisconsin case study.
Figure: Example diagnostic plots from RainyDay. Top-left: probability of storm occurrence in a storm catalog. Top-right: mean storm total rainfall in the storm catalog. Bottom-left: standard deviation of storm total rainfall in the storm catalog. Bottom-right: CDF of storm total rainfall in the storm catalog, with the largest storm highlighted. See the Madison, Wisconsin case study for more details.
Figure: Example diagnostic plots from RainyDay. Top: storm total rainfall maps for two major storms in a storm catalog. Bottom: rainfall hyetographs for the same two storms. See the Madison, Wisconsin case study for more details.
- FREQANALYSIS (optional; ‘true’ or ‘false’): Create a “.FreqAnalysis” file based on the rainfall annual maxima generated. The default value is true.
- SCENARIOS (optional; ‘true’ or ‘false’): Create watershed-specific spacetime rainfall scenarios corresponding to each annual rainfall maximum. These scenarios can be used to drive a hydrologic model or landslide model. If you aren’t planning on doing any modeling, there is no need to generate them. The code currently groups these scenarios by realization, so that NREALIZATIONS netCDF files are created, each of which contains at most NYEARS scenarios. If RETURNTHRESHOLD is greater than 1 (highly recommended), the number of scenarios will be reduced, perhaps dramatically, depending on the specified return period threshold. The default value is false.
- SPINPERIOD (optional; ‘false’ or positive integer): This only applies if SCENARIOS is set to “true.” It provides the ability to “pre-pend” a rainfall period to the beginning of the output scenarios. Options are either “false” or an integer number of days. So if SPINPERIOD is set to 5, then five days of rainfall will be prepended to each scenario. Currently, the code is written such that the prepended rainfall time period for each scenario is drawn randomly from the rainfall that precedes the storms in the storm catalog from a period of one month earlier to one month later. For example, if SPINPERIOD is set to 5, and if a scenario is being written that is a transposition of a storm that occurred in July, then the preceding rainfall will be randomly selected from the 5-day rainfall periods preceding all storms in the storm catalog that occurred in June, July, and August. This “spin-up period” enabled by SPINPERIOD is potentially very useful, because it allows whatever hydrologic model is being used to achieve a realistic initial moisture condition prior to the onset of the main storm. In the case of floods, it also allows the establishment of realistic baseflow conditions. Caution should be used, however, since for large areas, SPINPERIOD could create very large arrays and there might be memory-related crashes. See additional discussion below.
- RETURNTHRESHOLD (optional; positive integer): Minimum return period for which a spacetime rainfall scenario will be written. This can dramatically reduce the number of (uninteresting) low intensity spacetime rainfall scenarios included in subsequent modeling. The default is to not use a threshold; i.e. 1-year return period up to NYEARS will be calculated.
- EXCLUDESTORMS (optional, highly recommended if using radar rainfall data; comma-delimited list of positive integers): Use this to exclude certain storms that are part of the storm catalog but which you do not wish to include in subsequent frequency analysis or spacetime rainfall scenarios. There are two main reasons for doing this: 1.) it allows you to examine the sensitivity of the method to the inclusion of particular storms, and 2.) it allows you to exclude storms that have unrealistic properties (easily diagnosed via the diagnostic plots). This is an important consideration when using a ground-based radar rainfall input dataset, in which radar artifacts can seriously affect the results. Multiple storms must be comma-separated. You can consult the diagnostic plots to identify which storms to exclude and their storm numbers. If you try to exclude storms while creating a new storm catalog, you will get an error message, since that is almost certainly an accidental mistake.
- EXCLUDEMONTHS (optional; comma-delimited list of positive integers 1-12): Months (1-2 digit numeric) to be excluded from the storm catalog creation and/or subsequent frequency analysis and spacetime rainfall scenario generation. Months must be comma-separated. If you omit or comment out this line, all months will be used.
- INCLUDEYEARS (optional; ‘all’ or comma-delimited list of years YYYY or range of years YYYY-YYYY): Years (4-digit numeric) to be included in the storm catalog creation and/or subsequent frequency analysis and spacetime rainfall scenario generation. Years can be specified either as a range separated by “-” or as a comma-separated list (e.g. 2002-2005 is equivalent to 2002,2003,2004,2005). For the vast majority of analyses, you should use ‘all’, or simply omit/comment out this line.
- RESAMPLING (optional; ‘poisson’, ‘empirical’, or ‘negbinom’ THE LAST OPTION IS UNTESTED): Defines the method used for generating the number of storms to be transposed for each synthetic year. This corresponds to the variable k in descriptions of SST in RainyDay. If ‘poisson’ is selected, the generator will be a Poisson distribution whose rate parameter is estimated as NSTORMS (potentially fewer, depending on usage of EXCLUDESTORMS, EXCLUDEMONTHS, and INCLUDEYEARS) divided by the number of years in the dataset. If ‘empirical’ is selected, the generator is an empirical distribution determined by the counts per year in the storm catalog. If ‘negbinom’ is selected, the number of counts per year will come from a negative binomial distribution estimated via the method of moments. This last option is untested and not recommended for short datasets due to the uncertainty in estimating negative binomial parameters from short records. Note that a minor statistical artifact may arise in regions with extremely rare rainfall, because there is a constraint that every synthetic year must have at least one storm; for most situations this shouldn’t be a problem. The empirical distribution option is useful in cases where the occurrence of extreme rainfall is over- or under-dispersed. To evaluate this formally, you would compute the index of dispersion (the variance of the counts-per-year divided by the mean of the counts-per-year) and determine whether it is significantly different from 1.0.
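The two supported resampling options can be sketched as follows. This is a minimal NumPy illustration, not RainyDay’s actual implementation; the function and variable names are hypothetical.

```python
import numpy as np

def sample_storm_counts(nyears, method, counts_per_year=None, rate=None, seed=0):
    """Draw the number of storms k to transpose in each synthetic year.

    method='poisson':   k ~ Poisson(rate), where rate is estimated as m/n
    method='empirical': k drawn from the observed counts per calendar year
    RainyDay requires at least one storm per year, so k is floored at 1.
    """
    rng = np.random.default_rng(seed)
    if method == "poisson":
        k = rng.poisson(lam=rate, size=nyears)
    elif method == "empirical":
        k = rng.choice(counts_per_year, size=nyears, replace=True)
    else:
        raise ValueError("unsupported method")
    return np.maximum(k, 1)  # enforce at least one storm per synthetic year

# Example: a 10-year record with m=30 catalog storms (rate m/n = 3.0)
k_pois = sample_storm_counts(1000, "poisson", rate=3.0)
k_emp = sample_storm_counts(1000, "empirical",
                            counts_per_year=[2, 4, 3, 1, 5, 3, 2, 4, 3, 3])
```

The empirical branch simply resamples the observed yearly counts, which is what preserves any over- or under-dispersion present in the record.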
- TRANSPOSITION (optional; ‘uniform’ or ‘nonuniform’): Defines how the random spatial transposition should be done. With ‘uniform’, the resampled storms are randomly located according to a 2-D uniform distribution. With ‘nonuniform’, the resampled storms are randomly located according to a two-dimensional Gaussian kernel density estimate based on the locations of the original storms in the storm catalog. An example of the 2-D Gaussian kernel can be seen above under the DIAGNOSTICPLOTS field. In reference to the variables ΔX(x) and ΔY(y): the TRANSPOSITION field determines these automatically; you don’t actually specify those variables.
- ROTATIONANGLE (optional; ‘none’ or ‘MinAngle,MaxAngle,Nangles’, see below for details): RainyDay allows for stochastic rotation of storms. Note, however, that this was coded many years ago; many changes and additions to other parts of RainyDay have since been made, and rotation has not been thoroughly re-tested for compatibility with them. Use this option at your own risk! Omitting the line or specifying ‘none’ will prevent rotation. MinAngle (MaxAngle) is the rotation in degrees clockwise (counterclockwise) from the x-axis and can be an integer or decimal number. MinAngle must be less than or equal to zero and MaxAngle must be greater than zero. Nangles must be a positive integer and specifies the number of discrete “angle bins” between MinAngle and MaxAngle that can be sampled for each storm. For example, if MinAngle=-20 degrees, MaxAngle=20 degrees, and Nangles=5, then each transposed storm would be rotated by one of five randomly selected amounts: -20, -10, 0, +10, or +20 degrees. These bins are used, rather than making the angle of each storm fully stochastic, because the latter approach would require regridding storms many (potentially millions of) times, which would be very slow. Even with this simplification, the speed of the rotation scheme depends greatly on the choice of Nangles, and relatively small numbers (e.g. 5) are recommended. Keeping MinAngle and MaxAngle relatively small (for example, MinAngle=-20 and MaxAngle=20 degrees, or even less!) will keep rotation more physically reasonable. There are a few things to note. First, as stated before, rotation can be slow for large storm catalogs, high-resolution rainfall data, and high values of NYEARS or NREALIZATIONS. Second, it should be used with caution: in the real world, storm orientation is in many cases tightly tied to terrain such as topographic features, or tied to atmospheric phenomena such as high-pressure ridges. This means that rotation can produce storms that would be unrealistic in the real world.
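The angle-bin scheme amounts to sampling from a small discrete set of rotations. A hypothetical sketch (not RainyDay’s code):

```python
import numpy as np

def rotation_bins(min_angle, max_angle, n_angles):
    """Discrete 'angle bins' spanning [min_angle, max_angle] degrees."""
    if not (min_angle <= 0.0 < max_angle):
        raise ValueError("MinAngle must be <= 0 and MaxAngle must be > 0")
    return np.linspace(min_angle, max_angle, n_angles)

# MinAngle=-20, MaxAngle=20, Nangles=5 -> bins of -20, -10, 0, +10, +20 degrees
bins = rotation_bins(-20.0, 20.0, 5)

# Each transposed storm is rotated by one randomly selected bin;
# only Nangles regridded versions of each storm are ever needed.
rng = np.random.default_rng(0)
angle = rng.choice(bins)
```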
- RETURNLEVELS (optional; comma-delimited list of positive integers): Specify certain return periods for which you want to produce output scenarios and frequency analyses. For example: 2,5,10,20,50,100,200,500,1000.
- ENHANCEDSST (optional; ‘false’, ‘stochastic’, ‘deterministic’, or ‘dimensionless’): Advanced option for performing ‘enhanced SST’ to account for precipitation gradients within the transposition domain. While useful, this has not been adequately tested either theoretically or within the code, and requires very specific additional preprocessing and input data. It is not covered further here.
- STOCHASTICRESCALING (optional; ‘true’ or ‘false’): Omit or comment this out. This is an advanced option that is not described here.
- RAINDISTRIBUTIONFILE (optional; text string): Omit or comment this out. This is an advanced option that is not described here.
- POINTAREA (‘point’, ‘grid’, ‘rectangle’, or ‘watershed’***): This defines the area Aw that will be used in the calculation of rainfall accumulations during storm catalog creation and SST analysis. Output scenarios will only cover this area. The ‘point’ and ‘grid’ options work exactly the same way (in reality, both select the single rainfall grid cell containing the specified point, rather than a true point estimate). The ‘point’ option is not recommended when creating a storm catalog but is perfectly fine to use subsequently for frequency analysis or for output scenario generation. The ‘rectangle’ option defines a rectangular box of pixels and is useful for comparing results from rainfall datasets that have different input resolutions, as the procedure will aggregate the rainfall estimates to the size of the box. If you wish to perform this aggregation, it is wisest if the box is at least as large as the input resolution of the coarsest dataset. The following fields are relevant depending on the choice of POINTAREA:
POINTLAT (decimal latitude between -90.0 and 90.0): The latitude of the analysis point, if POINTAREA is set to ‘point’ or ‘grid’. The program will find which pixel contains this point.
POINTLON (decimal longitude between -180.0 and 180.0): The longitude of the analysis point, if POINTAREA is set to ‘point’ or ‘grid’. The program will find which pixel contains this point.
BOX_YMIN (decimal latitude between -90.0 and 90.0): The southernmost boundary of the analysis box, if POINTAREA is set to ‘rectangle’. The program will “snap” this box to the closest rectangular grouping of pixels.
BOX_YMAX (decimal latitude between -90.0 and 90.0): The northernmost boundary of the analysis box, if POINTAREA is set to ‘rectangle’. The program will “snap” this box to the closest rectangular grouping of pixels.
BOX_XMIN (decimal longitude between -180.0 and 180.0): The westernmost boundary of the analysis box in degrees longitude, if POINTAREA is set to ‘rectangle’. The program will “snap” this box to the closest rectangular grouping of pixels.
BOX_XMAX (decimal longitude between -180.0 and 180.0): The easternmost boundary of the analysis box in degrees longitude, if POINTAREA is set to ‘rectangle’. The program will “snap” this box to the closest rectangular grouping of pixels.
WATERSHEDSHP (text string): file path and filename for RainyDay-compliant shapefile defining the boundary of the watershed area Aw. This shapefile must be in the EPSG:4326 geographic projection (i.e. regular latitude/longitude grid) and ideally only contains a single polygon feature (what would happen with additional features is untested).
***NOTE: there is technically another option for POINTAREA called ‘pointlist’, in which the user can input a list of lat/lon coordinates and RainyDay will run frequency analysis for all of those points. However, this option can be quite slow and memory-intensive, and, for the results to be meaningful, it must be used in conjunction with other very advanced options that aren’t explained here.
- SENS_INTENSITY (optional; ‘false’ or decimal percentage): This allows easy sensitivity analysis to rainfall intensity. Options are either ‘false’ (no alteration of intensity) or a numeric percentage change. For example, an input of 5.0 means that the intensity of rainfall will be increased by 5%. NOTE: this hasn’t been tested recently.
- SENS_FREQUENCY (optional; ‘false’ or decimal percentage): This allows easy sensitivity analysis to rainfall frequency. Options are either “false” (no alteration of storm rate of occurrence) or a numeric percentage change. For example, an input of -15.0 means that the rate of storm occurrences will be reduced by 15%. NOTE: this hasn’t been tested recently.
- INTENSDISTR (optional): Not described here. NOTE: this hasn’t been tested recently.
- CALCTYPE (optional; ‘ams’ or equivalently ‘annmax’, ‘pds’ or equivalently ‘partialduration’): Formerly, RainyDay only allowed annual maxima-type analyses, which remain the default. Now you can do partial duration series (PDS) too! There are some restrictions on PDS, though: the analysis will only consider the n largest rainfall accumulations, where n is the number of years in the analysis. Furthermore, you can only use PDS for rainfall frequency analysis, not to generate rainfall scenarios for, e.g., subsequent flood frequency analysis using rainfall-runoff models. There are conceptual reasons for this. The proper route, conceptually, if you want to do a PDS-like analysis of flood frequency using RainyDay-based rainfall scenarios, is to use the NPERYEAR option below.
- NPERYEAR (optional; ‘false’ or a positive integer): Formerly, RainyDay only allowed the user to create rainfall scenarios that consisted of annual maxima (i.e. one transposed storm per synthetic year, constituting the largest rainfall accumulation over Aw). If the NPERYEAR option is used, multiple rainstorms per synthetic year are written to the output scenarios.
- MAXTRANSPO (optional; ‘true’ or ‘false’): If ‘true’, rather than performing SST, RainyDay will find the specific transposition that maximizes precipitation over Aw and write a scenario file. Note: if you only care about the maximum precipitation value (and not an output spacetime scenario) you can find that total in the diagnostic plot for the largest storm in your storm catalog.
RainyDay Features ¶
Finding the “Maximum Storm” ¶
Vast parts of the RainyDay code are devoted to generating probabilistic estimates of rainfall frequency and associated rainfall scenarios for flood frequency modeling. Sometimes, however, something much simpler might be wanted: what is the storm that has occurred within the precipitation input dataset over AD that would have produced the largest rainfall amount over Aw, had it been a “direct hit” on Aw? If you just want the rainfall depth over the duration specified via the DURATION field, this is easy: create diagnostic plots, and then look at the CDF diagnostic plot (see the figure below) and/or the storm total rainfall map and hyetograph for the largest storm. If, however, you want the rainfall fields associated with this storm, you’ll need to use the MAXTRANSPO field. This will generate an additional output file with the suffix ‘_maximizedstorm.nc’.
Figure: Example CDF of storm total rainfall from a storm catalog. The largest storm total rainfall is shown.
SST Internal Variability ¶
In RainyDay, the user specifies NREALIZATIONS, the number of $T_{max}$-year long “ensemble members” to be generated, as well as UNCERTAINTY, which specifies the interquantile range to be calculated. This enables examination of “internal variability,” i.e. how much variation in rainfall intensity is possible for a given AEP for a given input rainfall dataset and set of user-defined parameters. For example, if the user specifies ‘NYEARS 1000’ and ‘NREALIZATIONS 100’, then there will be 100 intensity estimates for each AEP between 1.0 and 0.001 (ARI between 1 and 1,000 years). RainyDay will automatically generate text and graphics files containing the results of this rainfall frequency analysis, including the rainfall mean, minimum, and maximum (or, optionally, a quantile interval) for each AEP, computed from the NREALIZATIONS ensemble members.
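The ensemble summary statistics described above reduce to simple operations across the realization axis. A minimal sketch with NumPy (the array and quantile levels are illustrative assumptions):

```python
import numpy as np

# Hypothetical ensemble of rainfall intensity estimates (mm):
# rows = realizations (NREALIZATIONS), columns = one return level per AEP
ensemble = np.array([
    [90.0, 120.0, 150.0],
    [100.0, 130.0, 170.0],
    [110.0, 140.0, 190.0],
    [95.0, 125.0, 160.0],
])

mean_est = ensemble.mean(axis=0)  # ensemble mean per AEP
min_est = ensemble.min(axis=0)    # lower bound of the spread
max_est = ensemble.max(axis=0)    # upper bound of the spread
# Optional interquantile interval (UNCERTAINTY-style bounds), e.g. central 90%:
lo, hi = np.quantile(ensemble, [0.05, 0.95], axis=0)
```

Each column's spread across realizations is the "internal variability" for that AEP.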
If the scenarios generated by RainyDay are fed through a rainfall-runoff model, then the ensemble spread will propagate through to generate ensemble flood estimates. A useful and interesting feature of SST and RainyDay, discussed at length in Wright et al. (2014), is that the exceedance probability of rainfall and of subsequent hazards can be decoupled using SST, particularly if some realistic scheme is used to account for the initial conditions in Aw (such as soil moisture or baseflow). Consider the example where N=1 and 1,000 rainfall scenarios (Tmax=1,000 years) are created as input to a rainfall-runoff model. One of these rainfall scenarios has AEP=0.01 (in terms of watershed-average t-hour rainfall depth over an area Aw). Even if initial conditions are kept constant across all Tmax simulations, the AEP of the peak discharge or volume predicted by the model for this particular scenario need not be equal to 0.01, since the space-time structure of the rainfall scenario and its interactions with watershed and river network features (at least if a distributed rainfall-runoff model is used) can dampen or magnify the flood severity. If variability in initial conditions within the rainfall-runoff model is considered, this dampening or magnification can be even greater. This property of SST contrasts with design storm methods, which typically assume a 1:1 relationship between the AEP of rainfall and the resulting hazard, though variability in initial conditions could in principle be used with design storm approaches to produce some degree of “decoupling” of rainfall and flood AEP. RainyDay provides one simple scheme (invoked using the SPINPERIOD field; see here) for creating variability in initial conditions, though better methods have been used (e.g. Yu et al. 2019, 2020, 2021) that require additional work.
Like virtually all frequency analysis methods, the ensemble spread generated by RainyDay does not consider measurement error, which can be substantial. And since the ensemble spread is specific to a given set of user-defined values such as AD or m, it does not reflect uncertainty associated with those choices. Analyses in Wright et al. (2017), and also in the Madison, WI case study in this guide, show how such uncertainties (though not measurement error) can be assessed; fundamentally, however, this requires manipulating the size or composition of the storm catalog by varying the user-defined values, necessitating multiple distinct runs of RainyDay.
If the user is only interested in examining internal variability of SST-based rainfall frequency, then the number of ensemble members can be large (e.g. N = 100). If the user wishes to perform rainfall-runoff simulations, however, N should be selected with consideration of the computational and storage costs associated with large numbers of simulations, which can be substantial for distributed rainfall-runoff models. To help manage the number of simulations required, the user can specify a rainfall return period threshold, below which output scenarios will not be created. For example, if the user specifies a 5-year threshold, no rainfall scenarios with a rainfall depth less than the 5-year return period depth will be written, which reduces the number of simulations by roughly 80% for a given value of N while still retaining the most extreme scenarios.
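The arithmetic behind the roughly 80% reduction can be illustrated with empirical plotting positions. This is a sketch under stated assumptions (the Weibull plotting position and all names are illustrative, not RainyDay internals):

```python
import numpy as np

def keep_above_threshold(annual_maxima, t_threshold):
    """Keep annual maxima whose empirical return period (Weibull plotting
    position, T = (n + 1) / rank) meets or exceeds t_threshold years."""
    x = np.sort(np.asarray(annual_maxima, dtype=float))[::-1]  # descending
    ranks = np.arange(1, len(x) + 1)                           # rank 1 = largest
    return x[(len(x) + 1) / ranks >= t_threshold]

# 100 synthetic annual maxima; a 5-year threshold keeps ~1/5 of them,
# i.e. an ~80% reduction in the number of scenarios to simulate.
depths = np.arange(1.0, 101.0)
kept = keep_above_threshold(depths, 5)
```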
Rainfall Heterogeneity and Non-Uniform Spatial Transposition ¶
A common criticism of SST is that its validity is restricted to regions with homogeneous extreme rainfall properties. As previously mentioned, depending on how rigidly this criterion is enforced, the method would be limited to small, flat regions far from topographic features, water bodies, etc. It is also unclear how homogeneity would be determined, particularly given the paucity of rainfall data in most regions. Instead, steps can be taken to use SST in more varied geophysical settings. Regardless of the setting, the selection of AD requires an understanding of regional rainfall patterns and of the intrinsic assumptions of SST. Though more work is needed to understand the geographic limits of the applicability of RainyDay in complex terrain, the works of England et al. (2014), Zhou et al. (2019), and Yu et al. (2021) provide examples of SST usage in complex terrain.
RainyDay provides several tools to help understand the issue of rainfall heterogeneity, and, to some extent, to mitigate it. First, RainyDay produces a map showing the location of the rainfall centroids for all storms in the storm catalog, overlaid on a smoothed field of the spatial probability of storm occurrence within AD. This spatial probability of occurrence map is generated by applying a two-dimensional Gaussian kernel smoother to the (x,y) locations of the rainfall centroids for all the storms in the storm catalog. This smoothed field is then normalized such that the sum of all grid cells in AD equals 1.0, thus creating a two-dimensional probability density function (PDF) of storm occurrence. A second plot shows these rainfall centroids overlaid with the average rainfall per storm across AD. These diagnostic plots assist in understanding regional variations in storm occurrences and rainfall over AD. Examples of these diagnostic plots can be found in both case studies in this User’s Guide. In those examples, the top panel suggests that storms are somewhat more frequent in the southernmost third or so of the transposition domain, along with slightly elevated activity in the northeast quadrant, while the bottom panel shows somewhat higher average storm rainfall in these two areas. Caution should be taken when drawing firm conclusions from these diagnostic plots, however, since rainfall heterogeneities evident in both storm occurrences and average storm rainfall may be the result of spatial biases in rainfall remote sensing estimates or of randomness in the climate system over the relatively short remote sensing record, rather than from “true” heterogeneity in the underlying rainfall hydroclimate.
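The smoothing-and-normalization step described above can be sketched with plain NumPy. This is a minimal stand-in for the Gaussian kernel smoother (the bandwidth, grid, and centroid locations are arbitrary assumptions, not RainyDay defaults):

```python
import numpy as np

def occurrence_pdf(centroids_x, centroids_y, xgrid, ygrid, bandwidth=1.0):
    """Smooth storm-centroid locations with a 2-D Gaussian kernel, then
    normalize so the field sums to 1.0 over the domain grid (a 2-D PDF)."""
    xx, yy = np.meshgrid(xgrid, ygrid)
    field = np.zeros_like(xx, dtype=float)
    for cx, cy in zip(centroids_x, centroids_y):
        field += np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2.0 * bandwidth**2))
    return field / field.sum()

x = np.linspace(-95.0, -88.0, 40)  # illustrative domain longitudes
y = np.linspace(41.0, 46.0, 30)    # illustrative domain latitudes
pdf = occurrence_pdf([-93.0, -90.5, -91.0], [42.0, 44.5, 43.0], x, y)
```

Because the field sums to 1.0, it can be sampled directly as the storm-occurrence PDF used by the non-uniform transposition option.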
Additional optional diagnostic outputs include rainfall maps and hyetographs for each storm in the storm catalog. These storm rainfall maps are useful for diagnosing “bad data,” particularly in rainfall datasets that use ground-based weather radar contaminated by radar beam blockage and other artifacts. Anomalous storm periods must be identified by the user (i.e. no automatic data quality checking is provided), but such periods can be excluded from subsequent analyses. Examples of this process are provided in the detailed Madison, WI Case Study in this tutorial.
The two-dimensional PDF of storm occurrence can optionally be used as the basis for non-uniform spatial transposition, so that the spatial distribution of storm occurrences is preserved between the input data and the output rainfall scenarios and rainfall frequency estimates.
It is important to note that this approach only addresses the spatial heterogeneity of storm occurrences, not of spatial variations in the climatology of rainfall intensity (due to topography or other factors). For example, if AD contains a flat plain and an adjacent mountain range, the probability of storm occurrence will vary across AD. This variation will be captured in the two-dimensional PDF of spatial storm probability and, using the optional non-uniform spatial transposition scheme, will be reflected in RainyDay outputs. In this example, however, rainfall intensity from these storms will also vary according to the underlying topography. The current transposition scheme in RainyDay cannot explicitly account for this variation, which is likely to be a serious constraint in some regions.
Temporal Resampling Options ¶
As mentioned in Step 3 of the SST procedure, previous SST work has employed the assumption that the annual number of storm counts follows a binomial or Poisson distribution with arrival rate parameter equal to m/n (where, again, m is the number of storms in the storm catalog which spans a range of n years). This rate parameter in turn serves as the basis for the temporal resampling of storms (i.e. for generating the number of storms per year k that will be spatially transposed). RainyDay supports Poisson-based resampling (though not binomial), but also allows the use of an empirical distribution to determine k. The empirical distribution is derived from the number of storms that enter into the storm catalog from each calendar year in the rainfall input dataset. Then, during the temporal resampling step, k is obtained by randomly selecting one of these values. This feature may be useful in regions where storm occurrences exhibit strong clustering (i.e. where there is strong evidence for more storms in some years and fewer in other years for persistent climatological reasons; e.g., Villarini et al., 2013).
Currently, RainyDay requires k to be at least 1; i.e. there will be at least one storm selected and transposed. So, if the Poisson or empirical distribution yields 0 storm arrivals for a year, RainyDay will set it to 1. This is fine in situations where m is somewhat larger than n, but could be problematic if the two numbers are similar or if m < n. For example, using the Poisson distribution with m/n = 5.0, there is only a 0.007 probability of k = 0. But if m/n = 1.5, this probability rises to 0.22. Cases of small m/n could unrealistically “color” your frequency analysis results since RainyDay would be producing too many storms on average. Correcting this issue is on the author’s “RainyDay to-do list.”
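The probabilities quoted above follow directly from the Poisson distribution, where P(k = 0) = e^(-λ); flooring k at 1 also inflates the mean annual storm count from λ to λ + e^(-λ). The helper names below are illustrative:

```python
import math

def prob_zero(rate):
    """Poisson probability of zero storm arrivals in a year."""
    return math.exp(-rate)

def truncated_mean(rate):
    """Mean of max(K, 1) when K ~ Poisson(rate): E[K] + P(K = 0)."""
    return rate + math.exp(-rate)

p5 = prob_zero(5.0)        # ~0.007: negligible when m is well above n
p15 = prob_zero(1.5)       # ~0.22: the floor at k=1 now matters
m15 = truncated_mean(1.5)  # ~1.72 storms/year on average instead of 1.5
```

The last line quantifies the "too many storms on average" bias for small m/n.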
Special considerations for SST-based Flood Frequency Analysis ¶
The ‘NPERYEAR’ Option ¶
Formerly, RainyDay only allowed the user to create rainfall scenarios that consisted of annual maxima (i.e. one transposed storm per synthetic year, constituting the largest rainfall accumulation over Aw). When using RainyDay scenarios as inputs to flood frequency analysis, this leads to two limitations:
- Limitation 1: It is generally understood that the largest rainfall accumulation per year may not necessarily produce the largest flood. Rather, the flood magnitude depends on various factors including antecedent soil moisture, snowpack (if any), the spatiotemporal structure of the rainfall (including storm movement), and where rain falls vis-a-vis land use/land cover, soils, and the river network. To account for this, more than a single transposed storm per synthetic year–but not a large number–can be outputted to the rainfall scenarios from RainyDay. No hard and fast rules explain what NPERYEAR might be needed for this, but Yu et al. (2019) showed that 5 storms per year was sufficient to eliminate this limitation in a 4,400 km2 watershed.
- Limitation 2: If rainfall-runoff modeling uses only annual-maxima RainyDay rainfall scenarios (selected with respect to rainfall accumulation over Aw), flood quantile estimates at upstream points (i.e. scales much smaller than Aw) will not adequately reflect the rainfall recurrence properties at those scales. In practice, this means that flood quantiles at those smaller scales will tend to be underestimated. While no paper provides a “complete” explanation of this issue, Wright et al. (2014) and Wright et al. (2020) provide most of the logic. There are no hard and fast rules about what NPERYEAR might be needed to cope with this, but unpublished work led by R. Mantilla found that 10 storms per year was sufficient to completely eliminate this limitation in a 4,400 km2 watershed. NOTE: this limitation only pertains if semi- or fully-distributed rainfall-runoff models are being used and the user hopes to obtain flood frequency results at multiple points within the watershed.
“Spin-up” of Initial Conditions ¶
A key issue in the modeling of rainfall driven hazards is to adequately represent initial conditions. In many flood and landslide modeling efforts, the most critical of these is antecedent soil moisture, while other states such as snowpack, river baseflow, and water table position may also be relevant. Many hydrologic models allow for the specification of such initial conditions, and thus many design storm-based hazard modeling efforts rely on an assumed soil moisture state, such as an average or fully saturated condition. Such approaches have previously been used with SST (Wright et al., 2014), and could be combined with the rainfall scenarios generated via RainyDay. This approach has the downside, however, that the true variability of antecedent soil moisture is not captured in hazard predictions. This is particularly important in regions in which heavy rainfall does not necessarily occur in the same season as high soil moisture conditions. A second approach that can capture this variability would be to derive a distribution of antecedent soil moisture from previous long-term (ideally continuous multi-decadal) model simulations. This approach is explored in Yu et al. (2019, 2020, 2021). Since there can be substantial variation in how soil moisture is represented in different models, the same model should be used for these long-term simulations and for the hazard scenario modeling.
Continuous simulation methods such as those in Yu et al. (2019, 2020, 2021) are probably the best options available for selecting initial conditions for RainyDay-based flood frequency analyses. RainyDay offers an alternative option, however, in which initial soil moisture can be “spun up” within the model to represent (hopefully) seasonally realistic initial conditions without the need for long-term simulations. The spin-up procedure is described below for a single rainfall scenario.
The month of occurrence of the rainfall scenario is identified based on the “parent storm” that created it. Then RainyDay identifies the set of X-day periods (where X is a user-defined spin-up period) preceding all parent storms that occur within a user-defined number of months from the date of occurrence of the parent storm. One of these X-day periods is randomly selected and pre-pended to the rainfall scenario. This scheme helps to ensure that spin-up conditions are reasonable for the given season. It also helps ensure that spin-up conditions have realistic temporal correlations when pre-pended to the rainfall scenario (for example, if there is a historical tendency for several days of moderate rain prior to heavy storms, but no historical precedent for several days of heavy rain before the main storm, these tendencies will be properly represented). It is important to note, however, that the 10- to 20-year records typical of rainfall remote sensing may not capture the full variability of “true” initial conditions.
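The seasonal pooling step can be sketched as follows, assuming a one-month window on each side of the parent storm's month. The helper names and the catalog structure are hypothetical; RainyDay's actual implementation may differ:

```python
import random

def months_near(month, window=1):
    """Calendar months within +/- `window` of `month`, wrapping the year
    (e.g. month 12 with window 1 gives {11, 12, 1})."""
    return {(month - 1 + d) % 12 + 1 for d in range(-window, window + 1)}

def pick_spinup(parent_month, catalog, rng=random.Random(0)):
    """Randomly select one preceding X-day rainfall period from catalog
    storms that occurred within one month of the parent storm's month."""
    candidates = [s["pre_storm_rain"] for s in catalog
                  if s["month"] in months_near(parent_month)]
    return rng.choice(candidates)

catalog = [
    {"month": 6, "pre_storm_rain": "June window"},
    {"month": 7, "pre_storm_rain": "July window"},
    {"month": 12, "pre_storm_rain": "December window"},
]
spin = pick_spinup(7, catalog)  # drawn only from the June/July/August storms
```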
This pre-pending procedure creates rainfall scenario output files that are of duration X+t. The modeler can then assign an average initial soil moisture condition to initialize each model run, and use the rainfall scenario as input. Soil moisture within the model will then evolve over the spin-up period based on the rainfall (or lack thereof), evapotranspiration, and other modeled processes. This spin-up procedure has storage and computational costs since it can substantially increase the size of the rainfall scenario files generated by RainyDay and increase the length of each hazard simulation. The importance of these limitations depends on the size of Aw, the resolution of the input rainfall dataset, and the computational burden of the hazard model. The modeler needs to evaluate the tradeoffs between longer X and the associated storage and computational costs.
Selecting Rainfall Duration for Flood Frequency Analysis¶
If you only intend to estimate rainfall IDF, choosing DURATION is straightforward (and, as explained above, the DURATIONCORRECTION option is highly recommended). If you plan to use RainyDay outputs for rainfall-runoff modeling to obtain flood frequency estimates, choosing DURATION is more complicated. Short durations (together with small NSTORMS) can cause a number of issues that are not all explained here but that can lead to serious biases in flood quantile estimates, particularly for smaller return periods (e.g. 2-year, 10-year, 50-year). It is not recommended to use conventional notions of response time or time of concentration as the sole criteria for selecting an appropriate duration. Instead, the chosen duration should be longer than the time of concentration (oftentimes much longer!). Basically, DURATION must be “long enough” for a flood event to occur, and should ideally be long enough that all, or at least the most important parts, of major rainstorms are “contained” within the rainfall scenarios outputted from RainyDay. There is an additional, more technical reason that also argues for relatively long durations. I recommend choosing durations of at least 48 hours, and preferably longer, even for very small watersheds.
Not surprisingly, the chosen DURATION should be longer for larger watersheds. However, it should not be extremely long (e.g. 30 days), for several conceptual and computational reasons. While one could in principle create a storm catalog comprised of the most rainy multi-week periods and conduct SST using these, the number of available “events” would be small. If month-long storm periods were used to create a storm catalog, for example, the number of “events” would be at most 12 times the length of the rainfall record in years. Few of these storm catalog entries would be particularly important from a flood-generation standpoint. Second, transposition of a month-long rainy period may not yield sufficient realizations of rainfall spatiotemporal variability, since the individual storm elements within this period would be fixed in their relative locations and timing. The “reshuffling” of individual storm elements in space and time to create new realizations may be possible but poses conceptual and practical challenges and is well beyond the current capabilities of RainyDay or other SST frameworks.
Parametric Rainfall Intensity ¶
Instead of relying on the rainfall intensity derived from a remote sensing input dataset, a user might prefer to use a parametric distribution to impose rainfall depths on the rainfall output scenarios. RainyDay supports this option. The user can supply a t-hour rainfall depth distribution. This distribution is then applied to the output rainfall scenarios via a normalization procedure that assumes that the supplied distribution corresponds to the annual maximum t-hour rainfall intensity for a single rainfall grid cell. Rainfall space-time structure is still derived from the remote sensing data. It should be noted, however, that when the resolution of the input remote sensing dataset is coarse relative to the spatial coverage of the rainfall measurement device upon which the parametric distribution is based (for example, the 16-500 km2 footprint of many satellite rainfall datasets relative to the 0.1 m2 sampling area of a single rain gage), this approach may be problematic. This procedure is also problematic in regions where such parametric rainfall distributions might be the synthesis of “mixture distributions” of distinct storm types in which rainfall intensity is intrinsically linked to rainfall space-time structure (e.g. Smith et al., 2011), since RainyDay does not distinguish between different storm types. Currently only the three-parameter generalized extreme value distribution is supported, though it would be straightforward to add additional options. NOTE: the code for this option was written a long time ago and has not been tested recently. It might be broken.
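The normalization idea can be sketched using scipy’s GEV implementation. This is an illustrative sketch only: the function name and array layout are assumptions, and, as noted above, the corresponding RainyDay code path may currently be broken:

```python
import numpy as np
from scipy.stats import genextreme

def rescale_scenario(scenario, gev_shape, gev_loc, gev_scale, rng=None):
    """Rescale a rainfall scenario so that its largest single-cell storm
    total equals a depth drawn from a GEV annual-maximum distribution.

    scenario: array of shape (time, y, x) holding rainfall depths.
    Note that scipy's shape parameter 'c' is the negative of the xi
    convention used in much of the hydrology literature.
    """
    rng = np.random.default_rng() if rng is None else rng
    target_depth = genextreme.rvs(gev_shape, loc=gev_loc, scale=gev_scale,
                                  random_state=rng)
    cell_totals = scenario.sum(axis=0)      # storm total in each grid cell
    # a uniform multiplicative rescaling preserves the space-time structure
    return scenario * (target_depth / cell_totals.max())
```

Because the rescaling factor is a single scalar, the relative spatial pattern and hyetograph shape from the remote sensing data are untouched; only the overall depth is imposed by the parametric distribution.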
Case Study 1: Single Gridcell Rainfall Frequency Analysis in Madison, Wisconsin ¶
In this case study, we will generate 6-hour and 24-hour rainfall frequency curves at the scale of an individual precipitation grid cell for Madison, WI. We will illustrate a relatively simple case of transposition domain selection, how to create and “fix” storm catalogs, and evaluate sensitivity to a number of RainyDay options. Please note that frequency estimates from a single grid cell are the closest that one can get using RainyDay to so-called “point-scale” estimates, i.e. based on rain gage measurements. However, the sampling area of a single precipitation grid cell can range from 1 km2 up to 1,000 km2 or more, depending on the dataset, while a rain gage has an orifice that samples an area of roughly 10^-7 km2 (i.e. the size of a dinner plate). Rainfall properties can vary dramatically between these two scales, and thus it is not reasonable to expect RainyDay to produce precise point-scale estimates. Nonetheless, in this case study we will compare RainyDay results to point-scale results from NOAA’s Atlas 14/Precipitation Frequency Data Server.
Southern Wisconsin, like the rest of the midwestern United States, has been a hotspot for increases in extreme rainfall related to climate change. This has manifested as a series of devastating storms throughout the state, with notable examples in 2008, 2010, 2016, 2018, and 2019. It is thus worth noting that NOAA Atlas 14 frequency estimates for Wisconsin use data through only about 2010, meaning that many of these more recent storms are “missing” from those Atlas 14 estimates.
The user may be interested to know that RainyDay was used to create rainfall intensity-duration-frequency estimates for the entire state of Wisconsin as part of the Wisconsin Rainfall Project. That project used a number of advanced features, most of which are embedded within the RainyDay code but are not discussed in this tutorial document due to their complexity and the lack of user-friendly preprocessing tools.
An ‘.sst’ file, a storm catalog, and a small subset of the results presented below are available here. A brief description and download link for the Stage IV precipitation dataset used in this study can be found here.
Part 1: Setting Up RainyDay ¶
Location Description ¶
Madison is located in southcentral Wisconsin and is home to the University of Wisconsin-Madison, the best public university in the United States. The terrain in most of the surrounding region is relatively flat, with some greater topographic relief west of the city that is not generally believed to play a major role in influencing precipitation. Major storms tend to arrive from the west or southwest, though the major storms in and around Madison in late August 2018 moved from south to north. The most prominent geographic feature is Lake Michigan roughly 140 km east of Madison. The Google Earth screenshot below shows the region and the transposition domain that will be used:
Figure: Google Earth screenshot of the domain surrounding Madison, Wisconsin. The shaded blue map shows the 2-year 24-hour rainfall from Volume 8 of NOAA’s Atlas 14. The red rectangle is the primary transposition domain that will be used in this case study. The Atlas 14 rainfall ends at the Wisconsin-Illinois border since Illinois is part of a separate Atlas 14 volume.
Rainfall Data Selection ¶
This case study will make use of the Stage IV precipitation dataset. While there could be compelling reasons to use the other datasets described in Appendix B, several factors make Stage IV useful both for demonstration purposes and for actual SST analyses in the region:
- Record Length: Stage IV runs from 2002-present (this case study will use data through 2021). Estimation of extreme rainfall quantiles generally benefits from very long observational records. However, studies such as Wright et al. (2013) and Wright et al. (2017) have shown that 10-20 years can produce satisfactory estimates using SST. In one respect, a relatively short record can be considered a positive: since rainfall extremes have changed in recent decades, it is not clear that Atlas 14’s reliance on long records (e.g. from 1950 or earlier) yields results consistent with the current rainfall hydroclimate. In contrast, our analysis will (by necessity) focus on the most recent 20 years that are available in Stage IV. These 20 years are presumably more representative of the current hydroclimate.
- Resolution: As explained in Appendix B, Stage IV is the highest-resolution precipitation dataset currently available that has been deemed suitable for usage in RainyDay. Unlike nClimGrid-Daily–which has a daily temporal resolution–Stage IV can be used to generate subdaily frequency estimates. Unlike NLDAS-2, which has roughly 8 km spatial resolution (or roughly 64 km2), Stage IV’s roughly 4 km resolution (or roughly 16 km2) can get us closer to the point-scale rainfall frequency statistics that are desirable for many infrastructure applications. (Though to be clear, RainyDay was not designed to produce point-scale statistics.)
- Accuracy: In most circumstances, Stage IV is quite accurate, thanks in large part to careful bias correction by the National Weather Service River Forecast Centers. There are some caveats however. One of these–underestimation of short duration (e.g. 1-hour) rainfall rates–is discussed at the end of this case study. The other caveat is that in this region, like many others, the radar data that forms the backbone of Stage IV exhibits some interesting “artifacts” due to things like beam blockage, bright band contamination, and ground clutter. While annoying, this doesn’t pose a big challenge to SST as long as the user understands how to identify and remove these things. This tutorial shows how to do this.
Madison .sst File ¶
The initial .sst file used in this case study (i.e. for storm catalog creation) is given here. Key choices within this file are explained below. Note that some of the fields described earlier in this guide are omitted from this .sst file since they are not relevant to this analysis.
#============================================================
# WELCOME TO THE RAINY DAY MODEL!
# USER DEFINED INFORMATION FOR STOCHASTIC STORM TRANSPOSITION
# MADISON WI CASE EXAMPLE CASE STUDY
#============================================================
MAINPATH /Users/daniel/Google_Drive/RainyDay2/RainyDayTutorials/Examples/Madison
SCENARIONAME RainyDay_MadisonExample
RAINPATH /Users/daniel/Documents/DATA/StageIV/NEWREPROJ/ST4.*.newregridded.nc
CATALOGNAME StageIV_Madison_ExampleCatalog.nc
CREATECATALOG true
DURATION 24
DURATIONCORRECTION true
NSTORMS 400
NYEARS 500
CALCTYPE PDS
NREALIZATIONS 100
UNCERTAINTY ensemble
TIMESEPARATION 12
DOMAINTYPE rectangular
LATITUDE_MIN 42.5
LATITUDE_MAX 44.0
LONGITUDE_MIN -90.5
LONGITUDE_MAX -88.0
DIAGNOSTICPLOTS true
FREQANALYSIS false
EXCLUDESTORMS none
EXCLUDEMONTHS 1,2,3,12
INCLUDEYEARS all
TRANSPOSITION uniform
RESAMPLING poisson
RETURNLEVELS 2,5,10,25,50,100,200,500
RETURNTHRESHOLD 1
POINTAREA rectangle
POINTLAT 43.075
POINTLON -89.385
BOX_YMIN 43.0
BOX_YMAX 43.25
BOX_XMIN -89.5
BOX_XMAX -89.25
Transposition Domain Selection ¶
Defining the transposition domain Aw is among the most important and, unfortunately, most subjective aspects of SST. The next case study will explore this issue further. Here, we will keep things simple. First, we will use a relatively small transposition domain that runs from -90.5 to -88.0 degrees longitude (i.e. 90.5 to 88.0 degrees west) and from 42.5 to 44.0 degrees north latitude. This corresponds to an area of roughly 33,700 km2. While not a primary consideration in operational SST analyses, this does keep the runtime for storm catalog creation quite short (less than 10 minutes on a modern laptop).
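The quoted area can be checked with quick back-of-the-envelope arithmetic, using the common approximations that one degree of latitude spans about 111 km and one degree of longitude spans about 111 km times the cosine of the latitude:

```python
import math

# domain bounds from the .sst file
lat_min, lat_max = 42.5, 44.0
lon_min, lon_max = -90.5, -88.0

mean_lat = 0.5 * (lat_min + lat_max)                       # 43.25 deg N
km_per_deg_lat = 111.0                                     # approximate
km_per_deg_lon = 111.0 * math.cos(math.radians(mean_lat))

area_km2 = ((lat_max - lat_min) * km_per_deg_lat *
            (lon_max - lon_min) * km_per_deg_lon)          # ~33,700 km2
```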
In the long run, we hope to have a suite of tools to help identify and create domains over which rainfall extremes are homogeneous and/or rainfall extremes can be “rescaled” in what is sometimes referred to as “enhanced SST” (see Nathan et al. 2016, Zhou et al. 2019, and Wright and Holman 2019 for recent work in enhanced SST and related concepts). Unfortunately, we aren’t there yet. For now, we can hopefully content ourselves with the lack of substantial topographic relief and the relatively modest rainfall gradients (for example, in the 2-year 24-hour rainfall from Atlas 14 shown above). We’ll reconnect with the idea of transposition domain selection later in this tutorial, ultimately showing that results in this region are fairly insensitive to choice of domain.
Selection of Other Options ¶
Here, we step through the key fields in the .sst file which, in addition to selecting a suitable input dataset and transposition domain, are important elements of a successful RainyDay analysis. Refer back to earlier in this guide for detailed descriptions of these fields.
RAINPATH: If you downloaded the Stage IV data from the link provided in Section 7, you’ll find that the filenames look like this: ‘{some path}/ST4.20020704.newregridded.nc’ (note that in Windows, I think the path will use ‘\’ rather than ‘/’). The string ‘20020704’ indicates that this is the file for 4 July 2002. As explained in Section 5, RAINPATH needs to be able to “find” this string using wildcards. Some users will know exactly what this means. If not, use this:
- RAINPATH {some path}/ST4.*.newregridded.nc
Basically that creates a list of files within the directory {some path} that have a filename starting with ‘ST4.’, ending in ‘.newregridded.nc’, and containing some string in between; in this case a string indicating the date.
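In Python terms, this wildcard expansion behaves like the standard glob module. The sketch below (with a hypothetical helper name and an illustrative path) shows the idea:

```python
import glob

def list_rain_files(rainpath_pattern):
    """Return the sorted list of daily files matching a RAINPATH-style
    wildcard pattern, one file per date string matched by '*'."""
    return sorted(glob.glob(rainpath_pattern))

# e.g. list_rain_files("/some/path/ST4.*.newregridded.nc") would return
# ['/some/path/ST4.20020704.newregridded.nc', ...] in date order
```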
CREATECATALOG: To get started, we need to create a storm catalog and so we’ll set this to ‘true’. We can then switch to ‘false’ once the storm catalog has been created and we’re ready to move on to refinement and frequency analysis. However, if you wished to modify the transposition domain, increase the number of storms in the catalog, or conduct analysis for a longer duration, you’d need to generate a new storm catalog.
DURATION: Since the main objective of this analysis is to produce 6-hour and 24-hour DDF estimates, we’ll choose 24 hours here. There are two pertinent comments here:
- As explained earlier, once a storm catalog is created, it can be reused in subsequent analyses as long as the transposition domain is unchanged and the DURATION used in those later analyses is less than or equal to the duration of the storms in the storm catalog. In other words, if we use ‘DURATION 24’ to create our storm catalog, we can then reuse that catalog to create both the 6-hour and 24-hour frequency analyses.
- Also as explained earlier, DURATIONCORRECTION can be a useful option, especially for analyses with relatively short durations such as 24 hours or less. We will use it here, and demonstrate later what happens if you don’t use it (and will also finally explain what underlying problem DURATIONCORRECTION is intended to fix). The upshot of this is that the storms in the storm catalog will actually be 72 hours in duration, while subsequent frequency analyses will be of whatever duration is specified.
DURATIONCORRECTION: See discussion related to DURATION. We’ll use this option by setting it to ‘true’, but turn it off (i.e. set it to ‘false’) later to see what happens.
NSTORMS: Let’s create a storm catalog with 400 storms. Since we’ll be using Stage IV from 2002-2021, that is 400 storms over 20 years, or an average of 20 storms per year. Note that if you omit NSTORMS from the .sst file, the default is also 20 per year, so you could in this case omit this field without changing the results.
NYEARS: Let’s use 500. This means that we can evaluate rainfall recurrence intervals from 1 up to 500 years.
NREALIZATIONS: Let’s use 100. This means that for each recurrence interval, 100 replications will be created so that we can look at uncertainty/variability. In other words, there will be 100 2-year storms, 100 10-year storms, 100 100-year storms, etc.
UNCERTAINTY: We’ll use ‘ensemble’ here, which is the default value if you were to omit this field from the .sst file. Results will display the range between the smallest and largest of these 100 replications for any given recurrence interval. The mean will also be calculated. You could instead use something like ’90’, which would calculate the 90% interquartile spread (also known as the 90% confidence interval) from the 100 replications.
TIMESEPARATION: Let’s use 12 hours. This means that none of the storms within the storm catalog will start or stop within 12 hours of any other storm in the catalog. Anything between, say, 0 and 48 hours could be reasonable here. Longer TIMESEPARATION values ensure statistical independence of storm events, but at the expense of leaving out potentially important rainy periods.
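One simple way to think about enforcing such a separation is a greedy selection: keep the rainiest periods, skipping any that fall too close to an already selected storm. The sketch below is purely illustrative (hypothetical function and data layout, not RainyDay’s actual storm-identification code):

```python
def select_independent_storms(candidates, min_sep_hours=12.0, nstorms=400):
    """Greedily keep the largest storms whose start/end times are at least
    min_sep_hours away from every already selected storm.

    candidates: list of (total_rainfall, start_hour, end_hour) tuples,
    with times in hours since the start of the record.
    """
    kept = []
    for total, start, end in sorted(candidates, reverse=True):
        # require a gap of at least min_sep_hours on either side
        if all(start >= k_end + min_sep_hours or
               end <= k_start - min_sep_hours
               for _, k_start, k_end in kept):
            kept.append((total, start, end))
        if len(kept) == nstorms:
            break
    return kept
```

This also makes the tradeoff in the text concrete: raising min_sep_hours causes more high-rainfall candidates to be rejected in favor of smaller, better separated ones.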
DOMAINTYPE: As shown above, we’ll use a rectangular transposition domain. An example of an irregular domain will be shown later in the Big Thompson Case Study. The necessary fields are populated (‘LATITUDE_MIN’, ‘LATITUDE_MAX’, etc.).
RESAMPLING: We’ll start with ‘poisson’, but will later examine sensitivity to this choice.
TRANSPOSITION: We’ll start with ‘uniform’, but will later examine sensitivity to this choice.
CALCTYPE: We’ll start with ‘pds’ for a partial duration series-based analysis, but will demonstrate how results differ if we switch to ‘ams’. If your goal is to use RainyDay for rainfall frequency analysis and you are interested in relatively frequent recurrence intervals such as 2- to 20-year, using ‘pds’ is a good choice. It does run slower than ‘ams’, however.
EXCLUDEMONTHS: The accuracy of Stage IV (really, all gridded precipitation datasets) suffers in the wintertime, for a variety of reasons. Radar in particular struggles to differentiate cloud ice and falling snow from rainfall. In addition, within the study region, heavy rainstorms don’t really occur in the winter. For this reason, we’ll only run our analysis for April-November. The line ‘EXCLUDEMONTHS 1,2,3,12’ in the .sst file accomplishes this.
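In code terms, month exclusion amounts to a simple filter on each day’s date before it can contribute to the storm catalog (an illustrative sketch, not RainyDay’s internal implementation):

```python
import datetime

EXCLUDEMONTHS = {1, 2, 3, 12}   # from the .sst file: skip Dec.-Mar.

def in_analysis_season(day, exclude_months=EXCLUDEMONTHS):
    """True if this day of rainfall data should enter the analysis."""
    return day.month not in exclude_months
```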
POINTAREA and related fields: Stage IV and other radar datasets have a nasty tendency for very high values to appear erroneously in individual grid cells. Those values would skew the storm catalog towards these erroneous cells. Using a larger rectangle while creating the storm catalog helps mitigate this. For this reason, when creating a storm catalog intended for single grid cell-scale frequency analysis using a radar-based dataset like Stage IV, it is advisable to use ‘POINTAREA rectangle’ when creating the storm catalog, and then switch to ‘POINTAREA grid’ for the frequency analysis. So we’ll start with a rectangle (or square, if you prefer) that is 0.25 by 0.25 degrees centered on Madison, WI, and then switch to a single grid point located in downtown Madison once the storm catalog has been created and refined.
DIAGNOSTICPLOTS: To get started creating a storm catalog, we’ll set this to ‘true’, and then switch to ‘false’ once the storm catalog has been created and we have all the diagnostic plots already generated. Note that creating these plots can oftentimes take several minutes (it creates a lot!).
EXCLUDESTORMS: To start we can omit this field or use ‘none’. Once the storm catalog is created, we can examine the diagnostic plots to see which storms should be excluded. You’ll see how this works below.
FREQANALYSIS: We’ll set this to ‘false’ initially. First, we need to create a storm catalog and then screen it to identify storms that need to be removed from the frequency analysis via the EXCLUDESTORMS field. Once we’ve done that, we can set FREQANALYSIS to ‘true’.
SCENARIOS: We won’t be dealing with netCDF output scenarios in this example, so we can use ‘false’.
Part 2: Storm Catalog Creation and Analysis ¶
Generate Storm Catalog ¶
Once we have our .sst file ready, we’re ready to create a storm catalog. We can do this using the command:
conda activate RainyDay_Env
python {path}/RainyDay_Py3.py {path}/Madison_example.sst
Analyze and Refine Storm Catalog ¶
Congratulations, you’ve created a storm catalog! As mentioned above, however, there can be problems with precipitation data, particularly when using weather radar-based precipitation datasets. You should always look through the contents of your storm catalog. RainyDay makes this fairly easy to do with the diagnostic plots that it can create. Consider the figure below, which shows four plots that RainyDay has created for this example:
Figure: “Catalog-wide” diagnostic plots from Madison, WI case study.
- Top-Left: This plot shows the locations of all storms in the storm catalog (dots) and the spatial probability of storm occurrence (red shading). Present in this plot as well as the next two is a rectangle showing the area specified via the POINTAREA field (outlined in black), which in this case is a 0.25 by 0.25 degree box centered on Madison. This plot can be useful for assessing whether the probability of storm occurrence varies substantially over the selected transposition domain. In this example, storms appear to be more frequent in the southwestern part of the domain and less frequent in the northeast. You will also note lots of dots around the domain boundary. This looks strange but is normal–it just means that lots of storms only partially occurred within the boundary. For a storm that occurred near the boundary–as with all other storms–RainyDay will only transpose the rainfall that fell within the boundary. If you think the differences in storm frequency across the domain are important for frequency analysis, you can account for them using the TRANSPOSITION field, as will be shown later.
- Top-Right: This plot shows the locations of all storms in the storm catalog (dots), the mean rainfall from all storms in the storm catalog (blue shading), and the area defined under POINTAREA (in red). This, as well as the bottom-left plot, can be useful for assessing whether the magnitude of storm rainfall varies across the selected transposition domain. In this example, storms appear to be more intense in the southwestern part of the domain and less intense in the northeast. While there are ways of dealing with this issue that are coded into RainyDay, they are not well-tested nor well-supported by documentation and so are omitted here. A more brute-force way of dealing with this is to change your transposition domain so that these variations are smaller.
- Bottom-Left: This plot shows the locations of all storms in the storm catalog (dots), the standard deviation of the rainfall from all storms in the storm catalog (green shading), and the area defined under POINTAREA (in red). This can be useful for assessing whether the properties of storm rainfall vary across the selected transposition domain and whether there are any strange-looking locations. On that latter point, notice the strange dark green “blob” in the upper-left corner of the plot. We’ll revisit this blob below. As with the top-right plot of mean rainfall, this plot could be used to help decide whether you should change the transposition domain. For example, you could decide that you want a domain in which the mean rainfall (e.g. from the top-right plot) is approximately within +/-1 standard deviation of the rainfall within Aw.
- Bottom-Right: This plot shows the cumulative distribution function of storm total rainfall within your storm catalog. The largest single storm total is also highlighted, though, as we’ll see below, this value is highly suspect in this particular storm catalog.
In addition to these “catalog-level” diagnostics, RainyDay produces a storm total map and rainfall timeseries (i.e. hyetograph) for each storm in the catalog. The figure below shows examples of these for two storms from the catalog. These are examples of “good” storms (in terms of the realism of their rainfall properties), as opposed to the “bad” storms shown in the following figure, which highlight some rainfall errors that can pop up when using gridded precipitation data, particularly from weather radar.
Figure: Diagnostic rainfall maps and hyetographs for two major storms from Madison, WI case study.
The rainfall maps and hyetographs for the two storms shown above appear to be reasonable (meaning, they look like realistic rainfall patterns). Below, four storm total maps are shown that don’t look so realistic. Deep explanation of why these issues occur is beyond the scope of this tutorial, but there is a wealth of research studies on radar rainfall errors. What is important to note here is that a user, particularly when using a radar-based dataset such as Stage IV, needs to be able to recognize storms that show weird features such as these and then exclude them from subsequent analysis. This is more of an art than a science, and sometimes it can be difficult to decide. One thing to remember is that if a storm looks weird BUT the storm total rainfall is relatively low, it won’t influence your rainfall frequency results much regardless of whether or not you decide to exclude it. If the storm total rainfall is high, however, failing to exclude it from your frequency analysis can heavily (and unrealistically) influence your results. Each of the four plots is described in turn:
Figure: Unrealistic-looking storm total maps from diagnostic plotting.
- Top-Left: This one is fairly subtle, but there appears to be what’s known as a “blocked radar radial”, represented by the long thin line cutting across the domain. Note that because this storm total rainfall is fairly low (it is only storm number 33 out of 400), it likely wouldn’t make a big difference whether or not you exclude this one. Do not be alarmed if the background colors do not extend all the way to the boundary of the transposition domain. The technique is still working properly.
- Top-Right: This one is more obvious: there is a small, very high-intensity “bullseye” with more than 350 mm of rainfall. While in theory this is possible, it is suspiciously high for such a confined area, and April is not known for such extreme rainfall in the region (indeed, this would be near record-breaking rainfall for Wisconsin). This is likely instead an example of bright band contamination of the radar or some related issue. These types of problems in Stage IV are more prevalent in earlier time periods (e.g. this “storm” was in 2002) and during winter, spring, and fall.
- Bottom-Left: It isn’t totally clear what the issue is here, but you can see the Thiessen polygons that the River Forecast Centers used to bias correct this data. This doesn’t look pretty, but perhaps one could corroborate (or not) the existence of heavy rainfall in the region during that time period.
- Bottom-Right: Note the ring-like structure of very heavy rainfall in the upper-left corner. This is definitely nonphysical, likely related to bright-band contamination.
Note that three of the four storms pictured above are–if they are believed–quite major rainstorms. Failure to exclude them would definitely negatively impact the results.
If you go through the diagnostic plots from all 400 storms in the storm catalog (it doesn’t actually take all that long once you get the hang of it), you can create a list of storms that should be excluded from the frequency analysis. This should be provided to EXCLUDESTORMS. I came up with the following list, but this is a very subjective process and other users could certainly create a very different (possibly much longer) list:
EXCLUDESTORMS 23,25,26,46,68,70,84,85,120,160,188,215,297,327,373,390,400
Part 3: Rainfall Frequency Analysis ¶
In addition to the EXCLUDESTORMS list above, you need to make the following changes to be ready to run the frequency analysis:
CREATECATALOG false
DIAGNOSTICPLOTS false
FREQANALYSIS true
POINTAREA grid
POINTLAT 43.075
POINTLON -89.385
The change in POINTAREA reflects the fact that we want to generate frequency analysis results for a single grid cell. Now you’re ready to go to run the frequency analysis.
conda activate RainyDay_Env # or whatever your environment is called
python RainyDay2/Source/RainyDay_Py3.py Madison_example.sst
When RainyDay performs a rainfall frequency analysis, it produces two pieces of output: 1) a text file ending in ‘_FreqAnalysis.csv’ and 2) an image ending in ‘_FreqAnalysis.png’. Both contain the same information. The output from the run above can be seen in the figure below. Both show the uncertainty (i.e. the prediction interval defined using the UNCERTAINTY field) and the mean calculated from NREALIZATIONS, for all of the return intervals defined by RETURNLEVELS:
Figure: 24-hour duration rainfall frequency curve based on the above run, in both comma-delimited text and graphical format.
Generating 6-hour and 24-hour rainfall frequency curves ¶
Since the storm catalog generated in the previous steps has a duration of 72 hours (again, note that because DURATIONCORRECTION was set to ‘true’, RainyDay created a storm catalog with a duration longer than the 24 hours specified for DURATION), it is possible to use it to create rainfall frequency curves for durations between 1 hour (the temporal resolution of Stage IV) and 72 hours. All you need to do is change the DURATION field in the .sst file. The figure below, for example, shows 6-hour and 24-hour frequency curves from RainyDay. In addition, it shows the corresponding curves from NOAA Atlas 14. You will notice strong agreement between RainyDay and Atlas 14 for all but the highest recurrence intervals.
Figure: Left-Comparison of 6-hour rainfall frequency curves from NOAA’s Atlas 14 (red) and the above RainyDay run (blue). Right-Comparison of 24-hour rainfall frequency curves from NOAA’s Atlas 14 (red) and the above RainyDay run (blue).
This result deserves a few additional comments. First, it is worth emphasizing that in the section above on transposition domain selection, we put very little effort into identifying an appropriate transposition domain. Nonetheless, here we obtained results that compare very well with Atlas 14. This would seem to suggest that choice of transposition domain is not necessarily of paramount importance, at least in this geographic setting. Some further sensitivity analysis to choice of domain is provided later in this case study. On the other hand, we also need to recognize that this close correspondence between RainyDay and Atlas 14 does not necessarily tell us that we’ve arrived at the right answer. Some researchers, including the author, have argued that Atlas 14 underestimates current rainfall quantiles due to the impacts of climate change. In principle, RainyDay’s reliance on the relatively short, recent Stage IV rainfall record would seem to imply that we should obtain higher quantiles than Atlas 14 provides, assuming such nonstationarity exists. We did indeed find this to be the case in general in our Wisconsin Rainfall Project, which used RainyDay (among other methods) to estimate rainfall frequency over the entire state. That project used a number of complex options and methods in RainyDay, as well as some postprocessing. Those additional steps are well beyond the scope of this tutorial, but they tended to yield somewhat higher quantile estimates than Atlas 14 for most durations and in most parts of the state.
Sensitivity to RainyDay Options ¶
Using the existing storm catalog, it is possible to quickly re-run RainyDay to assess sensitivity by changing various fields in the .sst file. We will highlight only a few such changes here. In addition, we’ll examine sensitivity to the selected transposition domain by testing out a larger domain. This latter analysis requires generating a new storm catalog for the larger domain and doing the quality control described above.
The figure below shows sensitivity in 6-hour frequency curves to four fields in the .sst file:
Figure: Sensitivity of 6-hour IDF curves for Madison, WI to different options within the .sst file. See below for explanations.
The top-left panel compares frequency curves generated using annual maxima (red curve; ‘CALCTYPE ams’ in the .sst file) vs. partial duration series (blue curve; ‘CALCTYPE pds’ in the .sst file). As expected, differences are negligible for high recurrence intervals and, at least in this example, relatively small for the 2-year to 5-year recurrence intervals.
The top-right panel of the figure above shows frequency curves generated using uniform transposition (blue curve; ‘TRANSPOSITION uniform’ in the .sst file)–meaning storms are transposed anywhere in the domain with equal probability–vs. nonuniform (red curve; ‘TRANSPOSITION nonuniform’ in the .sst file). The latter option means that storms are transposed with nonuniform probability, based on where storms occurred in the storm catalog. You’ll see basically no change for all but the highest recurrence intervals. For transposition domains within which the probability of occurrence varies more dramatically than this one–particularly likely when NSTORMS is small–you could potentially see larger differences.
The bottom-left panel shows frequency curves generated using Poisson resampling (blue curve; ‘RESAMPLING poisson’ in the .sst file)–meaning the number of storms per simulated year follows a Poisson distribution–vs. empirical resampling (red curve; ‘RESAMPLING empirical’ in the .sst file). The latter option is described in more detail here. You see almost no difference here, and this lack of sensitivity has been seen in other locations as well. You could potentially see larger differences for storm catalogs where NSTORMS is small.
Speaking of small NSTORMS values, the lower-right panel shows precipitation frequency curves generated using ‘NSTORMS 400’ (blue curve; this is the number used elsewhere in this example) and ‘NSTORMS 20’ (red). Here you see some potentially important differences: underestimation of frequent events and overestimation of rare events due to the small storm catalog. Recall that the Stage IV record used here is 20 years long, so 20 storms over 20 years means an average of one per year–but in reality, some years would have none, while others would have several. RainyDay, however, requires at least one storm per simulated year. Practically, this means that RainyDay cannot accurately model the variability in the number of storms per simulated year from such a small storm catalog (see here for a more detailed explanation). Setting NSTORMS to be larger, say, 5 times (or more) the number of years of record, will eliminate that problem.
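The “at least one storm per simulated year” issue can be illustrated with a quick Poisson calculation. This is a toy sketch, unrelated to RainyDay’s internals, showing how often a year would contain zero storms at different average storm rates:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy illustration (not RainyDay code): with an average of 1 storm/year,
# i.e. NSTORMS equal to the record length in years, over a third of
# simulated years would contain no storm at all.
years = rng.poisson(lam=1.0, size=100_000)
print(np.mean(years == 0))  # close to exp(-1), roughly 0.37

# With NSTORMS at ~5x the record length, empty years become very rare:
years = rng.poisson(lam=5.0, size=100_000)
print(np.mean(years == 0))  # close to exp(-5), roughly 0.007
```

This is why the rule of thumb above (NSTORMS at least ~5 times the number of years of record) largely avoids the problem.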
Sensitivity to Transposition Domain ¶
As has been mentioned elsewhere in this user’s guide, a central element of SST is the selection of the transposition domain. Thus far, there has been regrettably little guidance on best practices for doing this. In this section, we demonstrate the sensitivity (or, as will be seen, lack thereof) to the choice of transposition domain within this case study. The figure below shows the original domain used for the analyses above, as well as a much larger one that spans 92.5 to 88.0 degrees west longitude and 41.5 to 45.0 degrees north latitude. This larger domain, at 142,500 km2, is more than four times larger than the original domain:
Figure: Red rectangle-original transposition domain used in this case study. Blue rectangle-enlarged domain used to demonstrate sensitivity of SST results to domain selection.
The figure below shows the resulting 6-hour (left panel) and 24-hour (right panel) rainfall frequency curves for Madison, WI developed using the smaller original transposition domain used elsewhere in this case study (blue curves) and the larger domain shown in the figure above (red curves). While there are some differences in the uncertainty estimates (i.e. the shaded areas), the differences in the mean estimates (the solid lines) are minimal. This implies that rainfall patterns are relatively homogeneous over these transposition domains, which translates to limited sensitivity of SST results to the precise choice of domain.
Figure: Comparison of 6-hour (left panel) and 24-hour (right panel) rainfall frequency curves for Madison, WI using the original smaller domain (blue curves) and the much larger domain (red curves).
Part 4: Concluding Remarks on Madison, WI Case Study¶
The results of this case study suggest that SST-based analyses in RainyDay can produce reasonable precipitation estimates, and that these results can be relatively insensitive to specific analysis choices, provided the starting configuration is well chosen. Here, we will highlight one example analysis that appears to cut against this otherwise-positive picture. Using the original RainyDay configuration above, we reuse the storm catalog to estimate 1-hour duration rainfall frequency and compare it against NOAA Atlas 14’s 1-hour estimates (blue and red curves, respectively):
Figure: Blue curve: 1-hour frequency estimates from RainyDay for Madison, WI. Red curve: NOAA Atlas 14 1-hour frequency estimates for the same location.
RainyDay substantially underestimates relative to Atlas 14, by roughly 35-40% irrespective of recurrence interval. There are several possible explanations for this:
- Conditional Bias: Conditional bias refers to radars’ difficulty with measuring very high rainfall rates. It is most pronounced for short durations, and thus is more likely to be present in the 1-hour duration analyses than in earlier 6-hour or 24-hour results. This issue is difficult to resolve, though Wright et al. (2013) does present conditional bias correction in the SST context.
- Point-area “representativeness” issues: Rain gage measurements such as those underlying Atlas 14 represent sampling areas of roughly 0.1 m2, whereas a single grid cell from Stage IV is roughly 20 km2, which is on the order of 10^8 times larger. Intuitively (and backed up by scientific study), there is little reason to expect rainfall statistics to be similar at those two scales, particularly at short durations.
- “Clock time” issues: Stage IV rainfall measurements are provided on a regular hourly time step, e.g. the total rainfall between 2 pm and 3 pm. Atlas 14, however, can and does use finer-resolution observations in some locations and then identifies the peak 60-minute periods from those finer records. So, for example, the peak rainfall could instead have fallen during the hour-long period from 2:47 to 3:47 pm. While few studies have focused on this discrepancy for short-duration rainfall, it could explain some of the differences.
- Difference between analysis duration and storm catalog duration: It is conceivable that a different storm catalog is needed for this case–specifically, one focused on shorter-duration storms. In reality, this would not likely change the matter much. Since this catalog has 400 storms (a bit fewer, actually, since some were excluded via EXCLUDESTORMS for various reasons), it is likely that all substantial rainfall events–including those that were severe at short (e.g. hourly) timescales–are included within it.
Case Study 2: Watershed-Scale Rainfall Frequency Analysis for Big Thompson Watershed, Colorado¶
In this case study, we will create watershed-scale 6-hour and 72-hour rainfall frequency curves in complex terrain. Specifically, we will focus on the 360 km2 Big Thompson watershed above Estes Park in the front range of the Rocky Mountains in Colorado. The watershed is mostly within Rocky Mountain National Park, and is highly recommended as a tourist destination! This example will draw heavily from the analyses of Yu et al. (2021), who examined rainfall and flood frequency using RainyDay for the watershed upstream of the USGS stream gage 06733000.
One of the goals of this example and of Yu et al. (2021) is to highlight the important capability of SST to produce “spatial” precipitation frequency estimates, as well as frequency estimates for rare storms. Yu et al. (2021), for example, examined rainfall and flood frequencies at the 360 km2 scale out to 10,000 years, using only 36 years of gridded input precipitation data. That paper compared SST results with conventional methods based on much longer rain gage records. Those more conventional methods, however, cannot directly provide watershed-scale frequency estimates. The ability of RainyDay/SST to provide such spatial frequency estimates is particularly important for dam safety assessment and other applications which require consideration of relatively large upstream catchment areas, over which rainfall properties can vary substantially in space and time in ways that “point scale” rain gage estimates and frequency analyses based upon them fail to capture. This is the other side of the same coin discussed in the Madison, Wisconsin Case Study above. There, we pointed out that RainyDay/SST could not directly provide point-scale rainfall frequency estimates comparable to those from standard sources such as NOAA Atlas 14. Conversely, those standard sources and methods cannot provide watershed scale estimates.
As has been mentioned elsewhere in this document, the choice of transposition domain can be very important, particularly in regions with substantial topographic or other types of geographic variability. That is certainly the case in the front range of the Rocky Mountains, where terrain-locked thunderstorms and other flavors of orographic rainfall enhancement can be considerable. While we don’t present a universal solution for defining transposition domains, we will highlight some possible methods that could be used.
A ‘.sst’ file, storm catalog, and a small subset of the results presented below are available here. A brief description and download link for the NLDAS-2 precipitation dataset used in this study can be found here.
Part 1: Setting Up RainyDay ¶
Location Description ¶
The following description is adapted from Yu et al. (2021). The Big Thompson River above Lake Estes and Olympus Dam (henceforth referred to as Big Thompson River; see blue outline in the figure below), is situated in north-central Colorado. The land surface elevation of the watershed ranges from approximately 4,000 m above sea level (masl) in the west down to 2,300 masl at the stream gage (see Fig. 1 in Yu et al., 2021). The watershed is characterized by intermittent snow-cover with peak snowpack accumulation during March and April and ablation from May to early July. According to the USGS 2016 National Land Cover Dataset, the watershed is 53% forested, 23% shrubland, 14% grassland, and 10% other land cover types. We will use an approximately 16,200 km2 transposition domain that stretches along the Colorado Front Range and encompasses the watershed (red outline in the figure below; roughly 45 times larger than the watershed). You will note the mountainous terrain, beginning with the front range and extending west, contrasting with the relatively flat Great Plains to the east:
Figure: Google Earth screenshot showing the Big Thompson watershed (blue outline), the transposition domain from Yu et al. (2021; red outline), and the Atlas 14 2-year 24-hour rainfall depth. The Atlas 14 rainfall ends at the Wyoming and New Mexico borders since those states are part of separate Atlas 14 volumes.
The transposition domain shown in the figure above is a subregion of an earlier domain that was identified by US Bureau of Reclamation staff using rain gage measurements from the Global Historical Climatology Network. Stations were identified that shared similar values of L-kurtosis, L-skewness, the ratio of L-kurtosis to L-skewness, and discordancy (Hosking and Wallis, 1993, 1997). Yu et al. (2021) narrowed that larger domain to include areas north and south of the Big Thompson River within the Colorado Front Range and also limited the eastern extension of the domain to avoid subsequent transposition of storms from substantially different elevations.
Rainfall Data Selection ¶
This case study makes use of the NLDAS-2 precipitation dataset. While there could be compelling reasons to use the other datasets described in Appendix B, NLDAS-2 has several strengths and a few weaknesses that, on the whole, make it useful for analyses in this region and similar ones:
- Record Length: Any application that demands estimation of rare (long recurrence interval) precipitation frequencies will benefit from long rainfall records. Compared with many other moderate- and high-resolution gridded precipitation datasets, NLDAS-2, which begins in 1979, offers a relatively long record. Since Yu et al. (2021)’s focus was on recurrence intervals >1,000 years for dam safety analysis, this long record was an important consideration.
- Accuracy-The Good News: Precipitation datasets that rely heavily or entirely on weather radar or satellite measurements tend to have poor accuracy in mountainous regions. While the same can be said for rain gage-based datasets like NLDAS-2, the effect is generally somewhat less severe. This is in part because NLDAS-2 uses a topography-based adjustment that attempts to account for orographic differences in precipitation (Daly et al. 1994; Daly et al. 2008).
- Accuracy-The Bad News: NLDAS-2 has a roughly 12 km spatial resolution. Put another way, each grid cell covers roughly 144 km2, which means the Big Thompson watershed could be covered by ~2.5 such grid cells. That doesn’t allow for much depiction of spatial heterogeneity in rainfall within the scale of the watershed. This is somewhat problematic for precipitation frequency analysis, and is more of an issue for flood frequency, where those heterogeneities can be quite important. In other words, a higher-resolution dataset would be preferable to NLDAS-2, all else being equal.
Big Thompson .sst File ¶
The initial .sst file used in this case study (i.e. for storm catalog creation) is given here. Key choices within this file are explained below. Note that some fields are omitted from this .sst file since they are not needed for this analysis.
#============================================================
# WELCOME TO THE RAINY DAY MODEL!
# USER DEFINED INFORMATION FOR STOCHASTIC STORM TRANSPOSITION
# BIG THOMPSON, CO CASE EXAMPLE CASE STUDY
#============================================================
MAINPATH Examples/BigThompson
SCENARIONAME RainyDay_BigThompsonExample_72hrs
RAINPATH NLDAS2_RainyDay/nldas.*.precip.nc
CATALOGNAME StageIV_BigThompson_ExampleCatalog.nc
CREATECATALOG true
DURATION 72
DURATIONCORRECTION true
NSTORMS 400
NYEARS 10000
CALCTYPE PDS
NREALIZATIONS 100
TIMESEPARATION 12
DOMAINTYPE irregular
DOMAINSHP Examples/BigThompson/GIS_Files/Domains/SST_Domain2.shp
DIAGNOSTICPLOTS true
FREQANALYSIS true
EXCLUDESTORMS none
EXCLUDEMONTHS none
INCLUDEYEARS all
TRANSPOSITION uniform
RESAMPLING poisson
RETURNLEVELS 2,5,10,25,50,100,250,500,1000,2500,5000,10000
POINTAREA watershed
WATERSHEDSHP Examples/BigThompson/GIS_Files/BigThompson_WatershedBoundary.shp
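The .sst listing above suggests a simple format: one whitespace-separated KEY VALUE pair per line, with ‘#’ marking comment lines. As a rough illustration of that structure (this is not RainyDay’s actual parser, and splitting on the first space is an assumption), such a file could be read like this:

```python
def read_sst(path):
    """Minimal sketch of a reader for the KEY VALUE '.sst' format shown
    above. Illustrative only; RainyDay's real parser may behave differently."""
    fields = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and '#' comment lines
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
    return fields

# e.g. fields["DURATION"] -> "72", fields["NSTORMS"] -> "400"
```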
Transposition Domain Selection ¶
The following is excerpted with minor editing from Yu et al. (2021):
When using SST, like other regional rainfall analysis methods (Svensson and Jones, 2010), one must be cognizant of orographic influences on heavy rainfall. Jarrett (1989) argued that transposing storms between different elevations in the region is not defensible, showing that six-hour heavy rainfall abruptly decreases from over 200 mm in the Eastern Plains of Colorado (roughly 1,500 masl) to less than 50 mm in the mountains (above 2,500 masl). On the other hand, recent studies (Mahoney et al., 2015; NOAA ESRL, 2018; Rossi et al., 2020) have shown that orographic influences on heavy precipitation in the Colorado Front Range are relatively modest. Furthermore, Mahoney et al. (2015) also indicated a relatively uniform seasonality of extreme precipitation east of the Continental Divide, where most of the extreme events occur in the spring (March-May) and summer (June-August). Thus, when delineating the transposition domain in Yu et al. (2021), we considered the elevation and spatial distribution of precipitation magnitudes and seasonality.
This domain was validated using linear regression and L-moments homogeneity measures (Hosking and Wallis, 1997). 72-h precipitation annual maximum depths for the 151 NLDAS-2 grids within the transposition domain show a very limited, albeit statistically significant, relationship with elevation. Heavy precipitation events in excess of 200 mm have occurred at both low (1,500 masl) and high (3,500 masl) elevations. L-moments Discordancy—the consistency between annual maximum 72-hour depths for one NLDAS-2 grid with all other grids in a region—was also calculated (Hosking and Wallis, 1997). Discordancy aims to identify sites (in this study, rainfall grid cells) that are statistically different from regional averages. Only five of 151 NLDAS grids have discordancy values larger than 3.0, which is a rule-of-thumb for excluding a site from a homogenous region (e.g., Bonnin et al., 2006; Perica et al., 2013). In addition, the L-moments-based homogeneity test statistic H1=1.44 (Hosking and Wallis, 1997) for the transposition domain meets the suggested threshold of homogeneity (H1<2.0). Thus, we conclude that the transposition domain is relatively homogeneous, at least in terms of 72-hour grid-scale NLDAS-2 precipitation, and thus that transposing storms within this domain is valid.
This demonstrates that existing methods for assessing the homogeneity of extreme precipitation over regions can be applied in the context of SST, even in relatively complex terrain. We aim to do further testing of this capability, potentially creating preprocessing tools for defining transposition domains using existing or new homogeneity measures.
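The discordancy statistic used above is defined in Hosking and Wallis (1997) as D_i = (N/3)(u_i - ubar)^T A^{-1} (u_i - ubar), where u_i holds the sample L-CV, L-skewness, and L-kurtosis of site i (here, grid cell i) and A is the matrix of sums of squares and cross-products. The following self-contained sketch shows that calculation; it is illustrative only, is not RainyDay code, and a production analysis would typically use an established L-moments package:

```python
import numpy as np

def lmoment_ratios(x):
    """Sample L-CV, L-skewness, L-kurtosis via probability-weighted
    moments (Hosking & Wallis, 1997)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((j - 1) * (j - 2) * (j - 3)
                / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return np.array([l2 / l1, l3 / l2, l4 / l2])  # t, t3, t4

def discordancy(annual_maxima_by_site):
    """Hosking-Wallis discordancy D_i for each site (grid cell):
    D_i = (N/3) (u_i - ubar)^T A^{-1} (u_i - ubar)."""
    u = np.array([lmoment_ratios(x) for x in annual_maxima_by_site])
    n_sites = u.shape[0]
    d = u - u.mean(axis=0)           # deviations from regional mean ratios
    A = d.T @ d                      # sums of squares and cross-products
    return n_sites / 3.0 * np.einsum("ij,jk,ik->i", d, np.linalg.inv(A), d)

# Sites with D_i > 3.0 (for a large number of sites) are candidates for
# exclusion from the homogeneous region, per the rule of thumb cited above.
```

A useful property for checking such an implementation is that the D_i values always average to exactly 1 across sites.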
Selection of Other Options ¶
Here, we step through the key fields in the .sst file which, in addition to selecting a suitable input dataset and transposition domain, are important elements of a successful RainyDay analysis.
RAINPATH: As mentioned above, this case study will use the NLDAS-2 precipitation dataset. If you downloaded the NLDAS-2 data from the link provided in Appendix B, you’ll find that the filenames look like this: ‘{some path}/nldas.20020704.precip.nc’ (note that on Windows, the path will typically use ‘\’ rather than ‘/’). Consult the Madison case study above for more details on how these file paths work and specifically on the usage of ‘wildcards’ (e.g. ‘*’) when defining RAINPATH.
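For intuition, wildcard expansion of a RAINPATH pattern behaves like Python’s standard glob module. This is a sketch of the idea only, not RainyDay’s actual internals, and the function name is made up:

```python
import glob

def match_rainpath(rainpath):
    """Expand a RAINPATH-style wildcard pattern into a sorted file list.
    Illustrative sketch; RainyDay's internal file handling may differ."""
    return sorted(glob.glob(rainpath))

# e.g. match_rainpath("NLDAS2_RainyDay/nldas.*.precip.nc") would return one
# entry per daily file, such as 'NLDAS2_RainyDay/nldas.20020704.precip.nc'
```

Sorting the matches keeps the daily files in chronological order, since the dates are embedded in the filenames.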
DURATION: Since the main objective of this analysis is to produce 6-hour and 72-hour frequency estimates, we’ll choose 72 hours here. One pertinent comment: once a storm catalog is created, it can be reused in subsequent analyses as long as the transposition domain is unchanged and as long as the DURATION used in those later analyses is less than or equal to the duration of the storms in the storm catalog. In other words, if we use ‘DURATION 72’ to create our storm catalog, we can then reuse that catalog to create both the 6-hour and the 72-hour rainfall frequency estimates.
DURATIONCORRECTION: 72 hours is fairly long, which means that DURATIONCORRECTION is probably not needed for the 72-hour frequency estimation. We will turn it on when we do the 6-hour analyses, however.
NSTORMS: Let’s create a storm catalog with 400 storms. Since we’ll be using NLDAS-2 from 1979-2021, that is 400 storms over 43 years, or an average of about 9.3 storms per year.
NYEARS: Let’s use 10,000. This means that we can evaluate rainfall recurrence intervals from 1 up to 10,000 years. Note that transposing this many years of rainfall can take a while, particularly when the analysis is for a watershed, rather than a single grid cell.
NREALIZATIONS: Let’s use 100. This means that for each recurrence interval, 100 replications will be created so that we can look at uncertainty/variability. In other words, there will be 100 2-year storms, 100 10-year storms, 100 100-year storms, etc. UNCERTAINTY, which is omitted from the .sst file, will default to ‘ensemble’, meaning the results will display the range between the smallest and largest of these 100 replications for any given recurrence interval.
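To make the ‘ensemble’ idea concrete, here is a toy sketch (synthetic numbers only, not RainyDay output) of reducing NREALIZATIONS replicate frequency curves to a mean curve plus a min-max band:

```python
import numpy as np

# Toy sketch of the 'ensemble' uncertainty display: for each return period,
# NREALIZATIONS replicate estimates are reduced to a mean curve plus the
# range between the smallest and largest replicate.
rng = np.random.default_rng(1)
nrealizations = 100
return_periods = np.array([2, 10, 100, 1000, 10000])

# Pretend each row is one realization's rainfall quantiles (mm); the crude
# log growth with rarity is just for illustration.
quantiles = rng.gamma(shape=8.0, scale=10.0,
                      size=(nrealizations, len(return_periods)))
quantiles *= np.log(return_periods)

mean_curve = quantiles.mean(axis=0)
lower, upper = quantiles.min(axis=0), quantiles.max(axis=0)
for T, m, lo, hi in zip(return_periods, mean_curve, lower, upper):
    print(f"{T:>6}-yr: mean {m:6.1f} mm (range {lo:.1f}-{hi:.1f})")
```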
TIMESEPARATION: Let’s use 12 hours. This means that none of the storms within the storm catalog will start or stop within 12 hours of any other storm in the catalog. Anything between, say, 0 and 48 hours (or more) could be reasonable here. Longer TIMESEPARATION values help ensure statistical independence of storm events, but at the expense of leaving out potentially important rainy periods.
DOMAINTYPE: We’ll use the transposition domain shown in the figure above and in Fig. 1 of Yu et al. (2021), which means this needs to be set to ‘irregular’. Unlike the Madison, Wisconsin case study above, this is an irregularly-shaped domain, which requires a polygon in ESRI shapefile format in the WGS84 geographic projection. More specifics on the formatting of this shapefile, including instructions for how to create one from scratch without any prior GIS know-how, can be found in Appendix C.
DOMAINSHP: Because we’ll be using an irregular domain, we need to provide the path to the polygon shapefile: e.g. Examples/BigThompson/GIS_Files/Domains/SST_Domain2.shp
RESAMPLING: We’ll go with ‘poisson’.
TRANSPOSITION: We’ll go with ‘uniform’.
CALCTYPE: We’ll go with ‘ams’ for an annual maxima series-based analysis. This could lead to some underestimation for more frequent recurrence intervals, but should produce basically identical results for rarer events. Given that the latter is our focus here, and that ‘ams’ runs faster than ‘pds’, this seems like a good choice.
EXCLUDEMONTHS: we’ll go with ‘none’ here, meaning we’ll include all months.
POINTAREA and related fields: We’ll stick with ‘watershed’ for POINTAREA. That means you then need to provide a polygon shapefile of the watershed boundary in the WGS84 geographic projection via the WATERSHEDSHP field. How to create such a watershed boundary shapefile, or where to find one, is beyond the scope of this tutorial. Plenty of information can be found on how to use standard GIS software such as QGIS (see here) or ArcGIS. Also, in many locations in the United States, watershed boundaries can be delineated “on the fly” via the US Geological Survey’s StreamStats tool.
CREATECATALOG: To get started creating a storm catalog, we’ll set this to ‘true’, and then switch to ‘false’ once the storm catalog has been created and we’re ready to move on to frequency analysis.
DIAGNOSTICPLOTS: To get started creating a storm catalog, we’ll set this to ‘true’, and then switch to ‘false’ once the storm catalog has been created and we have all the diagnostic plots already generated.
FREQANALYSIS: In this case, it is probably ok to set it to ‘true’ right away. The reason is that NLDAS-2 doesn’t have the same problematic “artifacts” that datasets more reliant on weather radar like Stage IV often have.
EXCLUDESTORMS: To start we can omit this field or use ‘none’. Once the storm catalog is created, we can examine the diagnostic plots to see which storms should be excluded. However, in this case, it won’t likely be necessary, since we won’t likely see any strange-looking storms in NLDAS-2.
SCENARIOS: We won’t be dealing with scenarios in this example, so we can use ‘false’.
Part 2: Storm Catalog Creation and Analysis ¶
Generate Storm Catalog ¶
Once we have our .sst file ready, we’re ready to create a storm catalog. We can do this using the commands below (note that the ‘init’ command is only needed for iPython Notebooks and can otherwise be omitted). Note that this will take a while, not because the storm catalog creation is slow (it is pretty fast for NLDAS-2, since the files are small), but because we’re doing a really large number of transpositions (specifically, NREALIZATIONS × NYEARS = 100 × 10,000 = 10^6):
conda activate RainyDay_Env # or whatever your RainyDay Anaconda environment is called
python {path}/RainyDay_Py3.py {path}/BigThompsonExample.sst # adjust paths as needed
Analyze and Refine Storm Catalog ¶
That took almost 40 minutes to run on a pretty fast laptop, again, mainly because the storm transposition phase takes a long time for such a large number of years (10 times as many as in the Madison, WI case study). However, if you consult the diagnostic plots, you’ll struggle to find any weird looking storms, since NLDAS-2 doesn’t suffer from the same problems with radar artifacts as the Stage IV and other radar-based datasets can. So in other words, you get to skip this part! Because there is nothing very interesting to show here, the diagnostic plots are omitted.
Part 3: Rainfall Frequency Analysis ¶
Generating 6-hour and 72-hour rainfall frequency curves ¶
Since the storm catalog generated in the previous steps has a duration of 72 hours, it is possible to use it to create rainfall frequency curves for durations between 1 hour (the temporal resolution of NLDAS-2) and 72 hours. All you need to do is change the DURATION field in the .sst file. The figure below, for example, shows 6-hour and 72-hour frequency curves from RainyDay. In addition, it shows the corresponding curves from the US Bureau of Reclamation (72-hour only; reference not publicly available) and from the Colorado-New Mexico Regional Extreme Precipitation Study (REPS).
Figure: Left-Comparison of 6-hour rainfall frequency curves from the above RainyDay run (blue) and REPS (red). Right-Comparison of 72-hour rainfall frequency curves from the above RainyDay run (blue), REPS (red), and a prior US Bureau of Reclamation (USBR) regional frequency study (green). USBR results are based on the 4-parameter Kappa distribution and only available for the 72-hour duration. Both REPS and RainyDay results are at the scale of Big Thompson, while USBR results have been rescaled to the 360 km2 according to an area reduction factor from Technical Paper 29 (U.S. Weather Bureau, 1958).
At the 72-hour scale, there is generally close agreement between all three sources for return periods less than 50 years, with substantial deviations beyond that. At the 6-hour scale, deviations are relatively higher for all return periods. The reasons for such deviations are difficult to know and are beyond the scope of this study, but generally speaking, multiple estimates using different data and methods provide useful perspectives on uncertainties and options for infrastructure design and risk assessment (see Wright et al., 2021 for a more detailed version of this argument).
One feature that you’ll notice in the RainyDay results in the figure above is the tendency to “plateau” around roughly the 2,000- to 5,000-year return period. RainyDay has an absolute upper limit to the rainfall total that can be obtained, regardless of return period. This total is the one that would result from the specific transposition of the largest storm in the storm catalog (with respect to the size and shape of the area specified in POINTAREA) directly over the watershed (or other specified area). How many years of synthetic annual maxima are needed to reach this “perfect storm” will depend on a number of RainyDay parameters, chiefly the length of the input dataset, the size of the transposition domain, and the number of storms in the storm catalog. This upper limit is not physically realistic, since it is clearly possible that a storm could exceed the largest ever observed. However, the general tendency towards a “plateau” in the frequency curve may not be so unrealistic. Certainly rainfall amounts cannot be infinitely large; there are multiple thermodynamic mechanisms that serve to limit rainfall amounts and rates. Wright et al. (2020) discusses this issue in more detail.
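The plateau behavior can be reproduced with a toy Monte Carlo: if simulated annual maxima are built by resampling from a finite catalog of storm depths, the empirical quantiles can approach, but never exceed, the largest catalog value. This sketch ignores the spatial rescaling that real transposition performs (and all numbers are synthetic), so it only illustrates the bounding effect:

```python
import numpy as np

# Toy Monte Carlo (not RainyDay) illustrating the "plateau": resampled
# annual maxima are capped by the largest depth in the finite catalog.
rng = np.random.default_rng(7)
catalog = rng.gamma(shape=4.0, scale=25.0, size=400)  # fake storm depths (mm)
cap = catalog.max()                                   # the "perfect storm" bound

nyears = 10_000
storms_per_year = rng.poisson(lam=5, size=nyears)
annual_max = np.array([
    catalog[rng.integers(0, len(catalog), size=k)].max() if k else 0.0
    for k in storms_per_year
])

# Quantiles grow with return period T but level off near the cap.
for T in (10, 100, 1000, 10000):
    q = np.quantile(annual_max, 1 - 1 / T)
    print(f"{T:>6}-yr: {q:6.1f} mm (cap {cap:.1f})")
```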
Sensitivity to RainyDay Options ¶
Here, we evaluate sensitivity to two key choices in the RainyDay analysis process. The next figure compares the same NLDAS-2-based RainyDay-based precipitation frequency curves shown in the figure above to curves obtained using Stage IV precipitation (with otherwise identical SST parameters):
Figure: Left-Comparison of 6-hour rainfall frequency curves from the above RainyDay run using NLDAS-2 (blue) and RainyDay using Stage IV (red). Right-The same as left, but for the 72-hour duration.
Differences are fairly substantial at both the 6-hour and 72-hour scales, with Stage IV underestimating at nearly all return periods. This is due to Stage IV’s severe tendency to underestimate extreme precipitation in mountainous regions due to several factors beyond the scope of this tutorial. This clearly demonstrates the inadequacy of Stage IV in such settings.
Next, we’ll compare how results change when we use a larger domain. In this case, that larger domain is actually the one derived by USBR personnel via L-moments homogeneity methods to create the USBR precipitation frequency curve shown in the right panel of the figure above. The larger domain is shown in green here:
Figure: The transposition domains used in Yu et al. (2021) and throughout this tutorial (red) and the larger version of that domain originally derived by the USBR (green).
The next figure compares the original NLDAS-2-based RainyDay-based precipitation frequency curves shown above to curves obtained using storms from NLDAS-2 but drawn from the larger domain shown in the previous figure:
Figure: Left-Comparison of 6-hour rainfall frequency curves from the above RainyDay run (blue) and RainyDay using NLDAS-2 with the larger domain shown in the previous figure. Right-Same as left, but for the 72-hour duration.
Here, you can see relatively minor differences. This suggests that both domains are equally suitable, despite one being nearly twice the area of the other. If one were to use a domain that differed more substantially, however, such as one that extended substantially farther west, one could expect larger differences.
Part 4: Concluding Remarks on Big Thompson, CO Case Study¶
This case study of low-frequency, watershed-scale precipitation estimation highlights several key aspects of SST using RainyDay that add to the first case study. Specific aspects are:
- Rare event estimation from “short” records: This example highlights that SST can be used to estimate quite rare recurrence intervals (e.g. 1,000-10,000 years) from relatively short gridded precipitation records–in this case, 43 years of NLDAS-2 precipitation. Although the frequency results differ somewhat from those estimated using other sources, they are broadly consistent and appear to add a useful “second opinion” to assessments where estimates from other methods already exist or are desired, or an easily obtainable primary estimate when other estimates do not exist. One aspect of RainyDay-based SST frequency analyses that is visible in this case study is their tendency to “plateau” (i.e. level off) at very high return periods (in this case, above 2,000 years). This is an artifact of the RainyDay procedure (see Wright et al. 2020 for more discussion), but, as mentioned above, there is reason to believe that “real” precipitation frequency (as opposed to what we are able to estimate from our limited observations) may also show plateau behavior, albeit not necessarily at the same precipitation depths or recurrence intervals.
- Proper selection of a transposition domain is an important step in SST analyses, particularly in regions such as that examined here, where rainfall varies dramatically with elevation and other geographic features. In this case, we turned to a transposition domain that was originally identified using well-developed L-moments-based homogeneity measures together with long-term rain gage records, and then further refined by Yu et al. (2021) using the same measures applied to gridded NLDAS-2 precipitation. That there is broad consistency in the results generated using multiple methods and similar domains (e.g. the RainyDay results shown above) suggests that the existing L-moments framework could be a fruitful way forward for identifying homogeneous domains.
- Also shown above is the basic fact that dataset selection matters. Specifically, in this mountainous region, the radar-based Stage IV dataset exhibits severe underestimation compared with the primarily gage-based NLDAS-2. On the other hand, NLDAS-2’s relatively coarse spatial resolution is a potential downside for flood simulations. There are always tradeoffs involved in dataset selection in RainyDay, and there is a continued need for the development of high-resolution, high-accuracy, long-term gridded precipitation datasets. In the absence of user knowledge of which dataset might be appropriate for a particular analysis or location, the brief descriptions of available datasets in Appendix B can help. It can also be wise to try different datasets in RainyDay and compare the results. Finally, you can consider contacting the author for suggestions.
Appendix A: A General Description of SST ¶
The most general explanation of Stochastic Storm Transposition (SST) can be found in Wright et al. (2020). That explanation is reproduced here:
The Basics ¶
SST includes the following key elements: defining a transposition domain; developing an extreme storm catalog; randomly transposing storms in a region over a watershed; and estimating rainfall or flood probabilities. The concept, shown schematically in the figure below, can be briefly summarized: observed storms are transposed at random within a transposition domain of area AD in such a way that new unobserved realizations of extreme rainfall over the domain are produced. In doing so, new realizations of extreme rainfall are created for a watershed of area Aw that resides within this domain. The space-time structure of rainstorms, including intensities, areas, and movement is preserved.
Figure: Schematic of SST procedure for a single storm. Specific rainfall isohyets for the observed and transposed storms are shown for four time periods. Entire rainfall fields, as opposed to isohyets, can be transposed. All rainfall fields or isohyets are transposed by a north-south distance Δy and an east-west distance Δx, which are randomly selected from the distributions DY(y) and DX(x), respectively. In some SST efforts, DY(y) and DX(x) have been assumed to be uniform; in other cases, they have been estimated based on the locations of historical storms. Figure adapted from Wright et al. (2013).
Defining a Meteorologically Homogeneous Transposition Domain ¶
Storm transposition is only defensible insofar as the storm could have occurred at that location with some nonzero probability. Chow (1964) defined a “homogeneous region,” also referred to here and throughout SST literature as a “transposition domain,” as “the area surrounding the given river basin in which storm-producing factors are substantially comparable; i.e., the general area within which meteorological influences and topography are sufficiently alike.” The transposition domain shown above is square and centered around the watershed, but this need not be the case—see, for example, the figure below which shows an example transposition domain for New Orleans, Louisiana, together with transposed versions of rainfall from Hurricane Harvey from September 2017. Gupta (1972) argued that a transposition domain could “include a very large geographic area in the eastern half of the United States where (topographic) relief is generally moderate and it may include relatively small areas in the western United States where extreme topography is encountered.” The issue of regional homogeneity is not unique to SST: regional rainfall frequency analysis and regional flood frequency analysis must also wrestle with it (e.g. Hosking and Wallis, 1993).
Figure: (a) Transposition domain for a region surrounding New Orleans, Louisiana along the southern United States Gulf Coast. (b) Peak 72-hour rainfall map for Hurricane Harvey in August 2017, based on Stage IV gage-corrected radar rainfall data (Lin, 2011). (c) to (e) three possible random transpositions of Hurricane Harvey rainfall which produce little, no, and extreme rainfall over New Orleans, respectively. (f) Example 72-hour IDF curve for New Orleans generated using the RainyDay software; shaded area portrays the spread of 100 distinct realizations, each consisting of 1,000 annual rainfall maxima.
For SST to be of value, the transposition domain must be sufficiently large that it includes multiple observed extreme rainstorms. If the domain is very large relative to the size of the watershed of interest, however, the probability of transposing one of these rainstorms over the watershed is small.
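This tradeoff can be illustrated with some back-of-the-envelope arithmetic (an illustration only, not part of RainyDay): under uniform transposition probabilities, the chance that a given catalog storm lands over the watershed scales roughly with the area ratio Aw/AD.

```python
# Illustrative only: under uniform transposition, a crude per-storm
# probability of producing rainfall over the watershed is Aw / AD.
def expected_hits(m_storms, area_watershed, area_domain):
    """Rough expected number of catalog storms transposed over the watershed."""
    p_hit = area_watershed / area_domain  # crude per-storm hit probability
    return m_storms * p_hit

# Example: 150 catalog storms, a 500 km^2 watershed, a 100,000 km^2 domain.
print(expected_hits(150, 500.0, 100_000.0))  # 0.75
```

Enlarging the domain admits more storms (larger m) but shrinks the per-storm probability of a "hit," which is the tension described above.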
Creating a Storm Catalog from Spatial Rainfall Observations ¶
SST considers multiple storms that have occurred within the transposition domain for the watershed of interest. This set of storms is henceforth referred to as a storm catalog. Lacking any other alternative, early SST studies used rainfall observations from rain gage networks, often expressed as depth-area-duration (DAD) curves or tables that depict rainfall depth or rate as a function of averaging area and duration. By using DAD information along with assumptions of storm geometry, the labor of drawing or digitizing paper-based rainfall maps and then transposing them can be avoided. A useful source of both DAD tables and rainfall maps has been USACE (1973), which includes information on nearly 600 major U.S. storms starting in the 1880s. Though impressive in length and level of detail, this volume nonetheless has shortcomings—it has relatively fewer storms in the western U.S., and the evolution of rain gages over that time period means that the record does not provide a consistent picture of major storm activity even in the eastern part of the country. Such inconsistencies pose potential problems for SST (Foufoula-Georgiou, 1989; National Research Council, 1988), since the resampling described below implicitly assumes that the storm catalog reflects the “true” extreme rainfall hydroclimate within the transposition domain.
Advances in rainfall remote sensing using ground-based radar and satellites offer alternative data sources for storm catalog creation. An example of the spatially detailed depiction of regional rainfall provided by weather radar is shown in the previous figure for Hurricane Harvey, which struck the southeastern Texas coast in August 2017. Wright et al. (2020) discusses some potential strengths of these new data sources for SST.
Storm Resampling and Transposition ¶
The main objective of SST, to estimate rainfall or flood annual exceedance probabilities (AEPs), is achieved by resampling from a storm catalog to generate large numbers of realizations of extreme rainstorms over the watershed of interest. These realizations should synthesize new realistic annual patterns of rainstorms over the transposition domain and, by extension, over the watershed. To do this, a random number of storms k is generated by modeling the annual “arrival process” of storms over the domain.
The storm arrival process is usually assumed to follow either a Bernoulli or Poisson distribution with an arrival rate m/n storms per year, where m is the number of storms in the storm catalog and n is the length of the record (in years) from which the catalog was generated. Note that if a Bernoulli distribution is used, $m<n$ and k can only take on values of 0 or 1. This makes the Bernoulli arrival process suitable only if small values of m are used. As a consequence, the probability of transposing a storm over Aw will be relatively low and many realizations of transposed rainfall will thus be small or zero. This can be seen in panels c-d of the figure above, in which little and no rainfall is produced for New Orleans, Louisiana for two possible transpositions of Hurricane Harvey. Panel e, meanwhile, shows a transposition that produces extreme rainfall over New Orleans. This feature makes the Bernoulli arrival model suitable only for estimation of very low AEPs such as those needed for spillway design; the magnitude of more common events will be greatly underestimated. This restriction is lifted if a Poisson distribution is used. Underestimation of the magnitude of more common events can nonetheless result if m is not sufficiently large. No SST study to date has sought to model the temporal sequencing of these k storms, meaning that, to use the parlance of rainfall-runoff modeling, SST has thus far only been event-based.
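A minimal sketch of the Poisson arrival process described above (the parameter values are hypothetical, and this is not RainyDay's internal code):

```python
import numpy as np

# Sample the number of storms k arriving each year from a Poisson
# distribution with rate lam = m / n, where m is the storm catalog size
# and n is the record length in years.
rng = np.random.default_rng(42)

m, n = 150, 20            # e.g. 150 catalog storms from a 20-year record
lam = m / n               # average storm arrivals per year
years = 1000              # number of synthetic years to generate
k_per_year = rng.poisson(lam, size=years)

print(k_per_year.mean())  # sample mean; should be close to lam = 7.5
```

Note that with a Poisson model, k can exceed 1 in a given year, which is what lifts the Bernoulli restriction discussed above.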
The k storms are randomly sampled from the storm catalog and randomly transposed within the transposition domain. Three possible transpositions of 72-hour rainfall from Hurricane Harvey are shown in panels c-e of the figure above. All rainfall fields or isohyets that constitute a storm are transposed by a north-south distance Δy and an east-west distance Δx which are randomly selected from distributions $D_{Y}(y)$ and $D_{X}(x)$, respectively. These distributions jointly describe the spatial probability of storm occurrence.
The notion of a homogeneous transposition domain implies that the probabilities of random placement of transposed storms should be equal throughout the transposition domain, i.e. that DY(y) and DX(x) are uniform. Relaxation of this stricture allows larger values of AD and thus m, but could introduce bias since rainfall properties would not be strictly homogeneous. This issue was discussed in very general terms by Alexander (1963), who wrote that “the basic problem is to preserve only the essential statistical features of the area by discarding those which may be ascribed to sampling errors.” Several more recent schemes have provided ways to either modify the storm transposition probability or storm magnitude to limit these biases (Nathan et al., 2016; Wilson and Foufoula‐Georgiou, 1990; Wright et al., 2017; Zhou et al., 2019).
Once transposed, the k resulting rainfall amounts over the watershed are computed. The largest of these can be understood as a “synthetic annual rainfall maximum”; repeated over many synthetic years, these maxima form the basis of AEP estimation.
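The transposition step above can be sketched as a simple grid shift. This is an illustration under stated assumptions (uniform shift distributions in grid-cell units, zero-filling at the domain edge, and a hypothetical rectangular "watershed" mask), not RainyDay's implementation:

```python
import numpy as np

# Minimal transposition sketch: shift one time-aggregated rainfall field
# by random offsets (dy, dx) and compute the watershed-average rainfall.
rng = np.random.default_rng(0)

field = np.zeros((100, 100))       # one rainfall field (mm) on a 100x100 grid
field[40:60, 40:60] = 25.0         # a synthetic 20x20-cell storm core

dy, dx = rng.integers(-30, 31, size=2)   # random north-south / east-west shifts

# Transpose by shifting; cells moved off the domain are dropped (zero-filled),
# which is one simple edge convention -- not necessarily RainyDay's.
shifted = np.zeros_like(field)
src = field[max(0, -dy):field.shape[0] - max(0, dy),
            max(0, -dx):field.shape[1] - max(0, dx)]
shifted[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src

watershed = (slice(45, 55), slice(45, 55))   # hypothetical watershed mask
print(shifted[watershed].mean())              # transposed watershed-average rainfall
```

In practice the same (dy, dx) pair is applied to every time step of a storm, which is how the space-time structure of the storm is preserved.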
Estimating Rainfall and Flood Quantiles ¶
Unlike statistical FFA and RFA, it is not necessary to fit distributions to the synthetic annual maxima in order to obtain AEPs. Rather, AEPs can be estimated directly from the ranked synthetic annual maxima using plotting position formulae. For example, 1,000 annual maxima generated through SST would facilitate direct estimation of AEPs as low as $10^{-3}$. Uncertainty bounds can be obtained by generating multiple such “sets” of realizations. Panel f of the figure above, for example, shows an SST-based IDF curve for New Orleans, in which 100 sets of 1,000 annual maxima each were generated. The shaded area denotes the spread among these 100 sets.
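A short sketch of plotting-position-based AEP estimation from ranked synthetic annual maxima. The Weibull formula AEP = i/(N+1) and the Gumbel-distributed synthetic maxima are assumptions for illustration; RainyDay may use a different formula:

```python
import numpy as np

# Generate 1,000 synthetic annual maxima (here drawn from a Gumbel
# distribution purely for illustration), then estimate AEPs directly
# from their ranks using the Weibull plotting position.
rng = np.random.default_rng(1)
annual_maxima = rng.gumbel(loc=80.0, scale=25.0, size=1000)  # depths (mm)

ranked = np.sort(annual_maxima)[::-1]     # largest first
ranks = np.arange(1, ranked.size + 1)
aep = ranks / (ranked.size + 1)           # exceedance probability per rank
return_period = 1.0 / aep                 # in years

# e.g. the depth closest to a 1% AEP (100-year return period):
idx = np.argmin(np.abs(aep - 0.01))
print(ranked[idx])
```

Repeating this with multiple independent sets of maxima yields the spread used for the uncertainty bounds described above.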
Obtaining flood quantiles requires the use of a rainfall-runoff model, but flood AEPs are otherwise computed in the same manner. There need not be a 1:1 correspondence between rainfall AEP and flood AEP if rainfall spatiotemporal structure is considered or if watershed initial conditions such as soil moisture are treated as random variables (Franchini et al., 1996; Gupta, 1972; Wright et al., 2017; Yu et al., 2019).
Appendix B: Precipitation Data Sources ¶
While RainyDay uses the NetCDF4 format for its input and output files, it requires that these files follow a very specific file structure. While in principle an experienced user could format a new precipitation dataset so that it is readable by RainyDay, this is not necessarily easy to do. For that reason, a number of pre-formatted input datasets have been generated and made available for download. Those datasets, as well as their strengths and weaknesses, are described here.
An unfortunate reality is that none of the datasets listed here is likely suitable for modeling rainfall and flood frequency in very small watersheds, particularly in urban areas, where resolutions on the order of 1 km and 5-15 minutes are preferable. For now, no dataset exists at those higher resolutions that the RainyDay developers consider useful for SST applications. The possible exception is the relatively short (11-year) and not up-to-date (2001-2011) NEXRAD Reanalysis.
NOAA Stage IV ¶
NOAA Stage IV is a multi-sensor precipitation dataset that covers the continental United States and extends slightly beyond its borders over the ocean and into Canada and Mexico. Stage IV is a “mosaic” of estimates from the National Weather Service River Forecast Centers. While the exact data and methods used in its creation vary with time and location, it is primarily gage-corrected weather radar. It has relatively good time coverage (beginning in 2002) as well as high temporal (hourly) and spatial (roughly 4 km grid) resolution. It is generally accurate east of the Rocky Mountains, and much less so farther west where sparse radar installations and radar beam blockage due to terrain can cause serious problems–typically underestimation. In some mountainous locations, no data are available for certain periods. In addition, Stage IV can exhibit “strange” results for certain times and locations, mainly for reasons related to the difficulty of measuring rainfall using radar. Examples include beam blockage due to cellular or water towers, ring-like features due to low freezing levels in the atmosphere, and others.
Dr. Wright’s team at UW-Madison has created a bespoke version of Stage IV. It has been regridded from its original HRAP 4 km resolution to a regular 0.03 degree latitude/longitude grid using the mass-conserving regridding scheme from the xesmf library. Any missing data (including over the ocean) over the regridded domain, which covers -126.5 to -63.5 degrees longitude, 24.0 to 51.0 degrees latitude, has been infilled with hourly data from the ERA5 reanalysis. Then a bias adjustment is applied in certain conditions such that daily precipitation values are rescaled according to the nClimGrid-Daily dataset (see below). This bias adjustment is included to compensate for biases in ERA5, as well as underestimation by Stage IV in mountainous areas and during cold weather. The bias adjustment isn’t perfect, but it improves the dataset overall. You can download RainyDay-ready Stage IV precipitation data for CONUS here.
NOTE: The Madison, WI case study used an earlier version of Stage IV, which had a 0.0417 degree resolution, was generated using an inferior regridding scheme, and did nothing to address missing data or to correct biases. In the interest of being able to reproduce the results of that case study, that dataset is available here. That version, however, is not recommended for normal use.
NASA NLDAS-2 ¶
NASA NLDAS-2 precipitation is a multi-sensor precipitation dataset that covers the continental United States and extends beyond its borders into Canada and Mexico. NLDAS-2 has a long time period (starting in 1979), relatively good temporal resolution (hourly), and relatively poor spatial resolution (roughly 8 km grid). It uses a PRISM-style elevation-adjusted precipitation gage interpolation scheme to estimate daily precipitation totals, and then uses other information to disaggregate those daily totals into hourly increments. In most places over the continental United States, that disaggregation is done using weather radar; in times and places where radar is inadequate or nonexistent, other tools such as satellite precipitation estimates and numerical weather models are used for this disaggregation. NLDAS-2 is a good choice for relatively large watersheds or specific applications in which spatial resolution is a secondary concern, but where temporal resolution and/or record length are important.
In addition, as a hybrid of multiple data sources, the amount and types of data going into NLDAS-2 varies over time. It has been shown that these types of changes can influence the “stability” (i.e. the statistical properties) of long-term datasets (Hamlet and Lettenmaier, 2005). This is important for things like trend analyses; its importance in frequency analysis is less clear but potentially non-negligible. NLDAS-2 precipitation specifically has been shown to have a pronounced “change point” around 1997, when radar data was incorporated into it for the first time (Ferguson and Mocko, 2017).
You can download RainyDay-ready NLDAS-2 precipitation data for CONUS here.
NOAA nClimGrid-Daily ¶
NOAA nClimGrid-Daily: nClimGrid is a long-term (starting in 1951) precipitation dataset created using a PRISM-style elevation-adjusted precipitation gage interpolation scheme to estimate daily precipitation totals. It has relatively high spatial resolution (roughly 5 km grid), but its poor temporal resolution limits its usefulness for SST to very large watersheds, or very specific applications–potentially including analysis of rainfall heterogeneity or sensitivity analysis. It covers the continental United States. You can download RainyDay-ready nClimGrid-Daily precipitation data for CONUS here.
NASA IMERG-Final ¶
NASA IMERG-Final is a global-scale multi-satellite precipitation dataset. It has relatively good temporal resolution (30 minutes), poor spatial resolution (roughly 10 km grid), good time coverage (starting in mid-2000), and covers the entire globe. This global coverage makes it (and potentially other satellite precipitation datasets such as CMORPH, PERSIANN, or GSMaP) one of the few options for medium/high resolution long-term precipitation estimates in many parts of the world. That said, IMERG and similar datasets often suffer from very poor accuracy. If you want to use IMERG in RainyDay, keep these accuracy issues in mind. This version (specifically IMERG-Final V06B) includes a monthly gage-based correction. It will be replaced by V07 when it becomes available; that version should feature somewhat improved accuracy due to algorithmic improvements. You can download RainyDay-ready IMERG V06B Final run data for the entire globe here (warning, this dataset is large at 320 GB).
Appendix C: Creating an ‘Irregular’ Transposition Domain Shapefile ¶
This section explains how to create a polygon shapefile to define the RainyDay transposition domain using the desktop application Google Earth and an online conversion tool. I believe one could use Google Maps in place of Google Earth, but I have not attempted this. One could also use this approach to create a shapefile of the watershed outline or other area of interest for IDF computation (to be provided to the WATERSHEDSHP field in the .sst file), though it may not be the best way to do so, and many resources exist online and in GIS systems to define watershed boundaries. Note that the same result described here could be achieved using QGIS or ArcGIS; the following description, however, is much simpler for those who do not have experience with or access to such software.
- Step 1: Download and install Google Earth Pro for desktop: https://www.google.com/earth/desktop/
- Step 2: Open Google Earth Pro and navigate to your area of interest:
- Step 3: Click on the “Add Polygon” button, enter an appropriate name in the “Edit Polygon” dialog box that appears, draw the boundary to define your desired transposition domain, and then click “OK” in the dialog box:
- Step 4: You’ll now see the polygon appear on the left under “Places”. Now right click on the new polygon and select “Save Place As”. Select an appropriate location and file name, and make sure you save it as a “.kml” file.
- Step 5: In a web browser, navigate to https://mygeodata.cloud/converter/kml-to-shp. Upload the “.kml” file you created. Once you’ve done this, you will be taken to a new page, which displays information about your “.kml” file (including a map of the “bounding rectangle”) and the conversion to be performed. The most important details to note are under “Input Parameters”. You should see “Coordinate system: WGS 84 (EPSG:4326)”. If not, something is wrong. Also, under “Output Parameters”, you’ll want to leave the default Coordinate System, which will be the same as the input file. Alternatively, if the input coordinate system is not WGS 84 (EPSG:4326), you can set the output to this coordinate system (I haven’t tested this). Click the “Convert now!” button and save the output on your machine.
- Step 6: You are now ready to use this shapefile in RainyDay. You will find that the website created a number of different files. The one you need to pay attention to is the one with the “.shp” file extension (though don’t delete or move the other ones!). That file should be provided to the RainyDay code via the .sst file using the following lines:
DOMAINTYPE irregular
DOMAINSHP {full file path to shapefile}/{shapefile name}.shp
Appendix D: RainyDay Input and Output NetCDF4 Files ¶
Input NetCDF4 Files ¶
The following table describes the structure of input precipitation files to RainyDay. It is generally not recommended that you try to create your own input files. Instead, use one of the datasets already formatted for use in RainyDay.
Precipitation Input |
---|
Purpose: RainyDay reads in input precipitation from netCDF4 files. These files need to conform to a specific format, described here. RainyDay expects to see one file per day, regardless of the temporal resolution of the input precipitation and the length of the dataset. This does mean that precipitation datasets with a temporal resolution coarser than one day cannot be used in RainyDay. You shouldn’t ever want to do that anyway. |
Naming Convention: The input filenames must conform to several requirements: 1) they must end in the ‘.nc’ file extension; 2) they must contain a date string corresponding to the day of precipitation they contain in YYYYMMDD format, e.g. ‘20210505’; 3) all filenames must have the same number of characters and the date string must be in the same position. In other words, do something simple like ‘StageIV.20210505.nc, StageIV.20210506.nc, StageIV.20210507.nc, …’. |
Dimensions: (T Y X) = (number of time steps in 24 hours) × ([LATITUDE_MAX – LATITUDE_MIN] ÷ input rainfall north-south spatial resolution) × ([LONGITUDE_MAX – LONGITUDE_MIN] ÷ input rainfall east-west spatial resolution) |
rainrate (T Y X): Rainfall rate in units of mm hr⁻¹ |
time (T): The dates and times for each rainfall field† |
latitude (Y): Latitude*, ordered north to south, of the y-coordinate of the rainfall spacetime fields |
longitude (X): Longitude*, ordered east-to-west, of the x-coordinate of the rainfall spacetime fields |
†All date/time stamps correspond to the end of the accumulation period. For example, if a rainfall dataset has hourly resolution, then the date/time stamp 5/5/2005 12:00Z corresponds to the time period from 5/5/2005 11:00Z to 5/5/2005 12:00Z.
*Note that all latitude and longitude vectors refer to the coordinates of the upper-left corner of the grid cell, not the lower-left or center, as in some other coordinate conventions.
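The naming convention and dimension arithmetic above can be sketched as follows. The filenames and grid values are hypothetical examples that follow the rules in the table; this is not RainyDay's internal code:

```python
from datetime import datetime

# 1) Fixed-position YYYYMMDD date string: because all filenames share the
#    same length and date position, the date can be sliced directly.
#    ("StageIV." is 8 characters, so the date occupies characters 8-15.)
filenames = ["StageIV.20210505.nc", "StageIV.20210506.nc", "StageIV.20210507.nc"]
dates = [datetime.strptime(f[8:16], "%Y%m%d").date() for f in filenames]
print(dates[0])  # 2021-05-05

# 2) Dimension arithmetic for a hypothetical hourly dataset covering a
#    2 x 3 degree box at 0.1-degree resolution:
LATITUDE_MAX, LATITUDE_MIN = 45.0, 43.0
LONGITUDE_MAX, LONGITUDE_MIN = -89.0, -92.0
res_ns = res_ew = 0.1            # north-south / east-west resolution (degrees)
T = 24                           # time steps per day for hourly data
Y = int(round((LATITUDE_MAX - LATITUDE_MIN) / res_ns))
X = int(round((LONGITUDE_MAX - LONGITUDE_MIN) / res_ew))
print((T, Y, X))  # (24, 20, 30)
```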
Output NetCDF4 Files ¶
The following table describes the two types of output files from RainyDay: storm catalogs and “rainfall scenarios.”
Storm Catalog | Rainfall Scenario |
---|---|
Purpose: contains all necessary rainfall information needed to conduct SST using RainyDay; basically it is a list of storms that have been identified through the current or a previous run of RainyDay, along with the full set of corresponding rainfall spacetime fields and dates/times | Purpose: contains all necessary rainfall information to drive a hazard model (hydrologic model, landslide model, etc.) with the output from the SST procedure. Note that one “realization” is written to each file, so if NREALIZATIONS is greater than one, multiple rainfall scenario files will be produced. Each one will contain a sequence of |
Dimensions: (N T Y X) = NSTORMS (DURATION ÷ input data temporal resolution) ([LATITUDE_MAX – LATITUDE_MIN] ÷ input rainfall north-south spatial resolution) ([LONGITUDE_MAX – LONGITUDE_MIN] ÷ input rainfall east-west spatial resolution) | Dimensions: (n t y x) = NREALIZATIONSª (DURATION‡ ÷ input data temporal resolution) (gridmask height ÷ input rainfall north-south spatial resolution) (gridmask width ÷ input rainfall east-west spatial resolution) |
Variables (dimensions): | Variables (dimensions): |
rainrate (N T Y X): The spacetime rainfall fields for each of N storms that compose the storm catalog, each storm having T fields of size Y×X. Units MUST be in mm hr⁻¹ | rainrate (n t y x): The spacetime rainfall fields for each of the n rainfall scenarios, each having t fields of size y×x. Units MUST be in mm hr⁻¹ |
time (N*T): The dates and times for each rainfall field. For each of N storms, there are T entries† | time (n*t): The dates and times for each rainfall field. For each of n storms, there are t entries† |
latitude (Y): Latitude*, ordered north to south, of the y-coordinate of the rainfall spacetime fields | latitude (y): Latitude*, ordered north to south, of the y-coordinate of the rainfall spacetime fields |
longitude (X): Longitude*, ordered east-to-west, of the x-coordinate of the rainfall spacetime fields | longitude (x): Longitude*, ordered east-to-west, of the x-coordinate of the rainfall spacetime fields |
gridmask (Y * X): Mask showing the accumulation shape that is used to identify the storms; could be a watershed outline, a rectangle, or a single pixel | ylocation (n): the randomly selected north-south location of the upper-left corner of the transposition that produces each t×y×x rainfall spacetime series |
domainmask (Y * X): Mask showing the transposition domain; this will only be interesting if an irregular domain is used, rather than a rectangular one. | xlocation (n): the randomly selected east-west location of the upper-left corner of the transposition that produces each t×y×x rainfall spacetime series |
ylocation (N): the north-south location that corresponds to the upper-left corner of the maximized rainfall of the size/shape of the gridmask for each storm | basinrainfall (n): Average rainfall (in mm) over the size/shape of the gridmask for each transposition |
xlocation (N): the east-west location that corresponds to the upper-left corner of the maximized rainfall of the size/shape of the gridmask for each storm | returnperiod (n): estimated rainfall return period for each transposition |
basinrainfall (N): Average rainfall (in mm) over the size/shape of the gridmask for each storm | stormnumber (n): Which “member” of the storm catalog was used to generate each transposition |
‡Note that T and t need not be the same. If RainyDay is run using an existing storm catalog, the user can define any t, as long as it is less than or equal to T.
†All date/time stamps correspond to the end of the accumulation period. For example, if a rainfall dataset has hourly resolution, then the date/time stamp 5/5/2005 12:00Z corresponds to the time period from 5/5/2005 11:00Z to 5/5/2005 12:00Z.
ªIf RETURNTHRESHOLD is greater than 1, then n will be less than NREALIZATIONS. This is a practical way to reduce the number of simulations you will run, and reduce the sizes of these rainfall scenario files.
*Note that all latitude and longitude vectors refer to the coordinates of the upper-left corner of the grid cell, not the lower-left or center, as in some other coordinate conventions.
Appendix E: References ¶
Alexander, G. N. (1963). Using the probability of storm transposition for estimating the frequency of rare floods. Journal of Hydrology, 1(1). https://doi.org/10.1016/0022-1694(63)90032-5
Bonnin, G. M., Martin, D., Lin, B., Parzybok, T., Yekta, M., & Riley, D. (2006). Precipitation-Frequency Atlas of the United States, NOAA Atlas 14, Volume 2, Version 3.0. NOAA, National Weather Service.
Chow, V. T. (1964). Frequency Analysis. In Handbook of applied hydrology: A compendium of water-resources technology. McGraw-Hill.
Daly, C., Halbleib, M., Smith, J. I., Gibson, W. P., Doggett, M. K., Taylor, G. H., Curtis, J., & Pasteris, P. P. (2008). Physiographically sensitive mapping of climatological temperature and precipitation across the conterminous United States. International Journal of Climatology, 28(15), 2031–2064. https://doi.org/10.1002/joc.1688
Daly, C., Neilson, R. P., & Phillips, D. L. (1994). A Statistical-Topographic Model for Mapping Climatological Precipitation over Mountainous Terrain. Journal of Applied Meteorology, 33, 140–158. https://doi.org/10.1175/1520-0450(1994)033<0140:ASTMFM>2.0.CO;2
England, J. F., Julien, P. Y., & Velleux, M. L. (2014). Physically-based extreme flood frequency with stochastic storm transposition and paleoflood data on large watersheds. Journal of Hydrology, 510(0), 228–245. https://doi.org/10.1016/j.jhydrol.2013.12.021
Ferguson, C. R., & Mocko, D. M. (2017). Diagnosing an artificial trend in NLDAS-2 afternoon precipitation. Journal of Hydrometeorology, JHM-D-16-0251.1. https://doi.org/10.1175/JHM-D-16-0251.1
Foufoula-Georgiou, E. (1989). A probabilistic storm transposition approach for estimating exceedance probabilities of extreme precipitation depths. Water Resources Research, 25(5), 799–815. https://doi.org/10.1029/WR025i005p00799
Franchini, M., Helmlinger, K. R., Foufoula-Georgiou, E., & Todini, E. (1996). Stochastic storm transposition coupled with rainfall—Runoff modeling for estimation of exceedance probabilities of design floods. Journal of Hydrology, 175(1–4), 511–532. https://doi.org/10.1016/S0022-1694(96)80022-9
Gupta, V. K. (1972). Transposition of Storms for Estimating Flood Probability Distributions. Colorado State University.
Hamlet, A. F., & Lettenmaier, D. P. (2005). Production of Temporally Consistent Gridded Precipitation and Temperature Fields for the Continental United States*. Journal of Hydrometeorology, 6(3), 330–336. https://doi.org/10.1175/JHM420.1
Hosking, J. R. M., & Wallis, J. R. (1993). Some statistics useful in regional frequency analysis. Water Resources Research, 29(2), 271–281. https://doi.org/10.1029/92WR01980
Hosking, J. R. M., & Wallis, J. R. (1997). Regional Frequency Analysis: An Approach Based on L-Moments. Cambridge University Press; Cambridge Core. https://doi.org/10.1017/CBO9780511529443
Jarrett, R. D. (1989). Hydrology and paleohydrology used to improve the understanding of flood hydrometeorology in Colorado. Design of Hydraulic Structures, 89, 9–16.
Mahoney, K., Ralph, F. M., Wolter, K., Doesken, N., Dettinger, M., Gottas, D., Coleman, T., & White, A. (2015). Climatology of Extreme Daily Precipitation in Colorado and Its Diverse Spatial and Seasonal Variability. Journal of Hydrometeorology, 16(2), 781–792. https://doi.org/10.1175/JHM-D-14-0112.1
Nathan, R., Jordan, P., Scorah, M., Lang, S., Kuczera, G., Schaefer, M., & Weinmann, E. (2016). Estimating the exceedance probability of extreme rainfalls up to the probable maximum precipitation. Journal of Hydrology, 543, 706–720. https://doi.org/10.1016/j.jhydrol.2016.10.044
National Research Council. (1988). Estimating probabilities of extreme floods: Methods and recommended research. National Academy Press.
NOAA Earth Systems Research Laboratory. (2018). Colorado – New Mexico Regional Extreme Precipitation Study. Summary Report, Volume IV (Application of Dynamical Model Approaches Using the NOAA High Resolution Rapid Refresh and Weather Research and Forecast Models, p. 45). https://www.ose.state.nm.us/dams/CONMreps/VOLUME%20IV/Volume%20IV%20Summary.pdf
Perez, G., Gomez-Velez, J. D., Mantilla, R., Wright, D. B., & Li, Z. (2021). The Effect of Storm Direction on Flood Frequency Analysis. Geophysical Research Letters, 48(9), e2020GL091918. https://doi.org/10.1029/2020GL091918
Perez, G., Mantilla, R., Krajewski, W. F., & Wright, D. B. (2019). Using Physically Based Synthetic Peak Flows to Assess Local and Regional Flood Frequency Analysis Methods. Water Resources Research, n/a(n/a). https://doi.org/10.1029/2019WR024827
Perica, S., Martin, D., Pavlovic, S., Roy, I., St. Laurent, M., Trypaluk, C., Unruh, D., Yekta, M., & Bonnin, G. (2013). NOAA Atlas 14 Volume 8 Version 2, Precipitation-Frequency Atlas of the United States, Midwestern States. NOAA, National Weather Service.
Rossi, M. W., Anderson, R. S., Anderson, S. P., & Tucker, G. E. (2020). Orographic Controls on Subdaily Rainfall Statistics and Flood Frequency in the Colorado Front Range, USA. Geophysical Research Letters, 47(4). https://doi.org/10.1029/2019GL085086
Smith, J. A., Villarini, G., & Baeck, M. L. (2011). Mixture Distributions and the Hydroclimatology of Extreme Rainfall and Flooding in the Eastern United States. Journal of Hydrometeorology, 12(2), 294–309. https://doi.org/10.1175/2010JHM1242.1
Svensson, C., & Jones, D. A. (2010). Review of rainfall frequency estimation methods. Journal of Flood Risk Management, 3(4), 296–313. https://doi.org/10.1111/j.1753-318X.2010.01079.x
U.S. Weather Bureau. (1958). Rainfall intensity-frequency regime, Part 2: Southeastern United States (Technical Paper No. 29).
U.S. Army Corps of Engineers (USACE). (1973). Storm Rainfall in the United States. Office of the Chief of Engineers.
van der Walt, S., Colbert, S. C., & Varoquaux, G. (2011). The NumPy Array: A Structure for Efficient Numerical Computation. Computing in Science & Engineering, 13(2).
Wilson, L. L., & Foufoula‐Georgiou, E. (1990). Regional Rainfall Frequency Analysis via Stochastic Storm Transposition. Journal of Hydraulic Engineering, 116(7), 859–880. https://doi.org/10.1061/(ASCE)0733-9429(1990)116:7(859)
Wright, D. B., & Holman, K. D. (2019). Rescaling Transposed Extreme Rainfall within a Heterogeneous Region. Journal of Hydrologic Engineering, 24(6), 06019001. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001781
Wright, D. B., Mantilla, R., & Peters-Lidard, C. D. (2017). A remote sensing-based tool for assessing rainfall-driven hazards. Environmental Modelling & Software, 90, 34–54. https://doi.org/10.1016/j.envsoft.2016.12.006
Wright, D. B., Samaras, C., & Lopez-Cantu, T. (2021). Resilience to Extreme Rainfall Starts with Science. Bulletin of the American Meteorological Society, 1–14. https://doi.org/10.1175/BAMS-D-20-0267.1
Wright, D. B., Smith, J. A., & Baeck, M. L. (2014). Flood frequency analysis using radar rainfall fields and stochastic storm transposition. Water Resources Research, 50(2), 1592–1615. https://doi.org/10.1002/2013WR014224
Wright, D. B., Smith, J. A., Villarini, G., & Baeck, M. L. (2013). Estimating the frequency of extreme rainfall using weather radar and stochastic storm transposition. Journal of Hydrology, 488, 150–165.
Wright, D. B., Yu, G., & England, J. F. (2020). Six decades of rainfall and flood frequency analysis using stochastic storm transposition: Review, progress, and prospects. Journal of Hydrology, 585, 124816. https://doi.org/10.1016/j.jhydrol.2020.124816
Yu, G., Wright, D. B., & Holman, K. D. (2021). Connecting Hydrometeorological Processes to Low-Probability Floods in the Mountainous Colorado Front Range. Water Resources Research, 57(4), e2021WR029768. https://doi.org/10.1029/2021WR029768
Yu, G., Wright, D. B., & Li, Z. (2020). The Upper Tail of Precipitation in Convection-Permitting Regional Climate Models and Their Utility in Nonstationary Rainfall and Flood Frequency Analysis. Earth’s Future, e2020EF001613. https://doi.org/10.1029/2020EF001613
Yu, G., Wright, D. B., Zhu, Z., Smith, C., & Holman, K. D. (2019). Process-based flood frequency analysis in an agricultural watershed exhibiting nonstationary flood seasonality. Hydrol. Earth Syst. Sci., 23(5), 2225–2243. https://doi.org/10.5194/hess-23-2225-2019
Zhou, Z., Smith, J. A., Wright, D. B., Baeck, M. L., Yang, L., & Liu, S. (2019). Storm Catalog-Based Analysis of Rainfall Heterogeneity and Frequency in a Complex Terrain. Water Resources Research, 55(3), 1871–1889. https://doi.org/10.1029/2018WR023567