Forecast skill

In the fields of forecasting and prediction, forecasting skill or prediction skill is any measure of the accuracy and/or degree of association between a prediction and an observation or estimate of the actual value of what is being predicted (formally, the predictand); it may be quantified as a skill score.[1]

In meteorology, more specifically in weather forecasting, skill measures the superiority of a forecast over a simple historical baseline of past observations. The same forecast methodology can result in different skill scores at different places, or even in the same place for different seasons (e.g., spring weather might be driven by erratic local conditions, whereas winter cold snaps might correlate with observable polar winds). Weather forecast skill is often presented in the form of seasonal geographical maps.

Forecasting skill for single-value forecasts (i.e., time series of a scalar quantity) is commonly represented in terms of metrics such as correlation, root mean squared error (RMSE), mean absolute error (MAE), relative mean absolute error, bias, and the Brier score, among others. A number of scores based on the concept of entropy in information theory are also used.[2][3]
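
As a concrete illustration, the following minimal Python sketch (using NumPy; the forecast and observation values are invented for illustration) computes several of these deterministic metrics for a scalar series:

    import numpy as np

    def deterministic_metrics(forecast, observed):
        """Common accuracy metrics for single-value (deterministic) forecasts."""
        forecast = np.asarray(forecast, dtype=float)
        observed = np.asarray(observed, dtype=float)
        error = forecast - observed
        return {
            "bias": error.mean(),                           # mean error
            "mae": np.abs(error).mean(),                    # mean absolute error
            "rmse": np.sqrt((error ** 2).mean()),           # root mean squared error
            "corr": np.corrcoef(forecast, observed)[0, 1],  # Pearson correlation
        }

    # Five hypothetical daily temperature forecasts vs. verifying observations
    print(deterministic_metrics([21.0, 19.5, 23.1, 18.0, 20.2],
                                [20.3, 19.0, 24.0, 17.2, 21.0]))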

The term 'forecast skill' may also be used qualitatively, in which case it could refer either to forecast performance according to a single metric or to overall forecast performance across multiple metrics.

Metrics

Probabilistic forecast skill scores may use metrics such as the Ranked Probability Skill Score (RPSS) or the Continuous Ranked Probability Skill Score (CRPSS), among others. Categorical skill metrics such as the False Alarm Ratio (FAR), the Probability of Detection (POD), the Critical Success Index (CSI), and the Equitable Threat Score (ETS) are also relevant for some forecasting applications. Skill is often, though not exclusively, expressed in relative terms: the performance of a particular forecast is compared with that of a reference (benchmark) prediction, a formulation called a skill score.
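
Because the categorical metrics above all derive from the same 2x2 contingency table of forecast and observed events, a short sketch makes the definitions concrete (Python; the counts are invented for illustration):

    def categorical_scores(hits, misses, false_alarms, correct_negatives):
        """Standard scores from a 2x2 contingency table of event forecasts."""
        total = hits + misses + false_alarms + correct_negatives
        # Hits expected by chance, given the same marginal totals
        hits_random = (hits + misses) * (hits + false_alarms) / total
        return {
            "POD": hits / (hits + misses),                 # probability of detection
            "FAR": false_alarms / (hits + false_alarms),   # false alarm ratio
            "CSI": hits / (hits + misses + false_alarms),  # critical success index
            "ETS": (hits - hits_random)
                   / (hits + misses + false_alarms - hits_random),  # equitable threat score
        }

    print(categorical_scores(hits=82, misses=38, false_alarms=23, correct_negatives=222))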

Forecasting skill metric and score calculations should be made over a large enough sample of forecast-observation pairs to be statistically robust. A sample of predictions for a single predictand (e.g., temperature at one location, or a single stock value) typically includes forecasts made on a number of different dates. A sample could also pool forecast-observation pairs across space, for a prediction made on a single date, as in the forecast of a weather event that is verified at many locations.

Example skill calculation

An example of a skill calculation using the mean squared error (MSE) as the metric, together with the associated skill score, is given in the table below. In this case, a perfect forecast yields a skill metric of zero and a skill score of 1.0, a forecast with skill equal to that of the reference forecast has a skill score of 0.0, and a forecast less skillful than the reference has a negative skill score, unbounded below.[4][5]

Skill metric: mean squared error (MSE), where $f_i$ is the forecast and $o_i$ the verifying observation over $n$ pairs:

$\mathit{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(f_i - o_i\right)^2$

The associated skill score (SS), where $\mathit{MSE}_{\text{ref}}$ is the MSE of the reference forecast:

$\mathit{SS} = 1 - \frac{\mathit{MSE}_{\text{forecast}}}{\mathit{MSE}_{\text{ref}}}$
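
The same calculation in code, as a minimal sketch (Python; the series are invented, and climatology is used as the reference forecast, a common but not mandatory choice):

    import numpy as np

    def mse_skill_score(forecast, observed, reference):
        """SS = 1 - MSE_forecast / MSE_ref: 1.0 is perfect, 0.0 matches the reference."""
        forecast, observed, reference = map(np.asarray, (forecast, observed, reference))
        mse_forecast = np.mean((forecast - observed) ** 2)
        mse_reference = np.mean((reference - observed) ** 2)
        return 1.0 - mse_forecast / mse_reference

    obs = np.array([20.3, 19.0, 24.0, 17.2, 21.0])
    fcst = np.array([21.0, 19.5, 23.1, 18.0, 20.2])
    climatology = np.full_like(obs, obs.mean())     # constant reference forecast
    print(mse_skill_score(fcst, obs, climatology))  # ≈ 0.89 here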

Further reading

A broad range of forecast metrics can be found in published and online resources. A good starting point is the long-standing set of verification web pages maintained by the WWRP/WGNE Joint Working Group on Forecast Verification Research and hosted by the Australian Bureau of Meteorology.[6]

A popular textbook and reference that discusses forecast skill is Statistical Methods in the Atmospheric Sciences. [7]

Related Research Articles

There are two main uses of the term calibration in statistics that denote special types of statistical inference problems. "Calibration" can mean a reverse process to regression, in which a known observation of the dependent variable is used to estimate the corresponding explanatory variable, or it can refer to procedures for assessing how well predicted probabilities agree with observed outcomes.

In the field of machine learning, and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one; in unsupervised learning it is usually called a matching matrix.

Numerical weather prediction (NWP) uses mathematical models of the atmosphere and oceans to predict the weather based on current weather conditions. Though first attempted in the 1920s, it was not until the advent of computer simulation in the 1950s that numerical weather predictions produced realistic results. A number of global and regional forecast models are run in different countries worldwide, using current weather observations relayed from radiosondes, weather satellites and other observing systems as inputs.

Ensemble forecasting is a method used in numerical weather prediction. Instead of making a single forecast of the most likely weather, a set of forecasts is produced, aiming to give an indication of the range of possible future states of the atmosphere. Ensemble forecasting is a form of Monte Carlo analysis. The multiple simulations are conducted to account for the two usual sources of uncertainty in forecast models: (1) the errors introduced by the use of imperfect initial conditions, amplified by the chaotic nature of the evolution equations of the atmosphere, often referred to as sensitive dependence on initial conditions; and (2) errors introduced because of imperfections in the model formulation, such as the approximate mathematical methods used to solve the equations. Ideally, the verified future atmospheric state should fall within the predicted ensemble spread, and the amount of spread should be related to the uncertainty (error) of the forecast. In general, this approach can be used to make probabilistic forecasts of any dynamical system, not just the weather.

Data assimilation is a mathematical discipline that seeks to optimally combine theory with observations. There may be a number of different goals sought – for example, to determine the optimal state estimate of a system, to determine initial conditions for a numerical forecast model, to interpolate sparse observation data using knowledge of the system being observed, to set numerical parameters based on training a model from observed data. Depending on the goal, different solution methods may be used. Data assimilation is distinguished from other forms of machine learning, image analysis, and statistical methods in that it utilizes a dynamical model of the system being analyzed.

The Brier Score is a strictly proper score function or strictly proper scoring rule that measures the accuracy of probabilistic predictions. For unidimensional predictions, it is strictly equivalent to the mean squared error as applied to predicted probabilities.
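
For reference, the two-category Brier score over $N$ forecast-outcome pairs can be written as

$\mathit{BS} = \frac{1}{N}\sum_{t=1}^{N}\left(f_t - o_t\right)^2$

where $f_t$ is the forecast probability and $o_t \in \{0, 1\}$ is the observed outcome, which makes the stated equivalence explicit: it is the mean squared error of the predicted probabilities.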

In decision theory, a scoring rule provides a summary measure for the evaluation of probabilistic predictions or forecasts. It is applicable to tasks in which predictions assign probabilities to events, i.e. one issues a probability distribution as prediction. This includes probabilistic classification of a set of mutually exclusive outcomes or classes.

A tropical cyclone forecast model is a computer program that uses meteorological data to forecast aspects of the future state of tropical cyclones. There are three types of models: statistical, dynamical, and combined statistical-dynamical. Dynamical models utilize powerful supercomputers with sophisticated mathematical modeling software and meteorological data to calculate future weather conditions. Statistical models forecast the evolution of a tropical cyclone in a simpler manner, by extrapolating from historical datasets, and thus can be run quickly on platforms such as personal computers. Statistical-dynamical models use aspects of both types of forecasting. Four primary types of forecasts exist for tropical cyclones: track, intensity, storm surge, and rainfall. Dynamical models were not developed until the 1970s and 1980s, with earlier efforts focused on the storm surge problem.

TAMDAR is a weather monitoring system that consists of an in situ atmospheric sensor mounted on commercial aircraft for data gathering. It collects information similar to that collected by radiosondes carried aloft by weather balloons. It was developed by AirDat LLC, which was acquired by Panasonic Avionics Corporation in April 2013 and was operated until October 2018 under the name Panasonic Weather Solutions. It is now owned by FLYHT Aerospace Solutions Ltd.

In weather forecasting, model output statistics (MOS) is a multiple linear regression technique in which predictands, often near-surface quantities, are related statistically to one or more predictors. The predictors are typically forecasts from a numerical weather prediction (NWP) model, climatic data, and, if applicable, recent surface observations. Thus, output from NWP models can be transformed by the MOS technique into sensible weather parameters that are familiar to a layperson.
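
A hypothetical, minimal sketch of the regression step behind MOS (Python with NumPy; the predictors are synthetic stand-ins for NWP output, not an operational MOS system):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))  # columns: stand-in NWP model predictors
    # Synthetic predictand, e.g. an observed near-surface temperature anomaly
    y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + 0.2 + rng.normal(scale=0.3, size=200)

    A = np.column_stack([np.ones(len(X)), X])     # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # multiple linear regression fit
    mos_forecast = A @ coef                       # statistically corrected output
    print(coef)                                   # intercept and regression weights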

Tropical cyclone forecasting is the science of forecasting where a tropical cyclone's center, and its effects, are expected to be at some point in the future. There are several elements to tropical cyclone forecasting: track forecasting, intensity forecasting, rainfall forecasting, storm surge, tornado, and seasonal forecasting. While track forecasting skill has been increasing, intensity forecasting skill has remained essentially unchanged over the past several years. Seasonal forecasting began in the 1980s in the Atlantic basin and has spread into other basins in the years since.

The quantitative precipitation forecast (QPF) is the expected amount of melted precipitation accumulated over a specified time period over a specified area. A QPF is created when precipitation amounts reaching a minimum threshold are expected during the forecast's valid period. Valid periods of precipitation forecasts are normally synoptic hours such as 00:00, 06:00, 12:00 and 18:00 GMT. Terrain is considered in QPFs through topography or through climatological precipitation patterns derived from detailed observations. Starting in the mid-to-late 1990s, QPFs were used within hydrologic forecast models to simulate impacts on rivers throughout the United States. Forecast models show significant sensitivity to humidity levels within the planetary boundary layer, or the lowest levels of the atmosphere, which decreases with height. QPF can be generated on a quantitative basis (forecasting amounts) or a qualitative basis (forecasting the probability of a specific amount). Radar imagery forecasting techniques show higher skill than model forecasts within 6 to 7 hours of the time of the radar image. The forecasts can be verified through use of rain gauge measurements, weather radar estimates, or a combination of both. Various skill scores can be determined to measure the value of the rainfall forecast.

In pattern recognition, information retrieval, object detection and classification, precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space.

The history of numerical weather prediction considers how the use of current weather conditions as input to mathematical models of the atmosphere and oceans, in order to predict the weather and future sea state, has changed over the years. Though first attempted manually in the 1920s, it was not until the advent of the computer and computer simulation that computation time was reduced to less than the forecast period itself. ENIAC was used to create the first forecasts via computer in 1950, and over the years more powerful computers have been used to increase the size of initial datasets as well as to include more complicated versions of the equations of motion. The development of global forecasting models led to the first climate models. The development of limited-area (regional) models facilitated advances in forecasting the tracks of tropical cyclones as well as air quality in the 1970s and 1980s.

Forecast verification is a subfield of the climate, atmospheric and ocean sciences dealing with validating, verifying and determining the predictive power of prognostic model forecasts. Because of the complexity of these models, forecast verification goes a good deal beyond simple measures of statistical association or mean error calculations.

The North American Ensemble Forecast System (NAEFS) is a joint project involving the Meteorological Service of Canada (MSC) in Canada, the National Weather Service (NWS) in the United States, and the National Meteorological Service of Mexico (NMSM) in Mexico providing numerical weather prediction ensemble guidance for the 1- to 16-day forecast period. The NAEFS combines the Canadian MSC and the US NWS global ensemble prediction systems, improving probabilistic operational guidance over what can be built from any individual country's ensemble. Model guidance from the NAEFS is incorporated into the forecasts of the respective national agencies.

The cost-loss model, also called the cost/loss model or the cost-loss decision model, is a model used to understand how the predicted probability of adverse events affects the decision of whether to take a costly precautionary measure to protect oneself against losses from that event. The threshold probability above which it makes sense to take the precautionary measure equals the ratio of the cost of the preventative measure to the loss averted, and this threshold is termed the cost/loss ratio or cost-loss ratio. The model is typically used in the context of using prediction about weather conditions to decide whether to take a precautionary measure or not.
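
A minimal sketch of the resulting decision rule (Python; the cost and loss figures are invented for illustration):

    def should_protect(probability, cost, loss):
        """Precaution pays off when the forecast probability of the
        adverse event exceeds the cost/loss ratio."""
        return probability > cost / loss

    # Protection costs 10, the unprotected loss would be 60, so the
    # threshold probability is 10/60, about 0.17.
    print(should_protect(probability=0.25, cost=10, loss=60))  # True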

The evaluation of binary classifiers compares two methods of assigning a binary attribute, one of which is usually a standard method while the other is being investigated. There are many metrics that can be used to measure the performance of a classifier or predictor; different fields have different preferences for specific metrics due to different goals. For example, in medicine sensitivity and specificity are often used, while in computer science precision and recall are preferred. An important distinction is between metrics that are independent of prevalence and metrics that depend on prevalence; both types are useful, but they have very different properties.

Non-homogeneous Gaussian regression (NGR) is a type of statistical regression analysis used in the atmospheric sciences as a way to convert ensemble forecasts into probabilistic forecasts. Relative to simple linear regression, NGR uses the ensemble spread as an additional predictor, which is used to improve the prediction of uncertainty and allows the predicted uncertainty to vary from case to case. The prediction of uncertainty in NGR is derived from both past forecast-error statistics and the ensemble spread. NGR was originally developed for site-specific medium-range temperature forecasting, but has since been applied to site-specific medium-range wind forecasting and to seasonal forecasts, and has been adapted for precipitation forecasting. The introduction of NGR was the first demonstration that probabilistic forecasts that take account of the varying ensemble spread could achieve better skill scores than forecasts based on standard model output statistics (MOS) approaches applied to the ensemble mean.

Glenn Brier was an American statistician, weather forecaster and academic.

References

  1. "American Meteorological Society". Glossary of Meteorology.
  2. Gneiting, Tilmann; Raftery, Adrian E (2007-03-01). "Strictly Proper Scoring Rules, Prediction, and Estimation". Journal of the American Statistical Association. 102 (477): 359–378. doi:10.1198/016214506000001437. ISSN   0162-1459. S2CID   1878582.
  3. Riccardo Benedetti (2010-01-01). "Scoring Rules for Forecast Verification". Monthly Weather Review. 138 (1): 203–211. Bibcode:2010MWRv..138..203B. doi: 10.1175/2009MWR2945.1 .
  4. Roebber, Paul J. (1998), "The Regime Dependence of Degree Day Forecast Technique, Skill, and Value", Weather and Forecasting, 13 (3): 783–794, Bibcode:1998WtFor..13..783R, doi: 10.1175/1520-0434(1998)013<0783:TRDODD>2.0.CO;2
  5. Murphy, Allen H. (1988), "Skill Scores Based on the Mean Square Error and Their Relationships to the Correlation Coefficient", Monthly Weather Review, 116 (12): 2417–2424, Bibcode:1988MWRv..116.2417M, doi: 10.1175/1520-0493(1988)116<2417:SSBOTM>2.0.CO;2
  6. WWRP/WGNE Joint Working Group on Forecast Verification Research.
  7. Wilks, Daniel (2011-06-03). Statistical Methods in the Atmospheric Sciences (3rd ed.). store.elsevier.com. ISBN   9780123850225 . Retrieved 2016-02-01.