Reduction of uncertainties through Data Model Integration (DMI)
Techniques for data model integration (DMI) are increasingly applied in many fields of science, finance, economics, etc. Everyday examples are the improvement of geophysical model descriptions (flows, water levels, waves), the improvement and optimisation of daily weather forecasts, the detection of errors in data series, the on-line identification of stolen credit card use, and the detection of malfunctioning components in manufacturing processes. The first common element is prior knowledge of the behaviour of a process, in the form of an explicit model description or a set of characteristic data. The second common element is a set of independent or new data. Neither the description of the behaviour nor the data are 100% certain; both have uncertainties associated with them. If one has information on the (statistical) nature of these uncertainties, mathematical techniques can be used to combine the two information sources and generate new or improved information. As the examples show, this may be an improved (less uncertain) model description, an improved forecast, or the detection of a significant deviation from established patterns (faulty component, credit card use, …). When the aim is an improved model description, we often speak of model calibration and of calibration or parameter estimation techniques; when the aim is an improved forecast or on-line detection, we speak of (sequential) data assimilation and data assimilation techniques.
Definition of data model integration (DMI)
A practical definition of data model integration (DMI) is the following: “Data model integration is an automated structured combination of model and data by means of mathematical techniques to create a theoretically optimal combination of both by reducing the associated uncertainties in the one, the other, or in the information provided by the combination.” Since DMI techniques are being developed and used in many disciplines, other, similar definitions may be encountered.
Measures of agreement – Least squares norms
In geophysical science, DMI is commonly used for model improvement (“calibration”) and for the optimisation of operational forecasts. Application of DMI essentially starts with choosing the quantities of interest that need to be combined, and a quantitative measure that expresses the agreement between them. Instead of agreement, the words “difference”, “disagreement”, “mismatch”, “misfit” or “error” are often used as well. Least squares criteria or norms are often used as such measures, since they are symmetric and have favourable properties from a theoretical point of view. The key advantage of quantitative measures is that they are compact, objective, reproducible and transferable, and that they are easy to use in automated evaluation procedures and software.
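As a minimal sketch of such a measure, the Python fragment below computes an unweighted least squares misfit and the derived root-mean-square error. The function names and the sample values are assumptions introduced here for illustration only.

```python
def least_squares_misfit(model_values, observations):
    """Sum of squared differences between model results and observations."""
    if len(model_values) != len(observations):
        raise ValueError("model and observation series must have equal length")
    return sum((m - o) ** 2 for m, o in zip(model_values, observations))


def rms_error(model_values, observations):
    """Root-mean-square error derived from the least squares misfit."""
    return (least_squares_misfit(model_values, observations) / len(observations)) ** 0.5


# Hypothetical model results and measurements at three locations.
modelled = [1.0, 2.0, 3.0]
observed = [1.5, 2.0, 2.0]

print(least_squares_misfit(modelled, observed))  # 0.25 + 0.0 + 1.0 = 1.25
print(rms_error(modelled, observed))
```

The measure is symmetric: swapping the model and observation series gives the same value, so it expresses mismatch without favouring either source.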
Role of uncertainties in models and data
Models are never “true”. Even the very best models provide schematised representations of the real world. Examples are flow models, models for transport and spreading, models for wave propagation, rainfall runoff models and morphological models. They are limited to the representation of those real world phenomena that are of specific practical interest, characterised by associated temporal and spatial scales of interest. In the derivation of these models all kinds of simplifications and approximations have been applied. These are often formulated as “errors” or “uncertainties” in the model. These uncertainties occur in (1) the model concept as such, (2) the various model parameters, (3) the driving forces, and (4) the modelling result. Moreover, a model uncertainty of a general nature is associated with (5) the representativity of model results for observed entities. Equally, field measurements or observations suffer from errors or uncertainties. These may be the result of (1) equipment accuracy, (2) instrument drift, (3) equipment fouling or malfunctioning, (4) temporal and spatial sampling frequency, (5) data processing and interpretation, and (6) spatial and temporal representativeness, among others. As a result, mismatches between model results and observations are virtually unavoidable. Moreover, both sources of information involve errors in their estimate of the true state of the system. The errors in the model on the one hand, and in the measurements on the other, can be of very different type, origin, and magnitude.
Combination using DMI techniques reduces the uncertainty
Depending on the DMI algorithm that is used and the correctness of its assumptions, a combination of data and model with known uncertainty (in a statistical sense) can lead to a (statistically) optimal estimate of the system’s state. Such optimal estimates are achieved when the weights in the combination of model outcomes and measurements are based on the uncertainties in both. This is illustrated by a simple example, in which $M$ is a model result for some system state variable at some spatial position and time, and the spread $\sigma_M$ is its uncertainty. Similarly, $O$ and $\sigma_O$ are the corresponding measured value and the uncertainty in the measurement. A (statistically) optimal combination of these two estimates, assuming that their errors are unbiased and mutually independent, leads to the estimate

$$X = \frac{\sigma_O^2\, M + \sigma_M^2\, O}{\sigma_M^2 + \sigma_O^2}, \qquad \sigma_X^2 = \frac{\sigma_M^2\, \sigma_O^2}{\sigma_M^2 + \sigma_O^2},$$

where $X$ is the combined estimate and $\sigma_X$ is its uncertainty.
Clearly the uncertainty in the combined estimate is less than the uncertainty in either of the individual estimates. This example reflects the essence of DMI. In applications of structured DMI techniques to real-life numerical models (dealing with many grid points and state variables, complex and non-linear dynamics, high model computation times, etc.) the above principle is ‘merely’ generalised in an appropriate way.
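The two-estimate example can be checked numerically. The sketch below assumes the standard inverse-variance weighting for two unbiased estimates with independent errors. The function name and the numbers are hypothetical.

```python
def combine_estimates(model_value, model_sigma, obs_value, obs_sigma):
    """Inverse-variance weighted combination of a model result and a measurement.

    Assumes both estimates are unbiased and their errors are independent.
    Returns the combined estimate and its standard deviation.
    """
    var_m = model_sigma ** 2
    var_o = obs_sigma ** 2
    # Weight given to the measurement grows with the model uncertainty.
    weight_obs = var_m / (var_m + var_o)
    combined = model_value + weight_obs * (obs_value - model_value)
    combined_sigma = (var_m * var_o / (var_m + var_o)) ** 0.5
    return combined, combined_sigma


# Hypothetical water level: model says 2.0 m (+/- 0.4), gauge says 2.5 m (+/- 0.3).
value, sigma = combine_estimates(2.0, 0.4, 2.5, 0.3)
print(value, sigma)  # approximately 2.32 and 0.24
```

The combined value lies closer to the more accurate source, and its spread is smaller than both input spreads.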
Formulation of the uncertainty
An important first step is prescribing the known (or assumed) uncertainties in the models and data. When dealing with dynamic and spatially distributed models, the temporal and spatial (statistical) properties must be considered carefully. In fact, the time and length scales of the uncertainties should be consistent with the process(es) being modelled. Therefore, as for the actual (deterministic) numerical model, process and system knowledge should be used as much as possible in the formulation of the so-called “uncertainty model”. The better the uncertainty characteristics of the model, its various parameters and the data series are known, the more accurate and effective the DMI technique can be in optimising the estimate of the system state and in reducing the uncertainty in that estimate.
By adding terms for the model uncertainties to the deterministic model equations, the model is converted into a so-called stochastic model. Similarly, terms for the observation uncertainties are added to the equation that can formally be written for the measurements. The data assimilation procedures use these new stochastic equations to derive the desired optimal result by a suitable combination.
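In a generic state-space notation, such a stochastic model could be written as below. All symbols are assumptions introduced here for illustration: $x_k$ is the model state at time step $k$, $f$ the deterministic model, $u_k$ the forcing, $y_k$ the measurements, $H_k$ the operator that maps the state to the measured quantities, and $w_k$ and $v_k$ the model and observation uncertainties with covariances $Q_k$ and $R_k$. The Gaussian form is a common choice, not the only one.

```latex
\begin{aligned}
x_{k+1} &= f(x_k, u_k) + w_k, & w_k &\sim N(0, Q_k), \\
y_k     &= H_k\, x_k + v_k,   & v_k &\sim N(0, R_k).
\end{aligned}
```

Setting $w_k = 0$ and $v_k = 0$ recovers the original deterministic model and error-free measurements.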
Calibration of models
The main goal of a model calibration is the identification of uncertain model parameters. The model is assumed to be perfect, except for a number of poorly known parameters or “control variables”. These control variables may originate from parameterisations of uncertain coefficients in the model, the initial or boundary conditions, and/or the external forcing. Measurements are used to obtain estimates for these parameters. The estimates are those parameter values for which, in some sense, the model outcomes agree best with the measurements. Calibration is therefore often also called “model fitting” or “parameter estimation”. Uncertainties in the measurements can be taken into account, both in the definition of the calibration criterion (see below) and in the assessment of the uncertainties in the identified model parameters.
Calibration is usually translated into an optimisation problem in which some Goodness of Fit (GOF) is maximised or some Cost Function (CF) is minimised. A GOF or CF provides a formalised and quantitative description of the agreement between measurements and the corresponding model outcomes. In this way the (main) features or targets that the model must reproduce can be specified. In the formulation of the GOF or CF the uncertainties in the data can be taken into account explicitly. For example, data points with the highest accuracy will have the largest “weight” in (or contribution to) the CF, and thus the largest impact on the final estimate of the model parameters.
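A cost function that weights each data point by its accuracy could look as follows. This is a sketch under the assumption that each measurement comes with a known standard deviation. The names and values are hypothetical.

```python
def weighted_cost(model_values, observations, obs_sigmas):
    """Weighted least squares cost: residuals are scaled by the measurement spread."""
    return sum(
        ((m - o) / s) ** 2
        for m, o, s in zip(model_values, observations, obs_sigmas)
    )


# Two data points with the same residual (1.0) but different accuracy.
modelled = [1.0, 1.0]
observed = [2.0, 2.0]
sigmas = [0.5, 2.0]  # first measurement is accurate, second is not

contributions = [((m - o) / s) ** 2 for m, o, s in zip(modelled, observed, sigmas)]
print(contributions)                              # [4.0, 0.25]
print(weighted_cost(modelled, observed, sigmas))  # 4.25
```

The accurate data point contributes sixteen times more to the cost than the inaccurate one, so a minimisation will fit it more closely.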
Although calibration is often formulated and carried out in a deterministic sense, a close relation to statistics can be recognised. In fact, a statistical interpretation can be assigned to the comparison of the model outcomes on the one hand, and the measurements and (the statistical description of) their uncertainties on the other. On this basis a GOF or CF can be derived, rather than “independently” prescribed. For example, when the estimation is based on Maximum Likelihood (MLH) and the measurement errors are assumed to be Gaussian, cost functions of the least squares type are obtained. The MLH formalism then automatically also provides a recipe for computing the uncertainties in the parameter estimates.
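The link between Maximum Likelihood and least squares can be made explicit under the assumption of independent Gaussian measurement errors. The symbols are notation introduced here for illustration: $y_i$ are the measurements, $m_i(\theta)$ the corresponding model outcomes for parameters $\theta$, and $\sigma_i$ the standard deviations of the measurement errors.

```latex
L(\theta) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_i}
            \exp\!\left( -\frac{\bigl(y_i - m_i(\theta)\bigr)^2}{2\sigma_i^2} \right)
\quad\Longrightarrow\quad
-\ln L(\theta) = \frac{1}{2} \sum_{i=1}^{N}
            \frac{\bigl(y_i - m_i(\theta)\bigr)^2}{\sigma_i^2} + \text{const}.
```

Maximising the likelihood $L(\theta)$ is therefore equivalent to minimising a weighted least squares cost function in which each residual is scaled by the corresponding measurement uncertainty.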
When a model calibration is formulated as an optimisation of a GOF or CF, the parameter estimation is effectively a minimisation problem. In special cases, for example linear models, this minimisation can be done analytically. In other cases one must rely on numerical techniques. Measured by the number of model evaluations required to find the minimum, gradient-based techniques such as conjugate gradient or quasi-Newton methods are by far the most efficient. A main problem, however, is often the evaluation of the derivatives of the CF. For many data driven models (analytical regression models, empirical formulae, Neural Networks, etc.) the derivatives can usually be computed analytically in a straightforward way. For large scale dynamical numerical (flow, wave, transport, morphological, meteorological) models, often with a large number of uncertain parameters, this is certainly not the case, and the so-called adjoint model can be used for the computation of gradients. The adjoint model is derived from the original model by means of a variational analysis. For descriptions and applications of adjoint modelling see e.g. Chavent (1980), Panchang and O’Brien (1990), Van den Boogaard et al. (1993), Lardner et al. (1993), Mouthaan et al. (1994). A main practical disadvantage of the adjoint, however, is the time and cost of its implementation. For computationally less demanding models, gradient free (local or global search) minimisation techniques may serve as a reasonable alternative to gradient based methods, as long as the number of uncertain parameters is sufficiently low (less than 10, say). An example is the DUD (Doesn’t Use Derivatives) technique (Ralston and Jennrich, 1978).
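To illustrate a gradient free calibration, the sketch below estimates a single parameter with a golden-section search, a simple derivative-free local search. It is not the DUD algorithm. The exponential decay model, the synthetic error-free "measurements" and all names are assumptions made for this example.

```python
import math


def model(decay_rate, times):
    """Hypothetical one-parameter model: exponential decay."""
    return [math.exp(-decay_rate * t) for t in times]


def cost(decay_rate, times, observations):
    """Least squares cost between model outcomes and observations."""
    return sum((m - o) ** 2 for m, o in zip(model(decay_rate, times), observations))


def golden_section_minimise(func, lower, upper, tolerance=1e-6):
    """Derivative-free minimisation of a unimodal function on [lower, upper]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while (b - a) > tolerance:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper interior point.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower interior point.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)


times = [0.0, 1.0, 2.0, 3.0, 4.0]
observations = model(0.7, times)  # synthetic data generated with decay rate 0.7
estimate = golden_section_minimise(lambda k: cost(k, times, observations), 0.0, 3.0)
print(estimate)  # close to 0.7
```

Each iteration costs one model evaluation. With several parameters the number of evaluations grows quickly, which is why such methods are only practical for cheap models with few parameters.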
Sequential data assimilation in dynamic (time-stepping) models
Even a well calibrated model may not perform perfectly in forecast mode. Prediction errors can be due to several sources of uncertainty, such as unresolved inaccuracies in the model and/or its parameters, non-stationarities, and uncertainties in the elements forming the model’s external forcing. To improve a model’s skill for operational and/or real time predictions, on-line or sequential DMI or data assimilation techniques are often used.
The usual approach is to construct a statistical description of all model and measurement uncertainties. In this way the uncertainties are modelled in a statistical rather than a strictly physical way. The original deterministic model is thus embedded in a stochastic environment. The actual data assimilation then involves a consistent (spatial and temporal) integration of all sources of information, i.e. the model and all observations available so far. Within this combination of model and data the statistics of their uncertainties must be taken into account carefully. After this integration over the period for which measurements are available, an optimal initialisation of the model is obtained for a subsequent forecast simulation (in prediction mode). This integration of data and model can be repeated whenever new observations become available, so that the time window of assimilation and forecasts moves stepwise forward in time. In this way the model can adapt to changing system conditions.
The simplest technique for this is “data insertion”. The model is propagated forward in time until a time is reached at which measurements are available. The model values are then simply overwritten with the measured values. This overwriting leaves the model unbalanced and typically injects bursts of gravity-wave noise (spurious modes propagating at the gravity-wave speed) into the model solution. It is therefore generally not a satisfactory method. A further method is Optimal Interpolation. This method modifies model results whenever observations are encountered by adding some fraction of the difference between measured and modelled quantities to the modelled fields, the fraction being determined by the presumably known error covariance structure of the model solution. To the extent that the error covariances are correctly modelled, this method is statistically optimal. It still results in some gravity wave insertion, and extensive efforts have been made to develop so-called “non-linear normal mode initialisation” procedures to remove the effect of this inserted noise from the subsequent analysis.
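The difference between the two methods can be sketched for a model state with two grid points and a single observation at one of them. The prescribed error covariance matrix, the function names and the numbers are assumptions made for this illustration.

```python
def data_insertion(forecast, obs_index, obs_value):
    """Overwrite the model value at the observed grid point."""
    analysis = list(forecast)
    analysis[obs_index] = obs_value
    return analysis


def optimal_interpolation(forecast, covariance, obs_index, obs_value, obs_variance):
    """Correct all grid points using a prescribed model error covariance.

    Assumes a single observation that directly measures one state element.
    """
    innovation = obs_value - forecast[obs_index]
    innovation_variance = covariance[obs_index][obs_index] + obs_variance
    # Gain per grid point: covariance with the observed point, normalised.
    gain = [row[obs_index] / innovation_variance for row in covariance]
    return [x + k * innovation for x, k in zip(forecast, gain)]


forecast = [1.0, 2.0]
covariance = [[0.04, 0.02],
              [0.02, 0.04]]  # assumed model error covariance (correlated points)

print(data_insertion(forecast, 0, 1.3))                          # [1.3, 2.0]
print(optimal_interpolation(forecast, covariance, 0, 1.3, 0.04))  # about [1.15, 2.075]
```

Data insertion changes only the observed point and fully trusts the measurement. Optimal Interpolation moves the observed point part of the way and also adjusts the correlated neighbour, which gives a smoother correction.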
Kalman filtering (Kalman, 1960; Kalman and Bucy, 1961) is nowadays a commonly applied procedure for this form of sequential data assimilation; see e.g. Jazwinski (1970), Gelb (1974) or Maybeck (1979) for the theoretical background. This approach resembles optimal interpolation, except that it explicitly treats uncertainties in the numerical model dynamics as well as in the observations, and computes the solution error covariances as the model propagates forward in time, rather than assuming that they are known a priori. Originally the Kalman Filter was designed for linear systems. For non-linear systems the algorithm must be adapted or approximated appropriately, e.g. by a repeated linearisation of the model at its current state, leading to the Extended Kalman Filter (EKF; see e.g. Maybeck, 1979). For applications in tidal flow models with emphasis on storm surge forecasting, see Heemink and Kloosterhuis (1990) or Heemink et al. (1997). More recently, algorithms have been developed that do not require a model dependent implementation in the form of a tangent linear model. Important examples include the Ensemble Kalman Filter (EnKF) introduced by Evensen (1994; 1997; 2003), and so-called reduced-rank approaches (Heemink et al., 1997; Verlaan and Heemink, 1997). Heemink et al. (2001) propose to combine such algorithms. Numerically these generic filter algorithms tend to be more robust to non-linearities in the model than conventional model dependent approaches such as the EKF. EnKF and reduced-rank algorithms may therefore be particularly suited for data assimilation in highly non-linear models. Recent examples of applications in surface hydrology and coastal hydrodynamics are El Serafy et al. (2005), El Serafy and Mynett (2004) and Weerts and El Serafy (2006). Other simplified or related methods are the so-called particle filters, e.g. the Residual Resampling Filter (Isard and Blake, 1998; El Serafy and Weerts, 2006).
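The forecast and update steps of the linear Kalman Filter can be sketched for a scalar state that is observed directly. The linear model coefficient, the noise variances and the names are assumptions chosen for illustration. Real applications use vector states and matrix covariances.

```python
def kalman_step(state, variance, observation,
                transition, model_noise_var, obs_noise_var):
    """One forecast/update cycle of a scalar linear Kalman Filter.

    Assumes the state is observed directly (observation operator equal to 1).
    """
    # Forecast: propagate the state and its error variance with the model.
    forecast_state = transition * state
    forecast_variance = transition ** 2 * variance + model_noise_var
    # Update: weight model and observation by their uncertainties.
    gain = forecast_variance / (forecast_variance + obs_noise_var)
    analysis_state = forecast_state + gain * (observation - forecast_state)
    analysis_variance = (1.0 - gain) * forecast_variance
    return analysis_state, analysis_variance


# Single cycle with equal model and observation uncertainty.
print(kalman_step(1.0, 1.0, 3.0, 1.0, 0.0, 1.0))  # (2.0, 0.5)

# Sequential assimilation of a short, hypothetical series of observations.
state, variance = 1.0, 1.0
for obs in [3.0, 2.0, 2.5]:
    state, variance = kalman_step(state, variance, obs, 1.0, 0.1, 1.0)
    print(state, variance)
```

Unlike Optimal Interpolation, the error variance is not prescribed but carried along by the filter, so the weight given to new observations changes from step to step.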
Combination with data driven modelling techniques
While filtering techniques have so far mainly been applied in practice to conceptual dynamic models, they also provide important new opportunities for combination with data driven models. Given the computational efficiency of data driven models, their combination with on-line sequential data assimilation has promising potential for operational and real time modelling and forecasting. In hydrology, relevant applications include real time flood forecasting, the prediction of water loads in drainage systems, and the forecasting and control of hydraulic structures such as sluices, weirs or barriers. For a further discussion on data assimilation in dynamic neural networks, see Van den Boogaard et al. (2000) and Van den Boogaard and Mynett (2004).
References
1. Chavent, G. 1980. Identification of distributed parameter systems: about the output least square method, its implementation, and identifiability. In Isermann, R. (ed.), Proc. 5th IFAC Symposium on Identification and System Parameter Estimation, I: 85-97. New York: Pergamon Press.
2. El Serafy, G.Y., Gerritsen, H., Mynett, A.E. and Tanaka, M. 2005. Improvement of stratified flow forecasts in the Osaka Bay using the steady state Kalman filter. Proc. 31st IAHR Congress, Seoul, Eds. Byong-Ho Jun, Sang-Il Lee, Il Won Seo, Gye-Woon Choi: 795-804.
3. El Serafy, G.Y. and Mynett, A.E. 2004. Comparison of EKF and EnKF in SOBEK RIVER: Case study Maxau – IJssel. Proc. 6th Int. Conf. on HydroInformatics, Eds. Liong, Phoon & Babovic. World Scientific Publishing Company, ISBN 981-238-787-0: 513-520.
4. Evensen, G. 1994. Sequential data assimilation with a non-linear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research, 99(C5): 10143-10162.
5. Evensen, G. 1997. Advanced data assimilation for strongly non-linear dynamics. Monthly Weather Review, 125(6): 1342-1354.
6. Evensen, G. 2003. The Ensemble Kalman Filter: theoretical formulation and practical implementation. Ocean Dynamics, 53: 343-367.
7. Gelb, A. 1974. Applied Optimal Estimation. Cambridge, Massachusetts: The MIT Press.
8. Heemink, A.W. and Kloosterhuis, H. 1990. Data assimilation for non-linear tidal models. International Journal for Numerical Methods in Fluids, 11(12): 1097-1112.
9. Heemink, A.W., Bolding, K. and Verlaan, M. 1997. Storm surge forecasting using Kalman Filtering. Journal of the Meteorological Society of Japan, 75(1B): 305-318.
10. Heemink, A.W., Verlaan, M. and Segers, A.J. 2001. Variance Reduced Ensemble Kalman Filtering. Monthly Weather Review, 129: 1718-1728.
11. Isard, M. and Blake, A. 1998. CONDENSATION - conditional density propagation for visual tracking. International Journal of Computer Vision, 29(1): 5-28.
12. Jazwinski, A.H. 1970. Stochastic Processes and Filtering Theory. Academic Press.
13. Kalman, R.E. 1960. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82D: 35-45.
14. Kalman, R.E. and Bucy, R.S. 1961. New results in linear filtering and prediction theory. Journal of Basic Engineering, 83D: 95-108.
15. Lardner, R.W., Al-Rabeh, A.H. and Gunay, N. 1993. Optimal estimation of parameters for a two-dimensional model of the Arabian Gulf. Journal of Geophysical Research, 98(C10): 18229-18242.
16. Maybeck, P.S. 1979. Stochastic Models, Estimation, and Control. Volume 141-1 of Mathematics in Science and Engineering. London: Academic Press, Inc. Ltd.
17. Mouthaan, E.E.A., Heemink, A.W. and Robaczewska, K.B. 1994. Assimilation of ERS-1 altimeter data in a tidal model of the Continental Shelf. Deutsche Hydrographische Zeitschrift: 285-329.
18. Panchang, V.G. and O’Brien, J.J. 1990. On the determination of hydraulic model parameters using the adjoint state formulation. In Davies, A.M. (ed.), Modelling Marine Systems, Volume I, Chapter 2: 5-18. Boca Raton, Florida: CRC Press, Inc.
19. Ralston, M.L. and Jennrich, R.I. 1978. Dud, a derivative-free algorithm for nonlinear least squares. Technometrics, 20: 7-14.
20. Van den Boogaard, H.F.P., Hoogkamer, M.J.J. and Heemink, A.W. 1993. Parameter identification in particle models. Stochastic Hydrology and Hydraulics, 7(2): 109-130.
21. Van den Boogaard, H.F.P., Ten Brummelhuis, P.G.J. and Mynett, A.E. 2000. On-Line Data Assimilation in Auto-Regressive Neural Networks. Hydroinformatics 2000 Conference, The University of Engineering, Iowa City, USA, July 23-27, 2000.
22. Van den Boogaard, H.F.P. and Mynett, A.E. 2004. Dynamic neural networks with data assimilation. Hydrological Processes, 18: 1959-1966.
23. Verlaan, M. and Heemink, A.W. 1997. Tidal flow forecasting using reduced-rank square root filters. Stochastic Hydrology and Hydraulics, 11(5): 349-368.
24. Weerts, A. and El Serafy, G.Y. 2006. Particle Filtering and Ensemble Kalman Filtering for runoff nowcasting using conceptual rainfall runoff models. Water Resources Research, 42(9), W09403, doi:10.1029/2005WR004093.