
Data assimilation

Data assimilation is a mathematical discipline that seeks to optimally combine theory (usually in the form of a numerical model) with observations. There may be a number of different goals, for example: to determine the optimal state estimate of a system, to determine initial conditions for a numerical forecast model, to interpolate sparse observation data using (e.g. physical) knowledge of the system being observed, or to train numerical model parameters based on observed data. Depending on the goal, different solution methods may be used. Data assimilation is distinguished from other forms of machine learning, image analysis, and statistical methods in that it utilizes a dynamical model of the system being analyzed.

Data assimilation initially developed in the field of numerical weather prediction. Numerical weather prediction models are equations describing the dynamical behavior of the atmosphere, typically coded into a computer program. To use these models to make forecasts, initial conditions are needed that closely resemble the current state of the atmosphere. Simply inserting point-wise measurements into the numerical models did not provide a satisfactory solution. Real-world measurements contain errors, both due to the quality of the instrument and due to uncertainty in the position of the measurement. These errors can cause instabilities in the models that eliminate any level of skill in a forecast. Thus, more sophisticated methods were needed to initialize a model using all available data while maintaining stability in the numerical model. Such data typically include the measurements as well as a previous forecast valid at the same time the measurements are made. If applied iteratively, this process begins to accumulate information from past observations into all subsequent forecasts.

Because data assimilation developed out of the field of numerical weather prediction, it initially gained popularity amongst the geosciences. In fact, one of the most cited publications in all of the geosciences is an application of data assimilation to reconstruct the observed history of the atmosphere. Classically, data assimilation has been applied to chaotic dynamical systems that are too difficult to predict using simple extrapolation methods. The cause of this difficulty is that small changes in initial conditions can lead to large changes in prediction accuracy. This is sometimes known as the butterfly effect: the sensitive dependence on initial conditions in which a small change in one state of a deterministic nonlinear system can result in large differences in a later state.
At any update time, data assimilation usually takes a forecast (also known as the first guess, or background) and applies a correction to it based on a set of observed data and estimated errors that are present in both the observations and the forecast itself. The difference between the forecast and the observations at that time is called the departure or the innovation (as it provides new information to the data assimilation process). A weighting factor is applied to the innovation to determine how much of a correction should be made to the forecast based on the new information from the observations. The best estimate of the state of the system, obtained by correcting the forecast with the weighting factor times the innovation, is called the analysis. In one dimension, computing the analysis could be as simple as forming a weighted average of a forecasted and an observed value. In multiple dimensions the problem becomes more difficult. Much of the work in data assimilation is focused on adequately estimating the appropriate weighting factor based on intricate knowledge of the errors in the system.

The measurements are usually made of a real-world system, rather than of the model's incomplete representation of that system, and so a special function called the observation operator (usually denoted h() for a nonlinear operator or H for its linearization) is needed to map the modeled variables to a form that can be directly compared with the observations.

One common mathematical and philosophical perspective is to view data assimilation as a Bayesian estimation problem. From this perspective, the analysis step is an application of Bayes' theorem and the overall assimilation procedure is an example of recursive Bayesian estimation. However, the probabilistic analysis is usually simplified to a computationally feasible form. Advancing the probability distribution in time would be done exactly in the general case by the Fokker-Planck equation, but that is not feasible for high-dimensional systems, so various approximations operating on simplified representations of the probability distributions are used instead. Often the probability distributions are assumed Gaussian so that they can be represented by their mean and covariance, which gives rise to the Kalman filter.

Many methods represent the probability distributions only by the mean and use a pre-calculated covariance. An example of a direct (or sequential) method of this kind is optimal statistical interpolation, or simply optimal interpolation (OI). An alternative approach is to iteratively minimize a cost function that solves an identical problem. These are called variational methods, such as 3D-Var and 4D-Var. Typical minimization algorithms are the conjugate gradient method and the generalized minimal residual method. The ensemble Kalman filter is a sequential method that uses a Monte Carlo approach to estimate both the mean and the covariance of a Gaussian probability distribution by an ensemble of simulations. More recently, hybrid combinations of ensemble approaches and variational methods have become more popular (e.g. they are used for operational forecasts both at the European Centre for Medium-Range Weather Forecasts (ECMWF) and at the NOAA National Centers for Environmental Prediction (NCEP)).
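As a sketch of the standard linear (Kalman/optimal interpolation) form of this analysis step, using conventional notation that is not introduced in the text above (x_b for the background, y for the observations, B and R for the background and observation error covariances):

\[
\mathbf{x}_a = \mathbf{x}_b + \mathbf{K}\left(\mathbf{y} - H\,\mathbf{x}_b\right),
\qquad
\mathbf{K} = \mathbf{B}H^{\mathsf{T}}\left(H\mathbf{B}H^{\mathsf{T}} + \mathbf{R}\right)^{-1}.
\]

In one dimension, with background error variance \(\sigma_b^2\) and observation error variance \(\sigma_o^2\), this reduces to the weighted average mentioned above,

\[
x_a = \frac{\sigma_o^2\, x_b + \sigma_b^2\, y}{\sigma_b^2 + \sigma_o^2},
\]

while the variational (3D-Var) formulation instead minimizes a cost function of the form

\[
J(\mathbf{x}) = \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathsf{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
+ \tfrac{1}{2}\,\bigl(\mathbf{y}-h(\mathbf{x})\bigr)^{\mathsf{T}}\mathbf{R}^{-1}\bigl(\mathbf{y}-h(\mathbf{x})\bigr).
\]

These are the textbook forms of the analysis equation and cost function, not a description of any particular operational scheme.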
In numerical weather prediction applications, data assimilation is most widely known as a method for combining observations of meteorological variables such as temperature and atmospheric pressure with prior forecasts in order to initialize numerical forecast models.
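To make the forecast and analysis cycle concrete, the following is a minimal, self-contained Python sketch for a single scalar variable (for instance a temperature). The toy relaxation model, the error variances, and the observation values are illustrative assumptions, not part of any operational system; the analysis step is simply the scalar weighted average described above.

# Toy sequential assimilation cycle for one scalar variable.
# All numbers below are arbitrary, chosen only to show the structure of the loop.

def forecast(x, relaxation=0.9, equilibrium=15.0):
    # Toy "dynamical model": relax the state toward an equilibrium value.
    return equilibrium + relaxation * (x - equilibrium)

def analysis(x_b, y, sigma_b2, sigma_o2):
    # Scalar analysis: weighted average of background x_b and observation y.
    # The weight (gain) depends on the background and observation error variances.
    k = sigma_b2 / (sigma_b2 + sigma_o2)      # gain in [0, 1]
    x_a = x_b + k * (y - x_b)                 # correction = gain * innovation
    sigma_a2 = (1.0 - k) * sigma_b2           # reduced analysis error variance
    return x_a, sigma_a2

x, sigma2 = 20.0, 4.0                 # initial state estimate and its error variance
model_error2, obs_error2 = 0.5, 1.0   # assumed model and observation error variances
observations = [18.2, 17.5, 16.9, 16.4]

for y in observations:
    x = forecast(x)                              # background (first guess)
    sigma2 = sigma2 * 0.9**2 + model_error2      # propagate variance through the linear model
    x, sigma2 = analysis(x, y, sigma2, obs_error2)
    print(f"analysis: {x:.2f}  (error variance {sigma2:.2f})")

Each pass through the loop carries information from earlier observations forward through the forecast, which is the iterative accumulation of information described earlier.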

[ "Assimilation (phonology)", "Climatology", "Meteorology", "variational assimilation", "Canadian Meteorological Centre", "tangent linear model", "common land model", "soil moisture ocean salinity" ]