
The Gaussian–Mixture–Model Dynamically–Orthogonal (GMM–DO)
smoother is exemplified and contrasted with other smoothers by applications
to three dynamical systems, all of which admit far–from–Gaussian statistics.
A double–well–diffusion experiment is first used to examine the capabilities
of the smoother and compare its performance to that of the Ensemble Kalman
Smoother. A passive tracer advected by a reversible shear flow is then
employed. The exact smoothed solution is obtained and utilized to validate
the GMM–DO smoother and its results. Finally, the third example illustrates
the applicability of the smoother in more complex ocean flows consisting of
variable jets and eddies. To illustrate the non-Gaussian effects, comparisons
are then made with the update of the Error Subspace Statistical Estimation
smoother. In each application, the properties of the GMM–DO smoother and
of its posterior probabilities are studied and quantified. Rigorous evaluation
of Bayesian smoothers for nonlinear high-dimensional dynamical systems is
challenging in itself. The present three dynamical system examples provide
complementary and effective benchmarks for such evaluation.

Retrospective inference through Bayesian smoothing is indispensable in
geophysics, with crucial applications in ocean estimation, numerical weather
prediction, climate dynamics and Earth system modeling. However, dealing
with the high–dimensionality and nonlinearity of geophysical processes
remains a major challenge in the development of Bayesian smoothers. Addressing
this issue, we obtain a novel smoothing methodology for high–dimensional
stochastic fields governed by general nonlinear dynamics. Building
on recent Bayesian filters and classic Kalman smoothers, the equations
and forward–backward algorithm of the new smoother are derived. The
smoother uses the stochastic Dynamically–Orthogonal (DO) field equations
and their time–evolving stochastic subspace to predict the prior probabilities.
Bayesian inference, both forward and backward in time, is then analytically
carried out in the dominant DO subspace, after fitting semi–parametric Gaussian
Mixture Models (GMMs) to joint DO realizations. The theoretical properties
and computational cost of the new GMM-DO smoother are presented
and discussed.
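
As a point of reference for the forward–backward structure mentioned above, the following is a minimal sketch of the classic scalar Kalman filter followed by a Rauch–Tung–Striebel backward pass. It is not the GMM–DO smoother: it assumes a linear scalar model x_{k+1} = a x_k + w with Gaussian noise, and the function name, parameter names and numerical values are illustrative assumptions only.

```python
def kalman_rts(ys, a, q, r, m0, p0):
    """Scalar Kalman filter plus RTS smoother (classic linear-Gaussian case).

    Assumed model (illustrative): x[k+1] = a*x[k] + w, w ~ N(0, q)
                                  y[k]   = x[k] + v,   v ~ N(0, r)
    """
    xp, pp, xf, pf = [], [], [], []  # predicted / filtered means and variances
    m, p = m0, p0
    for k, y in enumerate(ys):
        if k > 0:  # forward prediction
            m, p = a * m, a * a * p + q
        xp.append(m)
        pp.append(p)
        gain = p / (p + r)  # Kalman gain
        m, p = m + gain * (y - m), (1.0 - gain) * p
        xf.append(m)
        pf.append(p)

    # Backward pass: each estimate is corrected using later observations.
    xs, ps = xf[:], pf[:]
    for k in range(len(ys) - 2, -1, -1):
        c = pf[k] * a / pp[k + 1]  # smoother gain
        xs[k] = xf[k] + c * (xs[k + 1] - xp[k + 1])
        ps[k] = pf[k] + c * c * (ps[k + 1] - pp[k + 1])
    return xf, pf, xs, ps


# Hypothetical observations and parameters, for illustration only.
observations = [0.9, 1.1, 1.4, 1.2, 1.6]
xf, pf, xs, ps = kalman_rts(observations, a=1.0, q=0.05, r=0.2, m0=0.0, p0=1.0)
```

In this linear-Gaussian setting the smoothed variances never exceed the filtered ones, which is the sense in which retrospective inference adds information. The GMM–DO smoother described above replaces the single Gaussian with a Gaussian mixture fitted in the DO subspace.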

Regional ocean models are capable of forecasting conditions for usefully long intervals of time
(days) provided that initial and ongoing conditions can be measured. In resource-limited circumstances, the
placement of sensors in optimal locations is essential. Here, a nonlinear optimization approach to determine
optimal adaptive sampling that uses the Genetic Algorithm (GA) method is presented. The method determines
sampling strategies that minimize a user-defined physics-based cost function. The method is evaluated using
identical twin experiments, comparing hindcasts from an ensemble of simulations that assimilate data selected
using the GA adaptive sampling and other methods. For skill metrics, we employ the reduction of the
ensemble root-mean-square-error (RMSE) between the “true” data-assimilative ocean simulation and the
different ensembles of data-assimilative hindcasts. A 5-glider optimal sampling study is set up for a 400 km x
400 km domain in the Middle Atlantic Bight region, along the New Jersey shelf-break. Results are compared
for several ocean and atmospheric forcing conditions.
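
To make the idea of a genetic search over sampling locations concrete, here is a minimal sketch under strong simplifying assumptions. The candidate sites, the toy `uncertainty` values and the cost function (negative sum of uncertainty at the chosen sites) are hypothetical stand-ins for the user-defined physics-based cost described above, and the selection, crossover and mutation operators are one simple choice among many.

```python
import random


def genetic_search(cost, n_sites, n_pick, pop_size=30, n_gen=40, p_mut=0.2, seed=0):
    """Minimize cost(picks) over sets of n_pick distinct site indices."""
    rng = random.Random(seed)
    pop = [tuple(sorted(rng.sample(range(n_sites), n_pick))) for _ in range(pop_size)]
    history = []  # best cost at the start of each generation
    for _ in range(n_gen):
        pop.sort(key=cost)
        history.append(cost(pop[0]))
        elite = pop[: pop_size // 2]  # selection with elitism: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            genes = sorted(set(a) | set(b))  # crossover: draw from both parents
            child = set(rng.sample(genes, n_pick))
            if rng.random() < p_mut:  # mutation: swap one site for a random one
                child.discard(rng.choice(sorted(child)))
                while len(child) < n_pick:
                    child.add(rng.randrange(n_sites))
            children.append(tuple(sorted(child)))
        pop = elite + children
    best = min(pop, key=cost)
    history.append(cost(best))
    return best, history


# Hypothetical forecast uncertainty at 12 candidate sites (illustration only).
uncertainty = [0.2, 0.9, 0.4, 0.1, 0.8, 0.3, 0.7, 0.5, 0.6, 0.05, 0.95, 0.15]


def toy_cost(picks):
    # Lower is better: reward sampling where the assumed uncertainty is high.
    return -sum(uncertainty[i] for i in picks)


best, history = genetic_search(toy_cost, n_sites=len(uncertainty), n_pick=5)
```

Because the better half of the population is carried over unchanged, the best cost cannot increase from one generation to the next. A realistic cost function would instead be evaluated from ocean model forecasts, which is where most of the computational expense would lie.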

The properties and capabilities of the GMM-DO filter are assessed and exemplified by applications
to two dynamical systems: (1) the Double Well Diffusion and (2) Sudden Expansion flows; both
of which admit far-from-Gaussian statistics. The former test case, or twin experiment, validates
the use of the EM algorithm and Bayesian Information Criterion with Gaussian Mixture Models
in a filtering context; the latter further exemplifies its ability to efficiently handle state vectors of
non-trivial dimensionality and dynamics with jets and eddies. For each test case, qualitative and
quantitative comparisons are made with contemporary filters. The sensitivity to input parameters
is illustrated and discussed. Properties of the filter are examined and its estimates are described,
including: the equation-based and adaptive prediction of the probability densities; the evolution
of the mean field, stochastic subspace modes and stochastic coefficients; the fitting of Gaussian
Mixture Models; and, the efficient and analytical Bayesian updates at assimilation times and the
corresponding data impacts. The advantages of respecting nonlinear dynamics and preserving
non-Gaussian statistics are brought to light. For realistic test cases admitting complex distributions
and with sparse or noisy measurements, the GMM-DO filter is shown to fundamentally improve the
filtering skill, outperforming simpler schemes invoking the Gaussian parametric distribution.

This work introduces and derives an efficient, data-driven assimilation scheme, focused on a
time-dependent stochastic subspace, that respects nonlinear dynamics and captures non-Gaussian
statistics as it occurs. The motivation is to obtain a filter that is applicable to realistic geophysical
applications but that also rigorously utilizes the governing dynamical equations with information
theory and learning theory for efficient Bayesian data assimilation. Building on the foundations of
classical filters, the underlying theory and algorithmic implementation of the new filter are developed
and derived. The stochastic Dynamically Orthogonal (DO) field equations and their adaptive
stochastic subspace are employed to predict prior probabilities for the full dynamical state, effectively
approximating the Fokker-Planck equation. At assimilation times, the DO realizations are fit to
semiparametric Gaussian mixture models (GMMs) using the Expectation-Maximization algorithm
and the Bayesian Information Criterion. Bayes’ Law is then efficiently carried out analytically within
the evolving stochastic subspace. The resulting GMM-DO filter is illustrated in a very simple example.
Variations of the GMM-DO filter are also provided along with comparisons with related schemes.
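
The analytical Bayesian update of a Gaussian mixture prior can be sketched in one dimension as follows. This is a simplified illustration, not the filter itself: it assumes a scalar state observed directly with Gaussian noise of variance `r`, whereas the filter described above performs the update for the stochastic coefficients in the DO subspace. All names and numbers are assumptions chosen to mimic a bimodal, double-well-like prior.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_update(weights, means, variances, y, r):
    """Posterior mixture for the prior sum_j w_j N(m_j, v_j) and y = x + N(0, r)."""
    new_w, new_m, new_v = [], [], []
    for w, m, v in zip(weights, means, variances):
        gain = v / (v + r)                 # per-component Kalman gain
        new_m.append(m + gain * (y - m))   # component mean update
        new_v.append((1.0 - gain) * v)     # component variance update
        new_w.append(w * normal_pdf(y, m, v + r))  # reweight by marginal likelihood
    total = sum(new_w)
    return [w / total for w in new_w], new_m, new_v


# Hypothetical bimodal prior with modes near -1 and +1 (illustration only).
prior_w, prior_m, prior_v = [0.5, 0.5], [-1.0, 1.0], [0.1, 0.1]
post_w, post_m, post_v = gmm_update(prior_w, prior_m, prior_v, y=0.8, r=0.1)
```

An observation near one mode shifts almost all posterior weight onto that component while each component is also pulled toward the datum. A single-Gaussian update applied to the same prior would start from a mean near zero with a large variance and could not represent this selection between modes.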

THIS REPORT summarizes goals,
activities, and recommendations of a
workshop on data assimilation held in
Williamsburg, Virginia on September
9-11, 2003, and sponsored by the U.S.
Office of Naval Research (ONR) and National
Science Foundation (NSF). The
overall goal of the workshop was to synthesize
research directions for ocean data
assimilation (DA) and outline efforts
required during the next 10 years and
beyond to evolve DA into an integral and
sustained component of global, regional,
and coastal ocean science and observing
and prediction systems. The workshop
built on the success of recent and existing
DA activities such as those sponsored
by the National Oceanographic Partnership
Program (NOPP) and NSF-Information
Technology Research (NSF-ITR).
DA is a quantitative approach to optimally
combine models and observations.
The combination is usually consistent
with model and data uncertainties, which
need to be represented. Ocean DA can
extract maximum knowledge from the
sparse and expensive measurements of
the highly variable ocean dynamics. The
ultimate goal is to better understand and
predict these dynamics on multiple spatial
and temporal scales, including interactions
with other components of the
climate system. There are many applications
that involve DA or build on its results,
including: coastal, regional, seasonal,
and inter-annual ocean and climate
dynamics; carbon and biogeochemical
cycles; ecosystem dynamics; ocean engineering;
observing-system design; coastal
management; fisheries; pollution control;
naval operations; and defense and security.
These applications have different requirements
that lead to variations in the
DA schemes utilized. For literature on
DA, we refer to Ghil and Malanotte-Rizzoli
(1991), the National Research Council
(1991), Bennett (1992), Malanotte-
Rizzoli (1996), Wunsch (1996), Robinson
et al. (1998), Robinson and Lermusiaux
(2002), and Kalnay (2003). We also refer
to the U.S. Global Ocean Data Assimilation
Experiment (GODAE) workshop on
Global Ocean Data Assimilation: Prospects
and Strategies (Rienecker et al., 2001);
U.S. National Oceanic and Atmospheric
Administration-Office of Global Programs
(NOAA-OGP) workshop on Coupled
Data Assimilation (Rienecker, 2003);
and, NOAA-NASA-NSF workshop on
Ongoing Analysis of the Climate System
(Arkin et al., 2003).

The International Liège Colloquium on Ocean
Dynamics is organized annually. The topic differs
from year to year in an attempt to address, as much
as possible, recent problems and incentive new subjects
in oceanography.
Assembling a group of active and eminent scientists
from various countries and often different disciplines,
the Colloquia provide a forum for discussion
and foster a mutually beneficial exchange of information
opening on to a survey of recent discoveries,
essential mechanisms, impelling question marks and
valuable recommendations for future research.
The objective of the 2001 Colloquium was to
evaluate the progress of data assimilation methods in
marine science and, in particular, in coupled hydrodynamic,
ecological and bio-geo-chemical models of
the ocean.
The past decades have seen important advances
in the understanding and modelling of key processes
of the ocean circulation and bio-geo-chemical
cycles. The increasing capabilities of data and
models, and their combination, are allowing the
study of multidisciplinary interactions that occur
dynamically, in multiple ways, on multiscales and
with feedbacks.
The capacity of dynamical models to simulate interdisciplinary
ocean processes over specific space-
time windows and thus forecast their evolution over
predictable time scales is also conditioned upon the
availability of relevant observations to: initialise and
continually update the physical and bio-geo-chemical
sectors of the ocean state; provide relevant atmospheric
and boundary forcing; calibrate the parameterizations
of sub-grid scale processes, growth rates and
reaction rates; construct interdisciplinary and multiscale
correlation and feature models; identify and
estimate the main sources of errors in the models;
control or correct for mis-represented or neglected
processes.
The access to multivariate data sets requires the
implementation, exploitation and management of dedicated
ocean observing and prediction systems. However,
the available data are often limited and, for
instance, seldom in a form to be directly compatible
or directly inserted into the numerical models. To relate
the data to the ocean state on all scales and regions that
matter, evolving three-dimensional and multivariate
(measurement) models are becoming important.
Equally significant is the reduction of observational
requirements by design of sampling strategies via
Observation System Simulation Experiments and
adaptive sampling.
Data assimilation is a quantitative approach to
extract adequate information content from the data
and to improve the consistency between data sets and
model estimates. It is also a methodology to dynamically
interpolate between data scattered in space and
time, allowing comprehensive interpretation of multivariate
observations.
In general, the goals of data assimilation are to:
control the growth of predictability errors; correct
dynamical deficiencies; estimate model parameters,
including the forcings, initial and boundary conditions;
characterise key processes by analysis of four-dimensional
fields and their statistics (balances of
terms, etc.); carry out advanced sensitivity studies
and Observation System Simulation Experiments,
and conduct efficient operations, management and
monitoring.
The use of data assimilation in coupled hydrodynamic, ecological and
bio-geo-chemical models of the ocean
Journal of Marine Systems 40-41 (2003) 1-3
doi:10.1016/S0924-7963(03)00027-7
The theoretical framework of data assimilation
for marine sciences is now relatively well established,
rooted in control theory, estimation theory or inverse
techniques, from variational to sequential approaches.
Ongoing research efforts of special importance for
interdisciplinary applications include the: stochastic
representation of processes and determination of
model and data errors; treatment of (open) boundary
conditions and strong nonlinearities; space-time,
multivariate extrapolation of limited and noisy data
and determination of measurement models; demonstration
that bio-geo-chemical models are valid
enough and of adequate structures for their deficiencies
to be controlled by data assimilation; and finally,
ability to provide accurate estimates of fields, parameters,
variabilities and errors, with large and complex
dynamical models and data sets.
Operationally, major engineering and computational
challenges for the coming years include the:
development of theoretically sound methods into
useful, practical and reliable techniques at affordable
costs; implementation of scalable, seamless and automated
systems linking observing systems, numerical
models and assimilation schemes; adequate mix of
integrated and distributed (Web-based) networks; construction
of user-friendly architectures and establishment
of standards for the description of data and
software (metadata) for efficient communication, dissemination
and management.
In addition to addressing the above items, the 33rd
Liège Colloquium has offered the opportunity to:
– review the status and current progress of data
assimilation methodologies utilised in the physical,
acoustical, optical and bio-geo-chemical
scientific communities;
– demonstrate the potentials of data assimilation
systems developed for coupled physical/ecosystem
models, from scientific to management inquiries;
– examine the impact of data assimilation and
inverse modelling in improving model parameterisations;
– discuss the observability and controllability properties
of, and identify the missing gaps in current
observing and prediction systems; and
– exchange the results of and the learnings from preoperational
marine exercises.
The presentations given during the Colloquium
led to discussions on a series of topics organized
within the following sections: (1) Interdisciplinary
research progress and issues: data, models, data
assimilation criteria. (2) Observations for interdisciplinary
data assimilation. (3) Advanced fields estimation
for interdisciplinary systems. (4) Estimation of
interdisciplinary parameters and model structures. (5)
Assimilation methodologies for physical and interdisciplinary
systems. (6) Toward operational interdisciplinary
oceanography and data assimilation. A subset
of these presentations is reported in the present
Special Issue.
As was pointed out during the Colloquium, coupled
biological-physical data assimilation is in its infancy
and much can be accomplished now by the immediate
application of existing methods. Data assimilation
intimately links dynamical models and observations,
and it can play a critical role in the important area of
fundamental biological oceanographic dynamical
model development and validation over a hierarchy
of complexities. Since coupled assimilation for coupled
processes is challenging and can be complicated, care
must be exercised in understanding, modeling and
controlling errors and in performing sensitivity analyses
to establish the robustness of results. Compatible
interdisciplinary data sets are essential and data assimilation
should iteratively define data impact and data
requirements.
Based on the results presented during the Colloquium,
data assimilation is expected to enable future
marine technologies and naval operations otherwise
impossible or not feasible. Interdisciplinary predictability
research, multiscale in both space and time, is
required. State and parameter estimation via data
assimilation is central to the successful establishment
of advanced interdisciplinary ocean observing and
prediction systems which, functioning in real time,
will contribute to novel and efficient capabilities to
manage, and to operate in our oceans.
The Scientific Committee and the participants to
the 33rd Liège Colloquium wish to express their
gratitude to the Ministère de l’Enseignement Supérieur
et de la Recherche Scientifique de la Communauté
Française de Belgique, the Fonds National de
la Recherche Scientifique de Belgique (F.N.R.S.,
Belgium), the Ministère de l’Emploi et de la Formation
du Gouvernement Wallon, the University of
Liège, the Commission of European Union, the
Scientific Committee on Oceanographic Research
(SCOR), the International Oceanographic Commission
of the UNESCO, the US Office of Naval
Research, the National Science Foundation (NSF,
USA) and the International Association for the
Physical Sciences of the Ocean (IAPSO) for their
most valuable support.

Data assimilation is a modern methodology of relating natural data and dynamical
models. The general dynamics of a model is combined or melded with a set of observations.
All dynamical models are to some extent approximate, and all data sets are
finite and to some extent limited by error bounds. The purpose of data assimilation
is to provide estimates of nature which are better estimates than can be obtained by
using only the observational data or the dynamical model. There are a number of
specific approaches to data assimilation which are suitable for estimation of the state
of nature, including natural parameters, and for evaluation of the dynamical approximations.
Progress is accelerating in understanding the dynamics of real ocean biological-
physical interactive processes. Although most biophysical processes in the sea await
discovery, new techniques and novel interdisciplinary studies are evolving ocean science
to a new level of realism. Generally, understanding proceeds from a quantitative
description of four-dimensional structures and events, through the identification of
specific dynamics, to the formulation of simple generalizations. The emergence of
realistic interdisciplinary four-dimensional data assimilative ocean models and systems
is contributing significantly and increasingly to this progress.
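
The statement above that an assimilated estimate can be better than either the data or the model alone can be illustrated with a scalar worked example. It is a sketch under stated assumptions: a single variable, unbiased and mutually uncorrelated errors, and a minimum-error-variance criterion. The symbols x^f (model forecast), y (observation), their error variances, and the numbers used are introduced here for illustration only.

```latex
% Minimum-variance combination of a forecast x^f and an observation y
x^a = x^f + K\,(y - x^f), \qquad
K = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_o^2}, \qquad
\sigma_a^2 = (1 - K)\,\sigma_f^2
           = \frac{\sigma_f^2\,\sigma_o^2}{\sigma_f^2 + \sigma_o^2}
           \le \min\!\left(\sigma_f^2,\ \sigma_o^2\right).

% Example: x^f = 10,\ \sigma_f^2 = 4,\ y = 12,\ \sigma_o^2 = 1
K = \tfrac{4}{5} = 0.8, \qquad
x^a = 10 + 0.8\,(12 - 10) = 11.6, \qquad
\sigma_a^2 = 0.2 \times 4 = 0.8 .
```

In this example the analysis error variance (0.8) is smaller than both the forecast error variance (4) and the observation error variance (1).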

The effects of a priori parameters on the error subspace estimation and mapping methodology introduced by
P. F. J. Lermusiaux et al. are investigated. The approach is three-dimensional, multivariate, and multiscale. The
sensitivities of the subspace and a posteriori fields to the size of the subspace, scales considered, and nonlinearities
in the dynamical adjustments are studied. Applications focus on the mesoscale to subbasin-scale physics in the
northwestern Levantine Sea during 10 February-15 March and 19 March-16 April 1995. Forecasts generated
from various analyzed fields are compared to in situ and satellite data. The sensitivities to size show that the
truncation to a subspace is efficient. The use of criteria to determine adequate sizes is emphasized and a
back-of-the-envelope rule is outlined. The sensitivities to scales confirm that, for a given region, smaller scales usually
require larger subspaces because of spectral redness. However, synoptic conditions are also shown to strongly
influence the ordering of scales. The sensitivities to the dynamical adjustment reveal that nonlinearities can
modify the variability decomposition, especially the dominant eigenvectors, and that changes are largest for the
features and regions with high shears. Based on the estimated variability variance fields, eigenvalue spectra,
multivariate eigenvectors and (cross)-covariance functions, dominant dynamical balances and the spatial distribution
of hydrographic and velocity characteristic scales are obtained for primary regional features. In particular,
the Ierapetra Eddy is found to be close to gradient-wind balance and coastal-trapped waves are anticipated to
occur along the northern escarpment of the basin.

Data assimilation is a novel, versatile methodology
for estimating oceanic variables. The estimation of
a quantity of interest via data assimilation involves
the combination of observational data with the underlying
dynamical principles governing the system
under observation. The melding of data and dynamics
is a powerful methodology which makes possible
efficient, accurate, and realistic estimations otherwise
not feasible. It is providing rapid advances in
important aspects of both basic ocean science and
applied marine technology and operations.
The following sections introduce concepts, describe
purposes, present applications to regional dynamics
and forecasting, overview formalism and
methods, and provide a selected range of examples.

A basis is outlined for the first-guess spatial mapping of three-dimensional multivariate and multiscale
geophysical fields and their dominant errors. The a priori error statistics are characterized by covariance matrices
and the mapping obtained by solving a minimum-error-variance estimation problem. The size of the problem is
reduced efficiently by focusing on the error subspace, here the dominant eigendecomposition of the a priori error
covariance. The first estimate of this a priori error subspace is constructed in two parts. For the “observed” portions
of the subspace, the covariance of the a priori missing variability is directly specified and eigendecomposed.
For the “non-observed” portions, an ensemble of adjustment dynamical integrations is utilized, building the nonobserved
covariances in statistical accord with the observed ones. This error subspace construction is exemplified
and studied in a Middle Atlantic Bight simulation and in the eastern Mediterranean. Its use allows an accurate,
global, multiscale and multivariate, three-dimensional analysis of primitive-equation fields and their errors, in real
time. The a posteriori error covariance is computed and indicates complex data-variability influences. The error
and variability subspaces obtained can also confirm or reveal the features of dominant variability, such as the
Ierapetra Eddy in the Levantine basin.
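
The notion of an error subspace as the dominant eigendecomposition of an error covariance can be sketched on a toy problem. The 3-by-3 covariance matrix below is hypothetical and power iteration is used only because it needs no external library. For the large multivariate problems described above, a truncated eigen- or singular value decomposition of an ensemble would be the more likely tool.

```python
import math


def matvec(a, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dominant_mode(cov, n_iter=100):
    """Leading eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    # Assumes the start vector is not orthogonal to the leading eigenvector.
    v = [1.0] + [0.0] * (len(cov) - 1)
    for _ in range(n_iter):
        w = matvec(cov, v)
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    lam = sum(vi * wi for vi, wi in zip(v, matvec(cov, v)))  # Rayleigh quotient
    return lam, v


# Hypothetical a priori error covariance for three state variables.
cov = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 0.5],
]
lam, mode = dominant_mode(cov)

# Fraction of the total error variance (trace) captured by a rank-1 subspace.
explained = lam / sum(cov[i][i] for i in range(len(cov)))
```

For this matrix the leading mode weights the first two variables equally and captures two thirds of the total variance, so a rank-1 subspace already represents most of the error in this toy case.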

Identical twin experiments are utilized to assess and exemplify the capabilities of error subspace statistical
estimation (ESSE). The experiments consist of nonlinear, primitive equation-based, idealized Middle Atlantic
Bight shelfbreak front simulations. Qualitative and quantitative comparisons with an optimal interpolation (OI)
scheme are made. Essential components of ESSE are illustrated. The evolution of the error subspace, in agreement
with the initial conditions, dynamics, and data properties, is analyzed. The three-dimensional multivariate minimum
variance melding in the error subspace is compared to the OI melding. Several advantages and properties
of ESSE are discussed and evaluated. The continuous singular value decomposition of the nonlinearly evolving
variations of variability and the possibilities of ESSE for dominant process analysis are illustrated and emphasized.

A rational approach is used to identify efficient schemes for data assimilation in nonlinear ocean-atmosphere
models. The conditional mean, a minimum of several cost functionals, is chosen for an optimal estimate. After
stating the present goals and describing some of the existing schemes, the constraints and issues particular to
ocean-atmosphere data assimilation are emphasized. An approximation to the optimal criterion satisfying the
goals and addressing the issues is obtained using heuristic characteristics of geophysical measurements and
models. This leads to the notion of an evolving error subspace, of variable size, that spans and tracks the scales
and processes where the dominant errors occur. The concept of error subspace statistical estimation (ESSE) is
defined. In the present minimum error variance approach, the suboptimal criterion is based on a continued and
energetically optimal reduction of the dimension of error covariance matrices. The evolving error subspace is
characterized by error singular vectors and values, or in other words, the error principal components and
coefficients.
Schemes for filtering and smoothing via ESSE are derived. The data-forecast melding minimizes variance in
the error subspace. Nonlinear Monte Carlo forecasts integrate the error subspace in time. The smoothing is
based on a statistical approximation approach. Comparisons with existing filtering and smoothing procedures
are made. The theoretical and practical advantages of ESSE are discussed. The concepts introduced by the
subspace approach are as useful as the practical benefits. The formalism forms a theoretical basis for the
intercomparison of reduced dimension assimilation methods and for the validation of specific assumptions for
tailored applications. The subspace approach is useful for a wide range of purposes, including nonlinear field
and error forecasting, predictability and stability studies, objective analyses, data-driven simulations, model
improvements, adaptive sampling, and parameter estimation.