Bishop's Lodge Ranch, Resort & Spa
Santa Fe, New Mexico
May 22-24, 2013
One of the greatest challenges in computational science and engineering (CS&E) today is how to combine complex data with large-scale models to create better predictions. This challenge cuts across every application area within CS&E, from the geosciences to materials to chemical systems to biological systems to astrophysics to engineered systems in aerospace, transportation, buildings, and biomedicine, and beyond. At the heart of this challenge is an inverse problem: we seek to infer unknown model inputs (parameters, source terms, initial or boundary conditions, model structure, etc.) from observations of model outputs. The critical need to quantify the uncertainty in the solution of such inverse problems has gained increasing recognition in recent years. This can be done by casting the problem as one in statistical inference. Here, uncertain observations and uncertain models are combined with available prior knowledge to yield a probability density as the solution of the inverse problem, thereby providing a systematic means of quantifying uncertainties in the model parameters. This facilitates uncertainty quantification of model predictions when the resulting input uncertainties are propagated to the outputs.
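The Bayesian formulation described above can be written schematically as follows. The notation here is illustrative: m denotes the uncertain model inputs, d the observations, and F the parameter-to-observable map.

```latex
% Posterior density: prior knowledge combined with the data likelihood
\pi_{\mathrm{post}}(m \mid d) \;\propto\; \pi_{\mathrm{like}}(d \mid m)\,\pi_{\mathrm{prior}}(m),
% e.g., for additive Gaussian observation noise with covariance \Gamma_{\mathrm{noise}},
\pi_{\mathrm{like}}(d \mid m) \;\propto\; \exp\!\Big(-\tfrac{1}{2}\,\big\|F(m)-d\big\|^{2}_{\Gamma_{\mathrm{noise}}^{-1}}\Big).
```

Propagating samples or moments of the posterior through the model then yields the uncertainty in the predictions.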
Unfortunately, the solution of such statistical inverse problems for systems governed by large-scale, complex computational models has traditionally been intractable. However, a number of advances over the past decade have brought this goal much closer. First, improvements in scalable forward solvers for many classes of large-scale models have made repeated evaluation of model outputs for differing inputs feasible. Second, the exponential growth in high-performance computing capabilities has multiplied the effects of these solver advances. Third, the emergence of Markov chain Monte Carlo (MCMC) methods that exploit problem structure has radically improved the prospects of sampling probability densities for inverse problems governed by expensive models. Fourth, recent exponential growth in observational capabilities has produced massive volumes of data from which large computational models can be inferred.
Capitalizing on these advances, this workshop will address the challenges presented in the formulation and solution of statistical inverse problems employing large and complex data sets to make inferences about large and complex computational models. These challenges include:
Complexity of data: Data can be complex in a variety of ways: they can be voluminous, noisy, heterogeneous, indirect, multi-source, and collected over a range of temporal and spatial scales. Harnessing the power of such data through new methods that combine data and simulations offers tremendous opportunities to gain understanding, optimize performance, and optimally control dynamic systems. In all of these settings, complex data provide a new opportunity to create predictive models that embody both physical principles and rich data. For example, data may be used to adapt the model, or to characterize uncertainties due to missing physics. An equally important aspect is the use of models for adaptive sensing and experimental design. Critical challenges include: How do we characterize the uncertainty in large and complex data sets, particularly when their provenance is unknown or when the data pass through a number of processing steps (typically involving additional models)? How do we reconcile information from multiple observational modalities? How do we extract pertinent information from large volumes of data?
Complexity of models: Conventional sampling methods have been able to solve statistical inverse problems for simple models. The real challenge now is to extend these methods, and to develop entirely new ones, to solve statistical inverse problems governed by large-scale complex models, often in the form of partial differential equations (PDEs). Complexity might stem from a wide range of spatial and temporal scales represented in the model; the coupling of multiple physics models; the hierarchical nature of the model; heterogeneity of models (e.g., continuum and atomistic, discrete and continuous, structured and unstructured); stochastic model components; severe nonlinearities; and so on. Extremely large model state dimensions (up to billions of unknowns and beyond) are a ubiquitous feature of such problems, and massive parallelism is essential. Despite these enormous challenges, the critical need to quantify uncertainties in realistic models of natural and engineered systems from observational data demands a new generation of statistical inverse methods able to cope with model complexity. How can we construct reduced models that capture the pertinent statistical features but are much cheaper than the original high-fidelity models? How can we characterize uncertainty in model structure? Can we employ powerful deterministic inverse problem tools, such as adjoints and Hessians, to accelerate sampling methods?
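Why model complexity dominates the cost of sampling can be seen in even the simplest MCMC method: every proposed sample requires a fresh forward-model evaluation. The sketch below uses a random-walk Metropolis sampler on a toy scalar model standing in for an expensive PDE solve; the model, parameter values, and function names are illustrative, not part of any particular workshop contribution.

```python
import numpy as np

def metropolis_hastings(forward, d_obs, sigma_noise, sigma_prior,
                        m0, n_samples, step=0.1, seed=0):
    """Random-walk Metropolis for a toy Bayesian inverse problem.

    Each iteration costs one call to `forward` -- the reason
    structure-exploiting samplers matter when the forward model
    is a large-scale PDE solve.
    """
    rng = np.random.default_rng(seed)

    def log_post(m):
        # Gaussian likelihood (data misfit) plus Gaussian prior
        misfit = (forward(m) - d_obs) / sigma_noise
        return -0.5 * misfit**2 - 0.5 * (m / sigma_prior)**2

    samples, m, lp = [], m0, log_post(m0)
    for _ in range(n_samples):
        m_new = m + step * rng.standard_normal()   # propose
        lp_new = log_post(m_new)                   # one forward solve
        if np.log(rng.random()) < lp_new - lp:     # accept/reject
            m, lp = m_new, lp_new
        samples.append(m)
    return np.array(samples)

# Toy nonlinear "model": infer m from a noisy observation of m**3 + m
forward = lambda m: m**3 + m
d_obs = forward(0.5) + 0.01
chain = metropolis_hastings(forward, d_obs, sigma_noise=0.05,
                            sigma_prior=2.0, m0=0.0, n_samples=5000)
print(chain.mean())
```

With 5000 samples the chain concentrates near the true parameter value; for a model costing hours per solve, this many sequential evaluations is exactly what makes structure-exploiting methods (adjoints, Hessians, reduced models) necessary.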
High dimensionality of parameter space: Many frontier statistical inverse problems in CS&E are characterized by uncertain fields, which, when discretized, give rise to very high-dimensional parameter spaces. Examples include uncertain initial or boundary conditions, heterogeneous sources, heterogeneous model coefficients, and domain shapes. As with expensive computational models, high dimensionality is prohibitive for conventional statistical computing methods. Still, uncertain fields abound in CS&E models of interest, and we must develop methods to address the underlying challenges. These include how to devise MCMC (and other) sampling strategies that scale to very large parameter dimensions; how to specify priors that are consistent with the discretization; how to adapt parameter fields to the information contained in the data; and how to take advantage of hierarchies of discretization. A common feature of such problems is that, despite the increasingly large volumes of data available for inference, the data are often informative about only a low-dimensional subspace of the parameters: can methods be developed that invoke the expensive computational models only in those subspaces?
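The low-dimensional structure mentioned above can be made concrete with a small linear-algebra sketch. Here a random linearized observation operator stands in for the Jacobian of a PDE-based parameter-to-observable map; all sizes and names are illustrative. The rank of the Gauss-Newton Hessian of the data misfit bounds the dimension of the parameter subspace the data can inform.

```python
import numpy as np

rng = np.random.default_rng(1)
n_param, n_obs = 200, 15          # many parameters, few observations

# Linearized parameter-to-observable map (stand-in for a PDE Jacobian)
J = rng.standard_normal((n_obs, n_param))
Gamma_noise_inv = np.eye(n_obs) / 0.1**2

# Gauss-Newton Hessian of the data misfit: H = J^T Gamma_noise^{-1} J
H = J.T @ Gamma_noise_inv @ J
eigvals = np.linalg.eigvalsh(H)[::-1]     # descending order

# rank(H) <= n_obs: the data inform at most a 15-dimensional
# subspace of the 200-dimensional parameter space
informed = int(np.sum(eigvals > 1e-8 * eigvals[0]))
print(informed)
```

Sampling methods that restrict expensive model evaluations to this data-informed subspace, and treat the prior-dominated complement analytically, are one promising route to dimension-independent performance.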
This workshop will provide a forum for discussing these and other challenges that must be overcome to realize the promise of statistical inference of large-scale complex models from large-scale complex data. The workshop brings together researchers from the areas of deterministic and statistical inverse problems, uncertainty quantification, computational statistics and machine learning, numerical analysis of PDEs, high performance scientific computing, and a wide range of application areas. The combined perspectives of the workshop participants will allow us to identify promising future research directions in large-scale statistical inverse problems. To promote extensive discourse, the number of talks will be limited, generous time will be devoted to discussion, and a scientifically vigorous but informal atmosphere will be encouraged. The Bishop's Lodge, in the foothills of the Sangre de Cristo mountains near Santa Fe, New Mexico, will provide an ideal venue.
- Omar Ghattas, The University of Texas at Austin
- Matthias Heinkenschloss, Rice University
- Luis Tenorio, Colorado School of Mines
- Bart van Bloemen Waanders, Sandia National Laboratories
- Karen Willcox, Massachusetts Institute of Technology
Sponsorship and financial support is provided by AFOSR, CSRI at Sandia National Laboratories, DOE ASCR, and ExxonMobil.