Stochastic Engine Convergence Diagnostics

SciTech Connect

The stochastic engine uses a Markov Chain Monte Carlo (MCMC) sampling device to allow an analyst to construct a reasonable estimate of the state of nature that is consistent with observed data and modeling assumptions. The key engine output is a sample from the posterior distribution, which is the conditional probability distribution of the state of nature, given the data. In applications the state of nature may refer to a complicated, multi-attributed feature like the lithology map of a volume of earth, or to a particular related parameter of interest, say the centroid of the largest contiguous sub-region of specified lithology type. The posterior distribution, which we will call f, can be thought of as the best stochastic description of the state of nature that incorporates all pertinent physical and theoretical models as well as observed data.

Characterization of the posterior distribution is the primary goal in the Bayesian statistical paradigm. In applications of the stochastic engine, however, analytical calculation of the posterior distribution is precluded, and only a sample drawn from the distribution is feasible. The engine's MCMC technique, which employs the Metropolis-Hastings algorithm, provides a sample in the form of a sequence (chain) of possible states of nature, x^(1), x^(2), ..., x^(T), .... The sequencing is motivated by consideration of comparative likelihoods of the data. Asymptotic results ensure that the sample ultimately spans the entire posterior distribution and reveals the actual state frequencies that characterize the posterior. In mathematical jargon, the sample is an ergodic Markov chain with stationary distribution f. What this means is that once the chain has gone a sufficient number of steps, T_0, the (unconditional) distribution of the state, x^(T), at any step T ≥ T_0 is the same (i.e., is "stationary"), and is the posterior distribution, f. We call T_0 the "burn-in" period.
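The engine's actual state space is high-dimensional, but the Metropolis-Hastings mechanics described above can be sketched on a toy one-dimensional problem. The sketch below is illustrative only: the target density (a standard normal), the Gaussian random-walk proposal, the step size, and all function names are our assumptions, not the engine's implementation.

```python
import numpy as np

def metropolis_hastings(log_f, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis-Hastings: returns a chain x^(1), ..., x^(T)
    whose stationary distribution has (unnormalized) log-density log_f."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    x, logf_x = x0, log_f(x0)
    for t in range(n_steps):
        # Symmetric Gaussian proposal, so the Hastings acceptance ratio
        # reduces to f(y)/f(x), compared on the log scale for stability.
        y = x + step_size * rng.normal()
        logf_y = log_f(y)
        if np.log(rng.random()) < logf_y - logf_x:
            x, logf_x = y, logf_y  # accept the proposed state
        chain[t] = x               # otherwise repeat the current state
    return chain

# Toy posterior f(x) ∝ exp(-x^2/2), deliberately started far from the mode
# so the chain's burn-in transient is visible in the early iterates.
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=10.0, n_steps=20000)
```

After discarding an initial burn-in segment, the remaining iterates behave for most purposes like draws from the target, which is exactly the property the engine's diagnostics are meant to verify.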
The MCMC process begins at a particular state, which is selected at random or by design, according to the wish of the user of the engine. After the burn-in period, the chain has essentially forgotten where it started. Moreover, the sample x^(T_0), x^(T_0+1), ... can be used for most purposes as a random sample from f, even though the x^(T_0+t), because of Markovian dependency, are not independent. For example, averages involving x^(T_0), x^(T_0+1), ... may have an approximate normal distribution. The purpose of this note is to discuss the monitoring techniques currently in place in the stochastic engine software that address the issues of burn-in, stationarity, and normality. They are loosely termed "convergence diagnostics", in reference to the underlying Markov chains, which converge asymptotically to the desired posterior distribution.
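One widely used monitor of burn-in and stationarity, shown here as a plausible stand-in rather than the engine's own diagnostic, is the Gelman-Rubin potential scale reduction factor (R-hat) computed over several chains launched from dispersed starting states. The surrogate AR(1) chains, the dispersion of starting points, and the function names below are our assumptions for illustration.

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin potential scale reduction factor (R-hat) for an
    (m chains) x (n steps) array. Values near 1 indicate the chains
    agree with one another, i.e., burn-in appears complete."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return float(np.sqrt(var_hat / W))

# Surrogate Markov chains: AR(1) processes with stationary N(0, 1) law,
# started from deliberately dispersed states to mimic engine runs that
# have not yet forgotten their starting points.
rng = np.random.default_rng(1)

def ar1_chain(x0, n, rho=0.9):
    x, prev = np.empty(n), x0
    for t in range(n):
        prev = rho * prev + np.sqrt(1.0 - rho**2) * rng.normal()
        x[t] = prev
    return x

chains = np.array([ar1_chain(x0, 5000) for x0 in (-10.0, 0.0, 10.0)])
print(gelman_rubin(chains[:, :20]))    # early segment: R-hat well above 1
print(gelman_rubin(chains[:, 1000:]))  # after burn-in: R-hat close to 1
```

Monitoring R-hat as the chains run gives a practical estimate of T_0: once the statistic settles near 1, the dispersed chains have mixed into a common stationary distribution and the subsequent iterates can be treated as draws from f.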