Consider a model for data assumed to be observed values of some random variable . This model defines a probability distribution for in terms of parameters , from a parameter space , by a density function . The value of this density data is often called the likelihood function as it describes the likelihood of this particular sample in terms of the parameters .

In a Bayesian model initial knowledge about is represented as a prior distribution having density . This may come from some `expert' assessment of the parameter value or from some previous measurement or experiment. Statistical inference for is obtained by using Bayes' Theorem to update knowledge about in the light of the sample data .

The inference about , given data , is provided by the posterior density given by Bayes' Theorem as:

where is the entire parameter space.

It is often convenient and sufficient to express Bayes' theorem to
proportionality^{1} as

Note that

is obtainable as the constant needed to make a proper density, so that

(4) |

If the sample is large then the information contained in the prior is swamped by that in the data and the prior has little effect on the posterior density. If, on the other hand, the sample information is small the posterior will be dominated by the prior.

The Bayesian approach has several theoretical advantages over, for
example, the more familiar frequentist methods. One such is that it
does not violate the *
likelihood principle* which implies that all the information to be
learned about
from the sample is captured in the
likelihood (Lindley, 1965). Hence, two different samples having proportional
likelihoods would have the same inference; this is true if Bayesian
methods are used, see, for example, Savage (1962) and
O'Hagan (1994). As a simple example of a statistic in common use
that violates the likelihood principle consider the problem of
estimating variance ()

(5) |

where sample mean and sample size, is an unbiased estimator for . The denominator has been chosen to remove bias thus considering samples that have not been seen and, hence, information not in the observed data.