Basic theory

Consider a model for data ${\mbox{\boldmath$x$}}$ assumed to be observed values of some random variable ${\mbox{\boldmath$X$}}$. This model defines a probability distribution for ${\mbox{\boldmath$X$}}$ in terms of parameters ${\mbox{\boldmath$\theta$}}$, from a parameter space ${\mbox{\boldmath$\Theta$}}$, by a density function $p({\mbox{\boldmath$x$}}\vert {\mbox{\boldmath$\theta$}})$. The value of this density at the observed data ${\mbox{\boldmath$x$}}$, regarded as a function of ${\mbox{\boldmath$\theta$}}$, is often called the likelihood function, as it describes the likelihood of this particular sample ${\mbox{\boldmath$x$}}$ in terms of the parameters ${\mbox{\boldmath$\theta$}}$.

In a Bayesian model, initial knowledge about ${\mbox{\boldmath$\theta$}}$ is represented as a prior distribution having density $p({\mbox{\boldmath$\theta$}})$. This may come from some `expert' assessment of the parameter value or from some previous measurement or experiment. Statistical inference for ${\mbox{\boldmath$\theta$}}$ is obtained by using Bayes' Theorem to update knowledge about ${\mbox{\boldmath$\theta$}}$ in the light of the sample data ${\mbox{\boldmath$x$}}$.

The inference about ${\mbox{\boldmath$\theta$}}$, given data ${\mbox{\boldmath$x$}}$, is provided by the posterior density given by Bayes' Theorem as:

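\index{Bayes' Theorem ! stated}
\begin{displaymath}
p({\mbox{\boldmath$\theta$}} \vert {\mbox{\boldmath$x$}}) = \frac{p({\mbox{\boldmath$x$}} \vert {\mbox{\boldmath$\theta$}}) p({\mbox{\boldmath$\theta$}})}{\int_{\mbox{\boldmath$\Theta$}} p({\mbox{\boldmath$x$}} \vert {\mbox{\boldmath$\theta$}}) p({\mbox{\boldmath$\theta$}}) d{\mbox{\boldmath$\theta$}}}
\end{displaymath} (1)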

where ${\mbox{\boldmath$\Theta$}}$ is the entire parameter space.

It is often convenient and sufficient to express Bayes' Theorem to proportionality as

\begin{displaymath}
p({\mbox{\boldmath$\theta$}} \vert {\mbox{\boldmath$x$}}) \propto p({\mbox{\boldmath$x$}} \vert {\mbox{\boldmath$\theta$}}) p({\mbox{\boldmath$\theta$}}).
\end{displaymath} (2)

Note that

\begin{displaymath}
p({\mbox{\boldmath$x$}}) = \int p({\mbox{\boldmath$x$}} \vert {\mbox{\boldmath$\theta$}}) p({\mbox{\boldmath$\theta$}}) d{\mbox{\boldmath$\theta$}}
\end{displaymath} (3)

is obtainable as the constant needed to make $p({\mbox{\boldmath$\theta$}}\vert{\mbox{\boldmath$x$}})$ a proper density, so that

\begin{displaymath}
\int p({\mbox{\boldmath$\theta$}}\vert {\mbox{\boldmath$x$}}) d{\mbox{\boldmath$\theta$}} = 1.
\end{displaymath} (4)
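
Equations (2) to (4) can be illustrated numerically. The following is a minimal sketch, assuming a single scalar parameter on $(0,1)$, hypothetical binomial data (7 successes in 10 trials) and a uniform prior; the function name `grid_posterior` and the grid spacing are illustrative choices only, not part of the theory above.

```python
import math


def grid_posterior(log_likelihood, prior_density, grid):
    """Approximate the posterior density on an equally spaced grid.

    Returns the normalised posterior values and the estimate of p(x).
    """
    # Likelihood times prior: the unnormalised posterior of equation (2).
    unnorm = [math.exp(log_likelihood(t)) * prior_density(t) for t in grid]
    step = grid[1] - grid[0]
    # Trapezoidal rule for the integral in equation (3).
    evidence = step * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    posterior = [u / evidence for u in unnorm]
    return posterior, evidence


# Hypothetical data: 7 successes in 10 Bernoulli trials.
successes, trials = 7, 10


def log_likelihood(theta):
    # Binomial log-likelihood with the binomial coefficient omitted,
    # so the "evidence" below is p(x) up to that constant factor.
    return (successes * math.log(theta)
            + (trials - successes) * math.log(1.0 - theta))


def uniform_prior(theta):
    # Assumed prior: uniform density on (0, 1).
    return 1.0


# Interior grid points only, which avoids log(0) at the end points.
grid = [i / 2000 for i in range(1, 2000)]
posterior, evidence = grid_posterior(log_likelihood, uniform_prior, grid)

step = grid[1] - grid[0]
# Check of equation (4): the normalised posterior should integrate to 1.
total = step * (sum(posterior) - 0.5 * (posterior[0] + posterior[-1]))
# Posterior mean by a simple Riemann sum over the grid.
post_mean = step * sum(t * p for t, p in zip(grid, posterior))
print(f"integral of posterior = {total:.6f}, posterior mean = {post_mean:.4f}")
```

With these assumptions the exact posterior is a Beta(8, 4) density, whose mean is $2/3$, so the grid result can be checked against a known answer. The normalising constant never has to be known in advance: it is recovered from the unnormalised product.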

If the sample is large, the information contained in the prior is swamped by that in the data and the prior has little effect on the posterior density. If, on the other hand, the sample carries little information, the posterior will be dominated by the prior.
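
The effect of sample size on the influence of the prior can be seen in a small conjugate example. This is a sketch under stated assumptions: binomial data with a Beta prior, for which the posterior is again a Beta distribution with mean $(a + k)/(a + b + n)$ for prior Beta($a$, $b$) and $k$ successes in $n$ trials. The two priors and the two samples are hypothetical values chosen for illustration.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of theta for a Beta(prior_a, prior_b) prior
    combined with binomial data (successes out of trials)."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


# Two deliberately conflicting hypothetical priors (means 0.2 and 0.8).
priors = {"sceptical": (2, 8), "optimistic": (8, 2)}
# Two hypothetical samples with the same observed proportion, 0.6.
samples = {"small": (3, 5), "large": (3000, 5000)}

gaps = {}
for sample_name, (k, n) in samples.items():
    means = {name: posterior_mean(a, b, k, n) for name, (a, b) in priors.items()}
    # Disagreement between the two posteriors that remains after the data.
    gaps[sample_name] = abs(means["optimistic"] - means["sceptical"])
    print(sample_name, {name: round(m, 4) for name, m in means.items()})
```

With five observations the two posterior means remain far apart, reflecting the priors; with five thousand they nearly coincide close to the sample proportion.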

The Bayesian approach has several theoretical advantages over, for example, the more familiar frequentist methods. One is that it does not violate the likelihood principle, which implies that all the information to be learned about ${\mbox{\boldmath$\theta$}}$ from the sample is captured in the likelihood (Lindley, 1965). Hence, two different samples having proportional likelihoods should lead to the same inference; this holds when Bayesian methods are used (see, for example, Savage (1962) and O'Hagan (1994)). As a simple example of a statistic in common use that violates the likelihood principle, consider the usual estimator of a variance $\sigma^2$,

\begin{displaymath}
s^2=\frac{\sum(x_i-\bar x)^2}{n-1}
\end{displaymath} (5)

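where $\bar x$ is the sample mean and $n$ is the sample size; $s^2$ is an unbiased estimator of $\sigma^2$. The denominator $n-1$ has been chosen to remove bias. Bias, however, is defined by averaging over all samples that might have been drawn, so this choice takes account of samples that have not been seen and, hence, of information not in the observed data.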
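
The sense in which unbiasedness depends on unseen samples can be made concrete by exact enumeration. The sketch below assumes a hypothetical three-point population with equally likely values and samples of size 3 drawn independently; it averages the estimator over every sample that could have occurred, which is exactly the calculation that justifies the $n-1$ denominator. The population values and the helper name `sample_variance` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def sample_variance(sample, ddof=1):
    """Sum of squared deviations from the sample mean, divided by n - ddof."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - ddof)


# Hypothetical population: each value is drawn with probability 1/3.
population = [1, 2, 6]
pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)

# Every possible independent sample of size 3 (27 in total, equally likely).
all_samples = list(product(population, repeat=3))

# Averages over the whole sample space, not over the data actually observed.
expected_s2 = sum(sample_variance(s) for s in all_samples) / len(all_samples)
expected_biased = (sum(sample_variance(s, ddof=0) for s in all_samples)
                   / len(all_samples))

print("population variance:", pop_var)
print("average of s^2 over all samples:", expected_s2)
print("average with denominator n:", expected_biased)
```

The average of $s^2$ over all 27 possible samples equals the population variance, while the version with denominator $n$ falls short by the factor $(n-1)/n$. Neither statement concerns any single observed sample, which is the point made in the text.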

danny 2009-07-23