Bayesian Adaptive Kernel Estimates

The method described above is readily applied to observed data; as a second example, consider the Old Faithful data set again.

The curve shown in Figure 4 is the density estimated using the above methods with a non-informative prior $N(0,1)$; it shows the two main modes.

Figure:Old Faithful data, single $h$.
\begin{figure}
\centering
\psfig{figure=/home/danny/thesis/pics/oldf_f1.ps,width=5.25in,angle=0}
\end{figure}

Figure:Old Faithful data, variable $h$, central portion of estimate.
\begin{figure}
\centering
\psfig{figure=/home/danny/thesis/pics/oldf_v.ps,width=5.25in,angle=270}
\end{figure}

As can be seen from the estimate, there are two regions of high density and three of relatively low density. Intuitively, a large bandwidth is required in a region of low density and a small bandwidth in a region of high density, which smooths out the tails and reveals more detail where the data are concentrated. This leads to the notion of an adaptive, two-pass estimate in which the bandwidth varies inversely with the local density.

An estimate is required that works in a similar way. An initial KDE is used to modify the bandwidth used in the final estimate.

First, a fixed-bandwidth KDE is adopted as the pilot estimate, so that, for a sample ${\mbox{\boldmath$x$}}_{(n)}$,


\begin{displaymath}
{\hat f}_n (t) = \frac{1}{n}\sum^n_{i=1}\frac{1}{h_p}K\left(
\frac{t-x_i}{h_p}\right)
\end{displaymath} (21)

where $h_p$ is the pilot bandwidth. Following Abramson (1982) and taking $\alpha=\frac{1}{2}$ in [*] gives


\begin{displaymath}
\lambda_i = \sqrt {\frac{g_n}{{\hat f}_n(x_i)}}
\end{displaymath} (22)

as the local bandwidth. For univariate data ($d=1$), equation ([*]) gives


\begin{displaymath}
{\hat f}_n(t) = \frac{1}{n}\sum_{i=1}^n \frac{1}{h\lambda_i}
K\left( \frac{t-x_i}{h\lambda_i}\right)
\end{displaymath} (23)

where


\begin{displaymath}
h\lambda_i = h\sqrt{\frac{g_n}{{\hat f}_n(x_i)}} =
\frac{h_1}{\sqrt{{\hat f}_n(x_i)}}
\end{displaymath} (24)

The factor $g_n$ is absorbed into $h$, so that $h_1 = h\sqrt{g_n}$, as discussed in section [*]. Thus (23) is of the form


\begin{displaymath}
{\hat f_n}(t\vert{\mbox{\boldmath$x$}}, {\mbox{\boldmath$\theta$}}) =
\frac{1}{n}\sum_{i=1}^n K(t\vert x_i,
{\mbox{\boldmath$\theta$}}).
\end{displaymath} (25)

This is similar to (2) and is still easily accommodated in the formulation based on (3).
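
The two-pass construction in (21) to (24) can be sketched numerically as follows. This is only an illustrative sketch, not the implementation used for the figures: it assumes a Gaussian kernel for $K$, takes $g_n$ to be the geometric mean of the pilot values (a common choice, but one not fixed by the text above), and uses a synthetic two-cluster sample in place of the Old Faithful data. All function and variable names are hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used as the kernel K (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def pilot_kde(t, x, h_p):
    """Fixed-bandwidth pilot estimate of equation (21) at the points t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    x = np.asarray(x, dtype=float)
    u = (t[:, None] - x[None, :]) / h_p
    return gaussian_kernel(u).sum(axis=1) / (len(x) * h_p)


def local_factors(x, h_p):
    """Local factors lambda_i of equation (22).

    g_n is taken as the geometric mean of the pilot values (an assumption).
    """
    f_pilot = pilot_kde(x, x, h_p)
    g_n = np.exp(np.mean(np.log(f_pilot)))
    return np.sqrt(g_n / f_pilot)


def adaptive_kde(t, x, h, h_p):
    """Variable-bandwidth estimate of equation (23) at the points t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    x = np.asarray(x, dtype=float)
    bw = h * local_factors(x, h_p)  # h * lambda_i, one bandwidth per data point
    u = (t[:, None] - x[None, :]) / bw[None, :]
    return (gaussian_kernel(u) / bw[None, :]).sum(axis=1) / len(x)


# Synthetic two-cluster sample (NOT the Old Faithful data).
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(2.0, 0.3, 40), rng.normal(4.3, 0.4, 60)])
grid = np.linspace(-10.0, 16.0, 2601)
density = adaptive_kde(grid, sample, h=0.3, h_p=0.3)
```

With the geometric-mean choice the factors $\lambda_i$ have geometric mean one, so $h$ keeps the role of an overall smoothing parameter while $\lambda_i$ only redistributes the smoothing between sparse and dense regions.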

Taking


\begin{displaymath}
p(x_i\vert{\mbox{\boldmath$x$}}_{(i-1)}, {\mbox{\boldmath$\theta$}}) = {\hat f}_{(i-1)}(x_i)
\end{displaymath} (26)

a likelihood for the analysis can be constructed as before.
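
As a rough illustration of (26), the sketch below accumulates the logarithm of each one-step-ahead predictive density ${\hat f}_{(i-1)}(x_i)$. It rests on several assumptions that are not taken from the text: a Gaussian kernel, a single fixed bandwidth $h$ standing in for ${\mbox{\boldmath$\theta$}}$, and a sum that starts at the second observation because the first has no earlier data to condition on. The function names are hypothetical, and any other estimator with the same call signature (for example an adaptive one) could be passed in.

```python
import numpy as np


def fixed_kde(t, x_past, h):
    """Fixed-bandwidth Gaussian KDE built from x_past, evaluated at scalar t."""
    u = (t - x_past) / h
    return np.mean(np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h))


def predictive_log_likelihood(x, theta, estimator=fixed_kde, start=1):
    """Sum of log f_hat_{(i-1)}(x_i), following equation (26).

    `start` is the number of leading observations used only for conditioning
    (an assumption: the first observation has nothing to condition on).
    """
    x = np.asarray(x, dtype=float)
    terms = [np.log(estimator(x[i], x[:i], theta)) for i in range(start, len(x))]
    return float(np.sum(terms))
```

If an observation lies far from all earlier ones, its predictive density can underflow to zero and the logarithm becomes $-\infty$; a practical implementation would need to guard against this.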

Figure 5 shows the density obtained from the eruption duration subset of the Old Faithful data using the adaptive KDE. The two main peaks are still present; however, a third peak appears at around $4.5$ minutes that is not obvious in the simple estimate. This peak is also seen in Figure 6 and, to a slightly lesser extent, in the larger data set in Figure 7 (this data set has $298$ items as opposed to $107$).

Figure:Old Faithful data - histogram with 17 bins.
\begin{figure}
\centering
\psfig{figure=/home/danny/thesis/pics/histoldf.ps,width=5.25in,angle=270}
\end{figure}

Figure:Old Faithful data, larger data set - histogram with 17 bins.
\begin{figure}
\centering
\psfig{figure=/home/danny/thesis/pics/histoldf2.ps,width=5.25in,angle=270}
\end{figure}

danny 2009-07-23