The Adaptive Kernel Estimator

The kernel estimator suffers from one significant drawback: the need to estimate, or choose, the bandwidth. If the density is such that different values of $h$ are best in different regions, then the best a single bandwidth can do is compromise. For example, when applied to long-tailed distributions the fixed kernel estimator tends to under-smooth the tail. To overcome this, the bandwidth of the estimator may be allowed to vary; for example, the bandwidth may be made inversely proportional to the local density obtained from some previous estimate, Silverman (1986, p. 100).

The steps given in Silverman (1986) for arriving at this are

  1. Find a pilot estimate $\tilde f(t)$ that satisfies $\tilde f(x_i) > 0$ for all $i$
  2. Define local bandwidth factors $\lambda_i$ by
\begin{displaymath}
\lambda_i = \left\{ \frac{\tilde f(x_i)}{g} \right\}^{-\alpha}
\end{displaymath} (10)

    where $g$ is the geometric mean of the $\tilde f(x_i)$:

\begin{displaymath}
\log g = \frac{1}{n}\sum_{i=1}^n \log \tilde f(x_i)
\end{displaymath} (11)

    and $\alpha$ is the sensitivity parameter, a number satisfying $0\leq \alpha \leq 1$.

  3. Define the adaptive kernel estimate $\hat f$ by
\begin{displaymath}
\hat f(t)=\frac{1}{n}\sum_{i=1}^n \frac{1}{(h\lambda_i)^d}
K\left\{\frac{t - x_i}{h \lambda_i} \right\}
\end{displaymath} (12)

    where $K$ is the kernel function and $h$ is the bandwidth. As in the ordinary kernel method, $K$ is a symmetric function integrating to unity.
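The three steps above can be sketched numerically. The following is a minimal one-dimensional illustration, assuming a Gaussian kernel and using the fixed-bandwidth KDE as the pilot estimate (mirroring the choice described below); the function names and parameter values are illustrative, not part of any particular implementation.

```python
import numpy as np

def fixed_kde(t, x, h):
    """Fixed-bandwidth Gaussian KDE evaluated at the points t."""
    u = (t[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

def adaptive_kde(t, x, h, alpha=0.5):
    """Adaptive kernel estimate following steps 1-3 above (d = 1)."""
    # Step 1: pilot estimate at the data points (positive everywhere).
    pilot = fixed_kde(x, x, h)
    # Step 2: local bandwidth factors lambda_i = (pilot_i / g)^(-alpha),
    # where g is the geometric mean of the pilot values.
    g = np.exp(np.mean(np.log(pilot)))
    lam = (pilot / g) ** (-alpha)
    # Step 3: average kernels with local bandwidths h * lambda_i.
    u = (t[:, None] - x[None, :]) / (h * lam[None, :])
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return (k / (h * lam[None, :])).mean(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
t = np.linspace(-4, 4, 9)
print(adaptive_kde(t, x, h=0.4))
```

Note that setting `alpha=0.0` makes every $\lambda_i = 1$, so the function reduces exactly to the fixed-bandwidth estimate, as stated below.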

The first step requires some density estimator to obtain a pilot estimate. This need not be particularly accurate and can be as simple as a nearest neighbour estimator; the fixed BKDE estimate is used here for the simple reason that it is already part of the Bayes 4 program written to support this work. The sensitivity parameter $\alpha$ controls the sensitivity of the method to variations in the pilot density: setting $\alpha = 0$ recovers the fixed-bandwidth KDE. Abramson (1982) argues for choosing $\alpha = \frac{1}{2}$ on the grounds of minimising the bias of the estimator, finding that:

Proportionally varying the bandwidths like $f^{-\frac{1}{2}}$ at the contributing readings lowers the bias to a vanishing fraction of the usual value, and makes for performance seen in well-known estimators that force moment conditions on the kernel (and so sacrifice positivity of the curve estimate).
Abramson (1982)

The factor $g^\alpha$ in (10) means that the bandwidth factors $\lambda_i$ are free of the scale of the data. However,

\begin{displaymath}
h\lambda_i = h\left\{\frac{\tilde f(x_i)}{g}\right\}^{-\alpha}
\end{displaymath} (13)

\begin{displaymath}
\phantom{h\lambda_i} = h\, g^\alpha\, \tilde f(x_i)^{-\alpha}
\end{displaymath} (15)

so writing

\begin{displaymath}
\lambda_i = \tilde f(x_i)^{-\alpha}
\end{displaymath} (17)

is equivalent to rescaling $h$ by a factor of $g^\alpha$.
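This identity is easy to verify numerically. The sketch below, using hypothetical pilot-density values rather than output from any particular estimator, checks that $h\lambda_i$ factors into a rescaled bandwidth $h\,g^\alpha$ times $\tilde f(x_i)^{-\alpha}$.

```python
import numpy as np

pilot = np.array([0.05, 0.2, 0.4, 0.1])   # hypothetical pilot densities
h, alpha = 0.3, 0.5
g = np.exp(np.mean(np.log(pilot)))        # geometric mean of the pilot values
lam = (pilot / g) ** (-alpha)             # local bandwidth factors (10)

lhs = h * lam                             # h * lambda_i
rhs = (h * g**alpha) * pilot ** (-alpha)  # rescaled bandwidth times f~^(-alpha)
print(np.allclose(lhs, rhs))
```

A side effect of dividing by $g$ in (10) is that the $\lambda_i$ have geometric mean one, so $h$ keeps its interpretation as the overall degree of smoothing.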
