The Adaptive Kernel Estimator

The kernel estimator suffers from one significant drawback: the need to estimate, or choose, the bandwidth. If the density is such that different values of $h$ are best in different regions, then the best a single bandwidth can do is compromise. For example, when applied to long-tailed distributions the fixed kernel estimator tends to under-smooth the tail. To overcome this, the bandwidth of the estimator may be allowed to vary; for example, the bandwidth may be made inversely proportional to the local density obtained from some previous estimate, Silverman (1986, p. 100).

The steps given in Silverman (1986) for arriving at this are

  1. Find a pilot estimate $\tilde f(t)$ that satisfies $\tilde f(x_i) > 0$ for all $i$
  2. Define local bandwidth factors $\lambda_i$ by
\begin{displaymath}
\lambda_i = \left\{ \frac{\tilde f(x_i)}{g} \right\}^{-\alpha}
\end{displaymath} (10)

    where $g$ is the geometric mean of the $\tilde f(x_i)$:

\begin{displaymath}
\log g = \frac{1}{n}\sum_{i=1}^n \log \tilde f(x_i)
\end{displaymath} (11)

    and $\alpha$ is the sensitivity parameter, a number satisfying $0\leq \alpha \leq 1$.

  3. Define the adaptive kernel estimate $\hat f$ by
\begin{displaymath}
\hat f(t)=\frac{1}{n}\sum_{i=1}^n \frac{1}{(h\lambda_i)^d}
K\left\{\frac{t - x_i}{h \lambda_i} \right\}
\end{displaymath} (12)

    where $K$ is the kernel function and $h$ is the bandwidth. As in the ordinary kernel method, $K$ is a symmetric function integrating to unity.
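The three steps above can be sketched numerically. The following is a minimal one-dimensional illustration, assuming a Gaussian kernel and using the fixed-bandwidth KDE as the pilot estimate (mirroring the choice described below); the function names and parameter values are illustrative, not part of any particular implementation.

```python
import numpy as np

def fixed_kde(t, x, h):
    """Fixed-bandwidth Gaussian KDE evaluated at the points t."""
    u = (t[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

def adaptive_kde(t, x, h, alpha=0.5):
    """Adaptive kernel estimate following steps 1-3 above (d = 1)."""
    # Step 1: pilot estimate at the data points (positive everywhere).
    pilot = fixed_kde(x, x, h)
    # Step 2: local bandwidth factors lambda_i = (pilot_i / g)^(-alpha),
    # where g is the geometric mean of the pilot values.
    g = np.exp(np.mean(np.log(pilot)))
    lam = (pilot / g) ** (-alpha)
    # Step 3: average kernels with local bandwidths h * lambda_i.
    u = (t[:, None] - x[None, :]) / (h * lam[None, :])
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return (k / (h * lam[None, :])).mean(axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
t = np.linspace(-4, 4, 9)
print(adaptive_kde(t, x, h=0.4))
```

Note that setting `alpha=0.0` makes every $\lambda_i = 1$, so the function reduces exactly to the fixed-bandwidth estimate, as stated below.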

The first step requires some density estimator to obtain a pilot estimate. This need not be particularly accurate and can be as simple as a nearest neighbour estimator; the fixed BKDE estimate is used here for the simple reason that it is already part of the Bayes 4 program written to support this work. The sensitivity parameter $\alpha$ controls the sensitivity of the method to variations in the pilot density: setting $\alpha = 0$ recovers the fixed-bandwidth KDE. Abramson (1982) argues for choosing $\alpha = \frac{1}{2}$ on the grounds of minimising the bias of the estimator, finding that:

Proportionally varying the bandwidths like $f^{-\frac{1}{2}}$ at the contributing readings lowers the bias to a vanishing fraction of the usual value, and makes for performance seen in well-known estimators that force moment conditions on the kernel (and so sacrifice positivity of the curve estimate).
Abramson (1982)

The factor $g^\alpha$ in (10) means that the bandwidth factors $\lambda_i$ are free of the scale of the data. However,

\begin{displaymath}
h\lambda_i = h\left\{\frac{\tilde f(x_i)}{g}\right\}^{-\alpha}
\end{displaymath} (13)

\begin{displaymath}
\phantom{h\lambda_i} = h\, g^\alpha\, \tilde f(x_i)^{-\alpha}
\end{displaymath} (15)

so writing

\begin{displaymath}
\lambda_i = \tilde f(x_i)^{-\alpha}
\end{displaymath} (17)

is equivalent to rescaling $h$ by a factor of $g^\alpha$.
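This identity is easy to verify numerically. The sketch below, using hypothetical pilot-density values rather than output from any particular estimator, checks that $h\lambda_i$ factors into a rescaled bandwidth $h\,g^\alpha$ times $\tilde f(x_i)^{-\alpha}$.

```python
import numpy as np

pilot = np.array([0.05, 0.2, 0.4, 0.1])   # hypothetical pilot densities
h, alpha = 0.3, 0.5
g = np.exp(np.mean(np.log(pilot)))        # geometric mean of the pilot values
lam = (pilot / g) ** (-alpha)             # local bandwidth factors (10)

lhs = h * lam                             # h * lambda_i
rhs = (h * g**alpha) * pilot ** (-alpha)  # rescaled bandwidth times f~^(-alpha)
print(np.allclose(lhs, rhs))
```

A side effect of dividing by $g$ in (10) is that the $\lambda_i$ have geometric mean one, so $h$ keeps its interpretation as the overall degree of smoothing.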
