If the weight function in (6) is replaced by a kernel function $K$ which satisfies

\begin{displaymath}
\int_{-\infty}^\infty K(x)\,dx = 1,
\end{displaymath} (7)

then the kernel density estimator is defined as (see, for example, Silverman, 1986, p. 15)

\begin{displaymath}
\hat f(x) = \frac{1}{nh} \sum_{i=1}^n K\left(\frac{x-x_i}{h}\right),
\end{displaymath} (8)

where $h$ is the window width, or bandwidth, of the estimator. Note that the smoothness of the estimate depends on the bandwidth: if $h$ is small, the estimate will consist of spikes centred on the $x_i$; if $h$ is large, the estimated density tends to the uniform and all detail is obscured.
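Estimator (8) and the effect of the bandwidth can be sketched as follows; this is an illustrative implementation only, and the Gaussian kernel, the function names, and the sample data are assumptions, not taken from the text.

```python
import math

def gaussian_kernel(u):
    # Standard normal density: one common choice of K satisfying (7).
    # (An assumed choice; the text does not fix a particular kernel here.)
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h, K=gaussian_kernel):
    # The kernel density estimate of equation (8).
    n = len(data)
    return sum(K((x - xi) / h) for xi in data) / (n * h)

data = [-0.5, 0.1, 0.3, 1.2]  # hypothetical sample

# Small h: the estimate is spiky, dominated by the nearest data point.
spiky = kde(0.1, data, h=0.05)
# Large h: the estimate is flat and local detail is smoothed away.
smooth = kde(0.1, data, h=5.0)
```

Evaluating both at the same point shows the trade-off directly: the small-bandwidth value is much larger near a data point than the heavily smoothed one.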

The estimate obtained is continuous if $K$ is continuous, and so may avoid the problems associated with the naive estimator and the histogram: a real disadvantage, in this context, of both is their lack of continuity.

The estimated density is a sum of $n$ scaled copies of the kernel, where $n$ is the number of items of data. This means that it has the same smoothness properties as the kernel function: if $K$ is a probability density function then so too is $\hat f$.
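The claim that $\hat f$ is itself a density when $K$ is one can be checked numerically; the sketch below again assumes a Gaussian kernel and hypothetical data, and verifies that $\hat f$ integrates to (approximately) one by the trapezoidal rule.

```python
import math

def K(u):
    # Gaussian kernel: a probability density (an assumed choice of kernel).
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def f_hat(x, data, h):
    # Equation (8).
    return sum(K((x - xi) / h) for xi in data) / (len(data) * h)

data = [-0.5, 0.1, 0.3, 1.2]  # hypothetical sample
h = 0.5

# Trapezoidal rule over a grid wide enough to capture essentially all mass.
step = 0.01
grid = [-10.0 + i * step for i in range(2001)]  # [-10, 10]
vals = [f_hat(x, data, h) for x in grid]
total = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
# total is approximately 1, as expected when K is itself a density.
```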

It should also be noted that the naive estimator, as defined above, is a kernel density estimator with the discontinuous kernel

\begin{displaymath}
K_n(x) = \left\{
\begin{array}{ll}
\frac{1}{2} & \mbox{if } -1 < x < 1 \\
0 & \mbox{otherwise.}
\end{array}
\right.
\end{displaymath} (9)

However, this kernel gives an estimate with discontinuities similar to, though not as visually obvious as, those of the histogram.
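The jumps produced by kernel (9) can be seen directly: with this kernel, the estimate from (8) is discontinuous at each $x_i \pm h$. The function names and the single-point data set below are illustrative assumptions.

```python
def K_n(u):
    # The kernel of equation (9): 1/2 on (-1, 1), zero elsewhere.
    return 0.5 if -1.0 < u < 1.0 else 0.0

def naive_estimate(x, data, h):
    # Equation (8) with the discontinuous kernel K_n: the naive estimator.
    return sum(K_n((x - xi) / h) for xi in data) / (len(data) * h)

data = [0.0]  # a single hypothetical observation
h = 1.0

# The estimate jumps at x_i +/- h, much as a histogram jumps at bin edges.
inside = naive_estimate(0.999, data, h)   # -> 0.5
outside = naive_estimate(1.001, data, h)  # -> 0.0
```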

It is usual to require $K(x) \geq 0$ for all $x$; however, there are arguments for sometimes using kernels which take negative values (see Silverman, 1986, Section 3.6). This can lead to problems and, as the advantages are not large, the kernel functions used in the present work are everywhere non-negative.
