Kernel density estimation

Last updated
Kernel density estimation of 100 normally distributed random numbers using different smoothing bandwidths. Kernel density.svg
Kernel density estimation of 100 normally distributed random numbers using different smoothing bandwidths.

In statistics, kernel density estimation (KDE) is the application of kernel smoothing for probability density estimation, i.e., a non-parametric method to estimate the probability density function of a random variable based on kernels as weights. KDE answers a fundamental data smoothing problem where inferences about the population are made based on a finite data sample. In some fields such as signal processing and econometrics it is also termed the Parzen–Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form. [1] [2] One of the famous applications of kernel density estimation is in estimating the class-conditional marginal densities of data when using a naive Bayes classifier, [3] [4] which can improve its prediction accuracy. [3]

Contents

Definition

Let (x1, x2, ..., xn) be independent and identically distributed samples drawn from some univariate distribution with an unknown density ƒ at any given point x. We are interested in estimating the shape of this function ƒ. Its kernel density estimator is

where K is the kernel — a non-negative function — and h > 0 is a smoothing parameter called the bandwidth. A kernel with subscript h is called the scaled kernel and defined as Kh(x) = K(). Intuitively one wants to choose h as small as the data will allow; however, there is always a trade-off between the bias of the estimator and its variance. The choice of bandwidth is discussed in more detail below.

A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, Epanechnikov, normal, and others. The Epanechnikov kernel is optimal in a mean square error sense, [5] though the loss of efficiency is small for the kernels listed previously. [6] Due to its convenient mathematical properties, the normal kernel is often used, which means K(x) = ϕ(x), where ϕ is the standard normal density function.

The construction of a kernel density estimate finds interpretations in fields outside of density estimation. [7] For example, in thermodynamics, this is equivalent to the amount of heat generated when heat kernels (the fundamental solution to the heat equation) are placed at each data point locations xi. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning (e.g. diffusion map).

Example

Kernel density estimates are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. The diagram below based on these 6 data points illustrates this relationship:

Sample123456
Value−2.1−1.3−0.41.95.16.2

For the histogram, first, the horizontal axis is divided into sub-intervals or bins which cover the range of the data: In this case, six bins each of width 2. Whenever a data point falls inside this interval, a box of height 1/12 is placed there. If more than one data point falls inside the same bin, the boxes are stacked on top of each other.

For the kernel density estimate, normal kernels with a standard deviation of 1.5 (indicated by the red dashed lines) are placed on each of the data points xi. The kernels are summed to make the kernel density estimate (solid blue curve). The smoothness of the kernel density estimate (compared to the discreteness of the histogram) illustrates how kernel density estimates converge faster to the true underlying density for continuous random variables. [8]

Comparison of the histogram (left) and kernel density estimate (right) constructed using the same data. The six individual kernels are the red dashed curves, the kernel density estimate the blue curves. The data points are the rug plot on the horizontal axis. Comparison of 1D histogram and KDE.png
Comparison of the histogram (left) and kernel density estimate (right) constructed using the same data. The six individual kernels are the red dashed curves, the kernel density estimate the blue curves. The data points are the rug plot on the horizontal axis.

Bandwidth selection

Kernel density estimate (KDE) with different bandwidths of a random sample of 100 points from a standard normal distribution. Grey: true density (standard normal). Red: KDE with h=0.05. Black: KDE with h=0.337. Green: KDE with h=2. Comparison of 1D bandwidth selectors.png
Kernel density estimate (KDE) with different bandwidths of a random sample of 100 points from a standard normal distribution. Grey: true density (standard normal). Red: KDE with h=0.05. Black: KDE with h=0.337. Green: KDE with h=2.

The bandwidth of the kernel is a free parameter which exhibits a strong influence on the resulting estimate. To illustrate its effect, we take a simulated random sample from the standard normal distribution (plotted at the blue spikes in the rug plot on the horizontal axis). The grey curve is the true density (a normal density with mean 0 and variance 1). In comparison, the red curve is undersmoothed since it contains too many spurious data artifacts arising from using a bandwidth h = 0.05, which is too small. The green curve is oversmoothed since using the bandwidth h = 2 obscures much of the underlying structure. The black curve with a bandwidth of h = 0.337 is considered to be optimally smoothed since its density estimate is close to the true density. An extreme situation is encountered in the limit (no smoothing), where the estimate is a sum of n delta functions centered at the coordinates of analyzed samples. In the other extreme limit the estimate retains the shape of the used kernel, centered on the mean of the samples (completely smooth).

The most common optimality criterion used to select this parameter is the expected L2 risk function, also termed the mean integrated squared error:

Under weak assumptions on ƒ and K, (ƒ is the, generally unknown, real density function), [1] [2]

where o is the little o notation, and n the sample size (as above). The AMISE is the asymptotic MISE, i.e. the two leading terms,

where for a function g, and is the second derivative of and is the kernel. The minimum of this AMISE is the solution to this differential equation

or

Neither the AMISE nor the hAMISE formulas can be used directly since they involve the unknown density function or its second derivative . To overcome that difficulty, a variety of automatic, data-based methods have been developed to select the bandwidth. Several review studies have been undertaken to compare their efficacies, [9] [10] [11] [12] [13] [14] [15] with the general consensus that the plug-in selectors [7] [16] [17] and cross validation selectors [18] [19] [20] are the most useful over a wide range of data sets.

Substituting any bandwidth h which has the same asymptotic order n−1/5 as hAMISE into the AMISE gives that AMISE(h) = O(n−4/5), where O is the big O notation. It can be shown that, under weak assumptions, there cannot exist a non-parametric estimator that converges at a faster rate than the kernel estimator. [21] Note that the n−4/5 rate is slower than the typical n−1 convergence rate of parametric methods.

If the bandwidth is not held fixed, but is varied depending upon the location of either the estimate (balloon estimator) or the samples (pointwise estimator), this produces a particularly powerful method termed adaptive or variable bandwidth kernel density estimation.

Bandwidth selection for kernel density estimation of heavy-tailed distributions is relatively difficult. [22]

A rule-of-thumb bandwidth estimator

If Gaussian basis functions are used to approximate univariate data, and the underlying density being estimated is Gaussian, the optimal choice for h (that is, the bandwidth that minimises the mean integrated squared error) is: [23]

An value is considered more robust when it improves the fit for long-tailed and skewed distributions or for bimodal mixture distributions. This is often done empirically by replacing the standard deviation by the parameter below:

where IQR is the interquartile range.
Comparison between rule of thumb and solve-the-equation bandwidth. Kernel density estimation, comparison between rule of thumb and solve-the-equation bandwidth.png
Comparison between rule of thumb and solve-the-equation bandwidth.

Another modification that will improve the model is to reduce the factor from 1.06 to 0.9. Then the final formula would be:

where is the sample size.

This approximation is termed the normal distribution approximation, Gaussian approximation, or Silverman's rule of thumb. [23] While this rule of thumb is easy to compute, it should be used with caution as it can yield widely inaccurate estimates when the density is not close to being normal. For example, when estimating the bimodal Gaussian mixture model

from a sample of 200 points, the figure on the right shows the true density and two kernel density estimates — one using the rule-of-thumb bandwidth, and the other using a solve-the-equation bandwidth. [7] [17] The estimate based on the rule-of-thumb bandwidth is significantly oversmoothed.

Relation to the characteristic function density estimator

Given the sample (x1, x2, ..., xn), it is natural to estimate the characteristic function φ(t) = E[eitX] as

Knowing the characteristic function, it is possible to find the corresponding probability density function through the Fourier transform formula. One difficulty with applying this inversion formula is that it leads to a diverging integral, since the estimate is unreliable for large t’s. To circumvent this problem, the estimator is multiplied by a damping function ψh(t) = ψ(ht), which is equal to 1 at the origin and then falls to 0 at infinity. The “bandwidth parameter” h controls how fast we try to dampen the function . In particular when h is small, then ψh(t) will be approximately one for a large range of t’s, which means that remains practically unaltered in the most important region of t’s.

The most common choice for function ψ is either the uniform function ψ(t) = 1{−1 ≤ t ≤ 1}, which effectively means truncating the interval of integration in the inversion formula to [−1/h, 1/h], or the Gaussian function ψ(t) = eπt2. Once the function ψ has been chosen, the inversion formula may be applied, and the density estimator will be

where K is the Fourier transform of the damping function ψ. Thus the kernel density estimator coincides with the characteristic function density estimator.

Geometric and topological features

We can extend the definition of the (global) mode to a local sense and define the local modes:

Namely, is the collection of points for which the density function is locally maximized. A natural estimator of is a plug-in from KDE, [24] [25] where and are KDE version of and . Under mild assumptions, is a consistent estimator of . Note that one can use the mean shift algorithm [26] [27] [28] to compute the estimator numerically.

Statistical implementation

A non-exhaustive list of software implementations of kernel density estimators includes:

See also

Further reading

Related Research Articles

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.

<span class="mw-page-title-main">Weibull distribution</span> Continuous probability distribution

In probability theory and statistics, the Weibull distribution is a continuous probability distribution. It models a broad range of random variables, largely in the nature of a time to failure or time between events. Examples are maximum one-day rainfalls and the time a user spends on a web page.

Importance sampling is a Monte Carlo method for evaluating properties of a particular distribution, while only having samples generated from a different distribution than the distribution of interest. Its introduction in statistics is generally attributed to a paper by Teun Kloek and Herman K. van Dijk in 1978, but its precursors can be found in statistical physics as early as 1949. Importance sampling is also related to umbrella sampling in computational physics. Depending on the application, the term may refer to the process of sampling from this alternative distribution, the process of inference, or both.

In statistics, the method of moments is a method of estimation of population parameters. The same principle is used to derive higher moments like skewness and kurtosis.

Robust statistics are statistics which maintain their properties even if the underlying distributional assumptions are incorrect. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly.

<span class="mw-page-title-main">Kaplan–Meier estimator</span> Non-parametric statistic used to estimate the survival function

The Kaplan–Meier estimator, also known as the product limit estimator, is a non-parametric statistic used to estimate the survival function from lifetime data. In medical research, it is often used to measure the fraction of patients living for a certain amount of time after treatment. In other fields, Kaplan–Meier estimators may be used to measure the length of time people remain unemployed after a job loss, the time-to-failure of machine parts, or how long fleshy fruits remain on plants before they are removed by frugivores. The estimator is named after Edward L. Kaplan and Paul Meier, who each submitted similar manuscripts to the Journal of the American Statistical Association. The journal editor, John Tukey, convinced them to combine their work into one paper, which has been cited more than 34,000 times since its publication in 1958.

In statistics, M-estimators are a broad class of extremum estimators for which the objective function is a sample average. Both non-linear least squares and maximum likelihood estimation are special cases of M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators. However, M-estimators are not inherently robust, as is clear from the fact that they include maximum likelihood estimators, which are in general not robust. The statistical procedure of evaluating an M-estimator on a data set is called M-estimation.

In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model. It is used when there is a non-zero amount of correlation between the residuals in the regression model. GLS is employed to improve statistical efficiency and reduce the risk of drawing erroneous inferences, as compared to conventional least squares and weighted least squares methods. It was first described by Alexander Aitken in 1935.

Bootstrapping is any test or metric that uses random sampling with replacement, and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.

In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function. Equivalently, it maximizes the posterior expectation of a utility function. An alternative way of formulating an estimator within Bayesian statistics is maximum a posteriori estimation.

In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded: that is, they have heavier tails than the exponential distribution. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.

<span class="mw-page-title-main">Generalized Pareto distribution</span> Family of probability distributions often used to model tails or extreme values

In statistics, the generalized Pareto distribution (GPD) is a family of continuous probability distributions. It is often used to model the tails of another distribution. It is specified by three parameters: location , scale , and shape . Sometimes it is specified by only scale and shape and sometimes only by its shape parameter. Some references give the shape parameter as .

In statistics, kernel regression is a non-parametric technique to estimate the conditional expectation of a random variable. The objective is to find a non-linear relation between a pair of random variables X and Y.

Mean shift is a non-parametric feature-space mathematical analysis technique for locating the maxima of a density function, a so-called mode-seeking algorithm. Application domains include cluster analysis in computer vision and image processing.

In statistics, principal component regression (PCR) is a regression analysis technique that is based on principal component analysis (PCA). More specifically, PCR is used for estimating the unknown regression coefficients in a standard linear regression model.

<span class="mw-page-title-main">Maximum spacing estimation</span> Method of estimating a statistical models parameters

In statistics, maximum spacing estimation (MSE or MSP), or maximum product of spacing estimation (MPS), is a method for estimating the parameters of a univariate statistical model. The method requires maximization of the geometric mean of spacings in the data, which are the differences between the values of the cumulative distribution function at neighbouring data points.

The generalized normal distribution or generalized Gaussian distribution (GGD) is either of two families of parametric continuous probability distributions on the real line. Both families add a shape parameter to the normal distribution. To distinguish the two families, they are referred to below as "symmetric" and "asymmetric"; however, this is not a standard nomenclature.

In statistics, adaptive or "variable-bandwidth" kernel density estimation is a form of kernel density estimation in which the size of the kernels used in the estimate are varied depending upon either the location of the samples or the location of the test point. It is a particularly effective technique when the sample space is multi-dimensional.

Kernel density estimation is a nonparametric technique for density estimation i.e., estimation of probability density functions, which is one of the fundamental questions in statistics. It can be viewed as a generalisation of histogram density estimation with improved statistical properties. Apart from histograms, other types of density estimators include parametric, spline, wavelet and Fourier series. Kernel density estimators were first introduced in the scientific literature for univariate data in the 1950s and 1960s and subsequently have been widely adopted. It was soon recognised that analogous estimators for multivariate data would be an important addition to multivariate statistics. Based on research carried out in the 1990s and 2000s, multivariate kernel density estimation has reached a level of maturity comparable to its univariate counterparts.

In machine learning, the kernel embedding of distributions comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space on which a sensible kernel function may be defined. For example, various kernels have been proposed for learning from data which are: vectors in , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song , Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in.

References

  1. 1 2 Rosenblatt, M. (1956). "Remarks on Some Nonparametric Estimates of a Density Function". The Annals of Mathematical Statistics. 27 (3): 832–837. doi: 10.1214/aoms/1177728190 .
  2. 1 2 Parzen, E. (1962). "On Estimation of a Probability Density Function and Mode". The Annals of Mathematical Statistics . 33 (3): 1065–1076. doi: 10.1214/aoms/1177704472 . JSTOR   2237880.
  3. 1 2 Piryonesi S. Madeh; El-Diraby Tamer E. (2020-06-01). "Role of Data Analytics in Infrastructure Asset Management: Overcoming Data Size and Quality Problems". Journal of Transportation Engineering, Part B: Pavements. 146 (2): 04020022. doi:10.1061/JPEODX.0000175. S2CID   216485629.
  4. Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome H. (2001). The Elements of Statistical Learning : Data Mining, Inference, and Prediction : with 200 full-color illustrations. New York: Springer. ISBN   0-387-95284-5. OCLC   46809224.
  5. Epanechnikov, V.A. (1969). "Non-parametric estimation of a multivariate probability density". Theory of Probability and Its Applications. 14: 153–158. doi:10.1137/1114019.
  6. Wand, M.P; Jones, M.C. (1995). Kernel Smoothing. London: Chapman & Hall/CRC. ISBN   978-0-412-55270-0.
  7. 1 2 3 4 Botev, Zdravko (2007). Nonparametric Density Estimation via Diffusion Mixing (Technical report). University of Queensland.
  8. Scott, D. (1979). "On optimal and data-based histograms". Biometrika. 66 (3): 605–610. doi:10.1093/biomet/66.3.605.
  9. Park, B.U.; Marron, J.S. (1990). "Comparison of data-driven bandwidth selectors". Journal of the American Statistical Association. 85 (409): 66–72. CiteSeerX   10.1.1.154.7321 . doi:10.1080/01621459.1990.10475307. JSTOR   2289526.
  10. Park, B.U.; Turlach, B.A. (1992). "Practical performance of several data driven bandwidth selectors (with discussion)". Computational Statistics. 7: 251–270.
  11. Cao, R.; Cuevas, A.; Manteiga, W. G. (1994). "A comparative study of several smoothing methods in density estimation". Computational Statistics and Data Analysis. 17 (2): 153–176. doi:10.1016/0167-9473(92)00066-Z.
  12. Jones, M.C.; Marron, J.S.; Sheather, S. J. (1996). "A brief survey of bandwidth selection for density estimation". Journal of the American Statistical Association. 91 (433): 401–407. doi:10.2307/2291420. JSTOR   2291420.
  13. Sheather, S.J. (1992). "The performance of six popular bandwidth selection methods on some real data sets (with discussion)". Computational Statistics. 7: 225–250, 271–281.
  14. Agarwal, N.; Aluru, N.R. (2010). "A data-driven stochastic collocation approach for uncertainty quantification in MEMS" (PDF). International Journal for Numerical Methods in Engineering. 83 (5): 575–597. Bibcode:2010IJNME..83..575A. doi:10.1002/nme.2844. S2CID   84834908.
  15. Xu, X.; Yan, Z.; Xu, S. (2015). "Estimating wind speed probability distribution by diffusion-based kernel density method". Electric Power Systems Research. 121: 28–37. Bibcode:2015EPSR..121...28X. doi:10.1016/j.epsr.2014.11.029.
  16. Botev, Z.I.; Grotowski, J.F.; Kroese, D.P. (2010). "Kernel density estimation via diffusion". Annals of Statistics . 38 (5): 2916–2957. arXiv: 1011.2602 . doi:10.1214/10-AOS799. S2CID   41350591.
  17. 1 2 Sheather, S.J.; Jones, M.C. (1991). "A reliable data-based bandwidth selection method for kernel density estimation". Journal of the Royal Statistical Society, Series B. 53 (3): 683–690. doi:10.1111/j.2517-6161.1991.tb01857.x. JSTOR   2345597.
  18. Rudemo, M. (1982). "Empirical choice of histograms and kernel density estimators". Scandinavian Journal of Statistics. 9 (2): 65–78. JSTOR   4615859.
  19. Bowman, A.W. (1984). "An alternative method of cross-validation for the smoothing of density estimates". Biometrika. 71 (2): 353–360. doi:10.1093/biomet/71.2.353.
  20. Hall, P.; Marron, J.S.; Park, B.U. (1992). "Smoothed cross-validation". Probability Theory and Related Fields. 92: 1–20. doi: 10.1007/BF01205233 . S2CID   121181481.
  21. Wahba, G. (1975). "Optimal convergence properties of variable knot, kernel, and orthogonal series methods for density estimation". Annals of Statistics . 3 (1): 15–29. doi: 10.1214/aos/1176342997 .
  22. Buch-Larsen, TINE (2005). "Kernel density estimation for heavy-tailed distributions using the Champernowne transformation". Statistics. 39 (6): 503–518. CiteSeerX   10.1.1.457.1544 . doi:10.1080/02331880500439782. S2CID   219697435.
  23. 1 2 Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis . London: Chapman & Hall/CRC. p.  45. ISBN   978-0-412-24620-3.
  24. Chen, Yen-Chi; Genovese, Christopher R.; Wasserman, Larry (2016). "A comprehensive approach to mode clustering". Electronic Journal of Statistics. 10 (1): 210–241. arXiv: 1406.1780 . doi: 10.1214/15-ejs1102 . ISSN   1935-7524.
  25. Chazal, Frédéric; Fasy, Brittany Terese; Lecci, Fabrizio; Rinaldo, Alessandro; Wasserman, Larry (2014). "Stochastic Convergence of Persistence Landscapes and Silhouettes". Proceedings of the thirtieth annual symposium on Computational geometry. Vol. 6. New York, New York, USA: ACM Press. pp. 474–483. doi:10.1145/2582112.2582128. ISBN   978-1-4503-2594-3. S2CID   6029340.
  26. Fukunaga, K.; Hostetler, L. (January 1975). "The estimation of the gradient of a density function, with applications in pattern recognition". IEEE Transactions on Information Theory. 21 (1): 32–40. doi:10.1109/tit.1975.1055330. ISSN   0018-9448.
  27. Yizong Cheng (1995). "Mean shift, mode seeking, and clustering". IEEE Transactions on Pattern Analysis and Machine Intelligence. 17 (8): 790–799. CiteSeerX   10.1.1.510.1222 . doi:10.1109/34.400568. ISSN   0162-8828.
  28. Comaniciu, D.; Meer, P. (May 2002). "Mean shift: a robust approach toward feature space analysis". IEEE Transactions on Pattern Analysis and Machine Intelligence. 24 (5): 603–619. doi:10.1109/34.1000236. ISSN   0162-8828. S2CID   691081.
  29. Janert, Philipp K (2009). Gnuplot in action : understanding data with graphs. Connecticut, USA: Manning Publications. ISBN   978-1-933988-39-9. See section 13.2.2 entitled Kernel density estimates.
  30. "Kernel smoothing function estimate for univariate and bivariate data - MATLAB ksdensity". www.mathworks.com. Retrieved 2020-11-05.
  31. Horová, I.; Koláček, J.; Zelinka, J. (2012). Kernel Smoothing in MATLAB: Theory and Practice of Kernel Smoothing. Singapore: World Scientific Publishing. ISBN   978-981-4405-48-5.
  32. "SmoothKernelDistribution—Wolfram Language Documentation". reference.wolfram.com. Retrieved 2020-11-05.
  33. "KernelMixtureDistribution—Wolfram Language Documentation". reference.wolfram.com. Retrieved 2020-11-05.
  34. "Software for calculating kernel densities". www.rsc.org. Retrieved 2020-11-05.
  35. The Numerical Algorithms Group. "NAG Library Routine Document: nagf_smooth_kerndens_gauss (g10baf)" (PDF). NAG Library Manual, Mark 23. Retrieved 2012-02-16.
  36. The Numerical Algorithms Group. "NAG Library Routine Document: nag_kernel_density_estim (g10bac)" (PDF). NAG Library Manual, Mark 9. Archived from the original (PDF) on 2011-11-24. Retrieved 2012-02-16.
  37. Vanderplas, Jake (2013-12-01). "Kernel Density Estimation in Python" . Retrieved 2014-03-12.
  38. "seaborn.kdeplot — seaborn 0.10.1 documentation". seaborn.pydata.org. Retrieved 2020-05-12.
  39. "Kde-gpu: We implemented nadaraya waston kernel density and kernel conditional probability estimator using cuda through cupy. It is much faster than cpu version but it requires GPU with high memory".
  40. "Basic Statistics - RDD-based API - Spark 3.0.1 Documentation". spark.apache.org. Retrieved 2020-11-05.
  41. "kdensity — Univariate kernel density estimation" (PDF). Stata 15 manual.
  42. Jann, Ben (2008-05-26), "KDENS: Stata module for univariate kernel density estimation", Statistical Software Components, Boston College Department of Economics, retrieved 2022-10-15