Kernel density based global two-sample comparison test
Kernel density based global two-sample comparison test for 1- to 6-dimensional data.
Usage
kde.test(x1, x2, H1, H2, h1, h2, psi1, psi2, var.fhat1, var.fhat2,
binned=FALSE, bgridsize, verbose=FALSE)
Arguments
- x1,x2
vector/matrix of data values
- H1,H2,h1,h2
bandwidth matrices/scalar bandwidths. If these are missing,
Hpi.kfe or hpi.kfe, respectively, is called by default.
- psi1,psi2
zero-th order kernel functional estimates
- var.fhat1,var.fhat2
sample variance of KDE estimates evaluated at x1, x2
- binned
flag for binned estimation. Default is FALSE.
- bgridsize
vector of binning grid sizes
- verbose
flag to print out progress information. Default is FALSE.
Value
A kernel two-sample global significance test is a list with fields:
- Tstat
T statistic
- zstat
z statistic - normalised version of Tstat
- pvalue
\(p\)-value of the two-sided test
- mean,var
mean and variance of null distribution
- var.fhat1,var.fhat2
sample variances of KDE values evaluated at data points
- n1,n2
sample sizes
- H1,H2
bandwidth matrices
- psi1,psi12,psi21,psi2
kernel functional estimates
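A minimal sketch of how the returned list might be inspected, reusing the simulated data from the Examples section. The field names are those listed above; the relation between zstat, Tstat, mean and var in the last lines is an assumption drawn from the description "normalised version of Tstat" and is not stated explicitly on this page, so the code only checks it rather than relying on it.

```r
library(ks)  # provides kde.test and rnorm.mixt

set.seed(8192)
x <- rnorm.mixt(n=1000, mus=0, sigmas=1, props=1)
y <- rnorm.mixt(n=1000, mus=0, sigmas=1, props=1)

res <- kde.test(x1=x, x2=y)

## Fields documented in the Value section
res$Tstat           # T statistic
res$zstat           # normalised T statistic
res$pvalue          # p-value
c(res$n1, res$n2)   # sample sizes

## Assumption (not stated on this page): zstat is Tstat centred and
## scaled by the mean and variance of the null distribution
z.manual <- (res$Tstat - res$mean) / sqrt(res$var)
all.equal(z.manual, res$zstat)
```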
Details
The null hypothesis is \(H_0: f_1 \equiv f_2\) where \(f_1, f_2\) are the respective density functions. The measure of discrepancy is the integrated squared error (ISE) \(T = \int [f_1(\bold{x}) - f_2(\bold{x})]^2 \, d \bold{x}\). If we rewrite this as \(T = \psi_{0,1} - \psi_{0,12} - \psi_{0,21} + \psi_{0,2}\) where \(\psi_{0,uv} = \int f_u (\bold{x}) f_v (\bold{x}) \, d \bold{x}\), then we can use kernel functional estimators. This test statistic has a null distribution which is asymptotically normal, so no bootstrap resampling is required to compute an approximate \(p\)-value.
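The rewriting step follows from expanding the square in the ISE. A short derivation, using only the symbols defined above and writing \(\psi_{0,1}\), \(\psi_{0,2}\) for \(\psi_{0,11}\), \(\psi_{0,22}\):

```latex
\begin{aligned}
T &= \int [f_1(\mathbf{x}) - f_2(\mathbf{x})]^2 \, d\mathbf{x} \\
  &= \int f_1(\mathbf{x})^2 \, d\mathbf{x}
     - \int f_1(\mathbf{x}) f_2(\mathbf{x}) \, d\mathbf{x}
     - \int f_2(\mathbf{x}) f_1(\mathbf{x}) \, d\mathbf{x}
     + \int f_2(\mathbf{x})^2 \, d\mathbf{x} \\
  &= \psi_{0,1} - \psi_{0,12} - \psi_{0,21} + \psi_{0,2}
\end{aligned}
```

For the true densities the two cross terms are equal. They are presumably kept separate here because their kernel estimates, returned as psi12 and psi21, need not coincide.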
If H1,H2 are missing, then kde.test automatically calls the plug-in
selector Hpi.kfe to choose the bandwidth matrices used to estimate the
functionals with kfe(deriv.order=0). Likewise, hpi.kfe is called for
missing h1,h2.
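A hedged sketch of supplying the bandwidth matrices explicitly instead of relying on the default, using the crabs data from the Examples section. It assumes that calling Hpi.kfe with deriv.order=0 and otherwise default settings is a reasonable stand-in for the internal call; the settings kde.test uses internally may differ, so the result is not guaranteed to match the default call exactly.

```r
library(ks)  # provides kde.test and Hpi.kfe

data(crabs, package="MASS")
x1 <- as.matrix(crabs[crabs$sp=="B", c(4,6)])
x2 <- as.matrix(crabs[crabs$sp=="O", c(4,6)])

## Plug-in bandwidth matrices for zero-th order kernel functional estimation
H1 <- Hpi.kfe(x=x1, deriv.order=0)
H2 <- Hpi.kfe(x=x2, deriv.order=0)

## Pass the precomputed bandwidths so that kde.test does not select them itself
res <- kde.test(x1=x1, x2=x2, H1=H1, H2=H2)
res$pvalue
```

Precomputing H1 and H2 in this way can be useful when the same samples are tested repeatedly, since the bandwidth selection does not have to be rerun each time.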
For ks \(\geq\) 1.8.8, kde.test(binned=TRUE) uses binned
estimation only for computing the bandwidth selectors, not for the
test statistic or the \(p\)-value.
References
Duong, T., Goud, B. & Schauer, K. (2012) Closed-form density-based framework for automatic detection of cellular morphology changes. PNAS 109, 8382–8387.
Examples
library(ks)
set.seed(8192)
samp <- 1000
x <- rnorm.mixt(n=samp, mus=0, sigmas=1, props=1)
y <- rnorm.mixt(n=samp, mus=0, sigmas=1, props=1)
kde.test(x1=x, x2=y)$pvalue ## fail to reject H0: f1=f2
#> [1] 0.9220685
data(crabs, package="MASS")
x1 <- crabs[crabs$sp=="B", c(4,6)]
x2 <- crabs[crabs$sp=="O", c(4,6)]
kde.test(x1=x1, x2=x2)$pvalue ## reject H0: f1=f2
#> [1] 4.235639e-160