Kernel discriminant analysis (kernel classification)
Kernel discriminant analysis (kernel classification) for 1- to d-dimensional data.
Usage
kda(x, x.group, Hs, hs, prior.prob=NULL, gridsize, xmin, xmax, supp=3.7,
eval.points, binned, bgridsize, w, compute.cont=TRUE, approx.cont=TRUE,
kde.flag=TRUE)
Hkda(x, x.group, Hstart, bw="plugin", ...)
Hkda.diag(x, x.group, bw="plugin", ...)
hkda(x, x.group, bw="plugin", ...)
# S3 method for class 'kda'
predict(object, ..., x)
compare(x.group, est.group, by.group=FALSE)
compare.kda.cv(x, x.group, bw="plugin", prior.prob=NULL, Hstart, by.group=FALSE,
verbose=FALSE, recompute=FALSE, ...)
compare.kda.diag.cv(x, x.group, bw="plugin", prior.prob=NULL, by.group=FALSE,
verbose=FALSE, recompute=FALSE, ...)
Arguments
- x
matrix of training data values
- x.group
vector of group labels for training data
- Hs,hs
(stacked) matrix of bandwidth matrices/vector of scalar bandwidths. If these are missing,
Hkda or hkda is called by default.
- prior.prob
vector of prior probabilities
- gridsize
vector of grid sizes
- xmin,xmax
vector of minimum/maximum values for grid
- supp
effective support for standard normal
- eval.points
vector or matrix of points at which estimate is evaluated
- binned
flag for binned estimation
- bgridsize
vector of binning grid sizes
- w
vector of weights. Not yet implemented.
- compute.cont
flag for computing 1% to 99% probability contour levels. Default is TRUE.
- approx.cont
flag for computing approximate probability contour levels. Default is TRUE.
- kde.flag
flag for computing KDE on grid. Default is TRUE.
- object
object of class kda
- bw
bandwidth: "plugin" = plug-in, "lscv" = LSCV, "scv" = SCV
- Hstart
(stacked) matrix of initial bandwidth matrices, used in numerical optimisation
- est.group
vector of estimated group labels
- by.group
flag to give results also within each group
- verbose
flag for printing progress information. Default is FALSE.
- recompute
flag for recomputing the bandwidth matrix after excluding the i-th data item
- ...
other optional parameters for bandwidth selection, see
Hpi, Hlscv, Hscv
Value
For kde.flag=TRUE, a kernel discriminant analysis is an object of class kda, which is a list with fields
- x
list of data points, one for each group label
- estimate
list of density estimates at eval.points, one for each group label
- eval.points
vector or list of points that the estimate is evaluated at, one for each group label
- h
vector of bandwidths (1-d only)
- H
stacked matrix of bandwidth matrices or vector of bandwidths
- gridded
flag for estimation on a grid
- binned
flag for binned estimation
- w
vector of weights
- prior.prob
vector of prior probabilities
- x.group
vector of group labels - same as input
- x.group.estimate
vector of estimated group labels. If the test data eval.points are given
then these are classified. Otherwise the training data x are classified.
For kde.flag=FALSE, which is always the case for d > 3,
only the vector of estimated group labels is returned.
The result from Hkda and Hkda.diag is a stacked matrix
of bandwidth matrices, one for each training data group. The result
from hkda is a vector of bandwidths, one for each training group.
The compare functions create a comparison between the true
group labels x.group and the estimated ones.
They return a list with fields
- cross
cross-classification table with the rows indicating the true group and the columns the estimated group
- error
misclassification rate (MR)
When the test data are independent of the
training data, compare computes MR = (number of points wrongly
classified)/(total number of points). When the test data
are not independent, e.g. when we are classifying the training data set
itself, the cross-validated estimate of MR is more appropriate. This
is implemented as compare.kda.cv (unconstrained bandwidth
selectors) and compare.kda.diag.cv (diagonal bandwidth
selectors). These functions are only available for d > 1.
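The idea behind a cross-validated MR can be sketched in base R. This is an illustration only, not the code used by compare.kda.cv: it works in one dimension for brevity (the ks functions above cover d > 1), it assumes leave-one-out cross validation, and the helper names classify_one, bandwidths and loo_mr_sketch are hypothetical. bw.SJ(method = "dpi") from the stats package is used as a stand-in plug-in bandwidth.

```r
set.seed(2)
x    <- c(rnorm(40, mean = 1), rnorm(40, mean = -1))
x.gr <- rep(c(1, 2), each = 40)
labs <- sort(unique(x.gr))

# Classify one point p from training data (tx, tg) with per-group bandwidths hs.
# Priors are the sample proportions; densities use a Gaussian kernel.
classify_one <- function(p, tx, tg, hs) {
  wt <- sapply(seq_along(labs), function(j) {
    xj <- tx[tg == labs[j]]
    (length(xj) / length(tx)) * mean(dnorm(p, mean = xj, sd = hs[j]))
  })
  labs[which.max(wt)]
}

# Per-group bandwidths (stand-in selector, not the ks plug-in selector)
bandwidths <- function(tx, tg) {
  sapply(labs, function(g) bw.SJ(tx[tg == g], method = "dpi"))
}

# Hypothetical helper: leave-one-out estimate of the misclassification rate
loo_mr_sketch <- function(x, x.gr, recompute = FALSE) {
  hs.full <- bandwidths(x, x.gr)
  est <- sapply(seq_along(x), function(i) {
    # Optionally recompute the bandwidths without the i-th data item
    hs <- if (recompute) bandwidths(x[-i], x.gr[-i]) else hs.full
    classify_one(x[i], x[-i], x.gr[-i], hs)
  })
  mean(est != x.gr)
}

loo_mr_sketch(x, x.gr)                    # bandwidths from the full sample
loo_mr_sketch(x, x.gr, recompute = TRUE)  # bandwidths recomputed per left-out point
```

Recomputing the bandwidths for every left-out point is more faithful to a true out-of-sample estimate but costs one bandwidth selection per data item, which is presumably why it is optional.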
If by.group=FALSE then only the total MR is given. If it
is set to TRUE, then the MR for each class is also given
(estimated number in group divided by true number).
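The cross-classification table and the misclassification rate can be reproduced with base R, which may help when checking results by hand. The following is a minimal sketch, not the ks implementation: the helper name mr_sketch is hypothetical, and the per-group rate is taken here to be the share of each true group that is wrongly classified, which is an assumption about how the by-group figures are defined.

```r
# Hypothetical helper: cross-classification table and misclassification rates.
# Assumes every estimated label also occurs among the true labels.
mr_sketch <- function(true.group, est.group) {
  stopifnot(length(true.group) == length(est.group))
  labs <- sort(unique(true.group))
  # Rows are true groups, columns are estimated groups
  cross <- table(true = factor(true.group, levels = labs),
                 est  = factor(est.group,  levels = labs))
  total    <- 1 - sum(diag(cross)) / sum(cross)     # overall MR
  by.group <- 1 - diag(cross) / rowSums(cross)      # MR within each true group
  list(cross = cross, error = total, error.by.group = by.group)
}

true.gr <- c(1, 1, 1, 1, 2, 2, 2, 2)
est.gr  <- c(1, 1, 1, 2, 2, 2, 1, 1)
res <- mr_sketch(true.gr, est.gr)
res$cross
res$error           # 3 of the 8 points are wrongly classified
res$error.by.group  # group 1: 1 of 4 wrong; group 2: 2 of 4 wrong
```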
Details
If the bandwidths Hs are missing from kda, then the
default bandwidths are the plug-in selectors Hkda(bw="plugin").
Likewise for missing hs. Valid options for bw
are "plugin", "lscv" and "scv" which in turn call
Hpi, Hlscv and Hscv.
The effective support, binning, grid size, grid range and positive
parameters are the same as for kde.
If prior probabilities are known then set prior.prob to these.
Otherwise prior.prob=NULL uses the sample
proportions as estimates of the prior probabilities.
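To make the role of the prior probabilities concrete, here is a small base-R sketch of the usual kernel discriminant rule in one dimension: each point is assigned to the group with the largest value of prior probability times estimated density. It is an illustration under stated assumptions rather than the code inside kda: kde1d and classify_sketch are hypothetical helpers, the kernel is Gaussian, and bw.SJ(method = "dpi") from the stats package stands in for the ks bandwidth selectors.

```r
set.seed(1)
# Unbalanced training sample: 150 points in group 1, 50 in group 2
x    <- c(rnorm(150, mean = 1), rnorm(50, mean = -1))
x.gr <- rep(c(1, 2), times = c(150, 50))

# Analogue of prior.prob=NULL: sample proportions as prior estimates
prior <- as.numeric(table(x.gr)) / length(x.gr)

# Gaussian kernel density estimate of `train`, evaluated at each point in `pts`
kde1d <- function(pts, train, h) {
  sapply(pts, function(p) mean(dnorm(p, mean = train, sd = h)))
}

# Hypothetical helper: assign each point to the group maximising prior * density
classify_sketch <- function(pts, x, x.gr, prior) {
  labs <- sort(unique(x.gr))
  wt <- sapply(seq_along(labs), function(j) {
    xj <- x[x.gr == labs[j]]
    prior[j] * kde1d(pts, xj, h = bw.SJ(xj, method = "dpi"))  # stand-in bandwidth
  })
  wt <- matrix(wt, ncol = length(labs))  # one row per point, one column per group
  labs[apply(wt, 1, which.max)]
}

pts <- c(-2.5, 0, 2.5)
classify_sketch(pts, x, x.gr, prior)                # sample-proportion priors
classify_sketch(pts, x, x.gr, prior = c(0.5, 0.5))  # known, equal priors
```

Points far from the boundary are classified the same way under either choice; the priors matter most for points near where the weighted densities cross.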
For ks >= 1.8.11, kda.kde has been subsumed
into kda, so all prior calls to kda.kde can be replaced
by kda. To reproduce the previous behaviour of kda, the
command is kda(kde.flag=FALSE).
Examples
library(ks)  # provides rnorm.mixt, kda and compare
set.seed(8192)
x <- c(rnorm.mixt(n=100, mus=1), rnorm.mixt(n=100, mus=-1))
x.gr <- rep(c(1,2), times=c(100,100))
y <- c(rnorm.mixt(n=100, mus=1), rnorm.mixt(n=100, mus=-1))
y.gr <- rep(c(1,2), times=c(100,100))
kda.gr <- kda(x, x.gr)
y.gr.est <- predict(kda.gr, x=y)
compare(y.gr, y.gr.est)
#> $cross
#>          1 (est.) 2 (est.) Total
#> 1 (true)       85       15   100
#> 2 (true)       14       86   100
#> Total          99      101   200
#>
#> $error
#> [1] 0.145
#>
## See other examples in ? plot.kda