Kernel mean shift clustering
kms.RdKernel mean shift clustering for 2- to 6-dimensional data.
Arguments
- x
matrix of data values or object of class
kms- y
matrix of candidate data values for which the mean shift will estimate their cluster labels. If missing,
y=x.- H
bandwidth matrix/scalar bandwidth. If missing,
Hpi(x,deriv.order=1,nstage=2-(d>2))is called by default.- max.iter
maximum number of iterations. Default is 400.
- tol.iter
distance under which two successive iterations are considered convergent. Default is 0.001*min marginal IQR of
x.- tol.clust
distance under which two cluster modes are considered to form one cluster. Default is 0.01*max marginal IQR of
x.- min.clust.size
minimum cluster size (cardinality). Default is
0.01*nrow(y).- merge
flag to merge clusters which are smaller than
min.clust.size. Default is TRUE.- keep.path
flag to store the density gradient ascent paths. Default is FALSE.
- verbose
flag to print out progress information. Default is FALSE.
- object
object of class
kms- display
type of display, "splom" (>=2-d) or "plot3D" (3-d)
- col,col.fun
vector or colours (one for each group) or colour function
- alpha
colour transparency. Default is 1.
- xlab,ylab,zlab
axes labels
- theta,phi
graphics parameters for perspective plots (3-d)
- add
flag to add to current plot. Default is FALSE.
- ...
other (graphics) parameters
Value
A kernel mean shift clusters set is an object of class kms
which is a list with fields:
- x,y
data points - same as input
- end.points
matrix of final iterates starting from
y- H
bandwidth matrix
- label
vector of cluster labels
- nclust
number of clusters
- nclust.table
frequency table of cluster labels
- mode
matrix of cluster modes
- names
variable names
- tol.iter,tol.clust,min.clust.size
tuning parameter values - same as input
- path
list of density gradient ascent paths where
path[[i]]is the path ofy[i,](ifkeep.path=TRUE)
Details
Mean shift clustering belongs to the class of modal or density-based clustering methods. The mean shift recurrence of the candidate point \({\bold x}\) is \({\bold x}_{j+1} = {\bold x}_j + \bold{{\rm H}} {\sf D} \hat{f}({\bold x}_j)/\hat{f}({\bold x}_j)\) where \(j\geq 0\) and \({\bold x}_0 = {\bold x}\). The sequence \(\{{\bold x}_0, {\bold x}_1, \dots \}\) follows the density gradient ascent paths to converge to a local mode of the density estimate \(\hat{f}\). Hence \({\bold x}\) is iterated until it converges to its local mode, and this determines its cluster label.
The mean shift recurrence is terminated if successive iterations are
less than tol.iter or the maximum number of iterations
max.iter is reached. Final iterates which are less than
tol.clust distance apart are considered to form a single
cluster. If merge=TRUE then the clusters whose cardinality is less
than min.clust.size are iteratively merged with their nearest cluster.
If the bandwidth H is missing, then
the default bandwidth is the plug-in selector for the density gradient
Hpi(x,deriv.order=1). Any bandwidth that is suitable for the
density gradient is also suitable for the mean shift.
References
Chacon, J.E. & Duong, T. (2013) Data-driven density estimation, with applications to nonparametric clustering and bump hunting. Electronic Journal of Statistics 7, 499–532.
Comaniciu, D. & Meer, P. (2002) Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 603–619.
Examples
data(crabs, package="MASS")
kms.crabs <- kms(x=crabs[,c("FL","CW")])
plot(kms.crabs, pch=16)
summary(kms.crabs)
#> Number of clusters = 4
#> Cluster label table = 49 51 57 43
#> Cluster modes =
#> FL CW
#> 1 13.02325 32.01357
#> 2 14.90668 36.59668
#> 3 20.97492 53.83624
#> 4 14.66395 32.84520
kms.crabs <- kms(x=crabs[,c("FL","CW","RW")])
plot(kms.crabs, pch=16)
plot(kms.crabs, display="plot3D", pch=16)