您的位置:首页  > 论文页面

加入三角不等式的闭合K-means改进算法

发表时间:2019-06-28  浏览量:159  下载量:15
全部作者: 石康壮,陈戟
作者单位: 武汉理工大学光纤传感技术与信息处理教育部重点实验室;休斯顿大学卡伦工程学院
摘 要: 研究介绍了加入三角不等式的闭合K-means改进算法,它显著提高了传统K-means算法的计算效率。改进方法主要考虑到传统的K-means重分配过程中的绝大数计算过程是非必要的,因为算法需要找到最近的聚类中心点,却不关心数据点与其他聚类中心的距离。这种改进思路是基于Kd tree with BBF算法的精神拓展的,本文提到的与它有类似之处。先通过三角不等式确定每个聚类中心不需要计算的点的范围,再通过使用多个随机空间分区树将数据组成邻居点来有效识别那些易于“移动”的点。实验结果表明,该方法在计算大型视觉词汇数据集时比传统算法有更高的计算效率,大约减少了41%的计算时间。研究还评估了一些参数的影响,包括总的数据量、聚类数量、阈值大小,证明了不同的参数选取可以有效减少时间消耗,但代价是聚类的效果会降低。
关 键 词: 计算机应用;K-means;算法;三角不等式原理;随机树
Title: Closed K-means improved algorithm with triangular inequality
Author: SHI Kangzhuang, CHEN Ji
Organization: Key Laboratory of Fiber Optic Sensing Technology and Information Processing, Ministry of Education, Wuhan University of Technology; Cullen College of Engineering, University of Houston
Abstract: A closed K-means improved algorithm with triangular inequality is introduced in this paper, which significantly improves the computational efficiency of the traditional K-means algorithm. The improved method mainly considers that the vast majority of the calculation process in the traditional K-means redistribution process is unnecessary, because the algorithm needs to find the nearest cluster center point, but does not care about the distance between the data points and other cluster centers. This improvement way is expanded based on the spirit of the Kd tree with BBF algorithm, which is similar to the mentioned method in this paper. Firstly, the triangle inequality is used to determine the range of points that each cluster center does not need to calculate. Then the data is formed into neighbor points by using multiple random space partition trees to effectively identify those points that are easy to “move”. The experimental results show that this method has higher computational efficiency than the traditional algorithm in computing large visual vocabulary datasets, and reduces the computation time by about 41%. This paper also evaluates the impact of some parameters, including the total amount of data, the number of clusters, and the threshold size. They have been shown that different parameter selection can effectively reduce time consumption, but at the cost of reducing the clustering effect.
Key words: computer applications; K-means; algorithm; principle of triangular inequality; random tree
发表期数: 2019年6月第3期
引用格式: 石康壮,陈戟. 加入三角不等式的闭合K-means改进算法[J]. 中国科技论文在线精品论文,2019,12(3):397-404.
 
1 评论数 0
暂无评论
友情链接