Bloom filter design and implementation based on URL feature extraction

Published: 2016-05-31  Views: 1654  Downloads: 766
Authors: LU Chengzhi, ZHAO Tianlin, FAN Chunxiao
Affiliations: School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876; AS2, Grade 11, International Department, Beijing National Day School, Beijing 100039; School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876
Abstract: Real-time analysis of e-commerce web pages places particular demands on efficiency. This paper analyzes the principle of the traditional Bloom filter, points out that its URL duplicate detection ignores the redundant information contained in URLs, and proposes an improved duplicate-detection method: a Bloom filtering method based on URL feature extraction. The method first defines URL features, then quantifies and extracts them, and trains URL filtering rules from these features. The rules are used to strip redundant information from each URL, and the reduced URL is then checked for duplicates with a Bloom filter. Experiments on more than 2 million records show that the improved Bloom filter is substantially more time-efficient than the traditional one, and that the gain grows as the data volume increases. This demonstrates the effectiveness of the proposed method and shows that it meets the application requirements well.
Keywords: other disciplines of electronics, communication, and automatic control technology; URL duplicate detection; Bloom filter; feature extraction
Issue: No. 10, May 2016
Cite as: LU Chengzhi, ZHAO Tianlin, FAN Chunxiao. Bloom filter design and implementation based on URL feature extraction[J]. 中国科技论文在线精品论文, 2016, 9(10): 1069-1074.
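
The pipeline summarized in the abstract (strip redundant URL information, then test the reduced URL against a Bloom filter) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The paper trains its filtering rules from URL features, whereas the `REDUNDANT_PARAMS` set below is a hard-coded, hypothetical stand-in. The helper names `strip_redundant` and `seen_before`, the filter size, the number of hash functions, and the SHA-256 double-hashing scheme are likewise assumptions made for illustration.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rule set: query parameters assumed not to change page content.
# The paper learns such rules from URL features; here they are hard-coded.
REDUNDANT_PARAMS = {"utm_source", "utm_medium", "ref", "sessionid"}


def strip_redundant(url: str) -> str:
    """Return the URL with assumed-redundant parts removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in REDUNDANT_PARAMS]
    kept.sort()  # assumption: parameter order does not matter
    # Lowercase scheme and host, drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, urlencode(kept), ""))


class BloomFilter:
    """Plain Bloom filter over strings, backed by a bit array."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def seen_before(bloom: BloomFilter, url: str) -> bool:
    """Check the reduced URL against the filter, recording it if it is new."""
    key = strip_redundant(url)
    if key in bloom:
        return True
    bloom.add(key)
    return False
```

Under these assumed rules, `http://shop.example.com/item?id=42&utm_source=mail` and `http://SHOP.example.com/item?utm_source=ad&id=42#top` reduce to the same key, so the second lookup is reported as a duplicate. As with any Bloom filter, a positive answer may occasionally be a false positive, while a negative answer is always correct.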
 