
A Deep Web collection method for mobile application store based on category keywords query

Published: 2018-02-28  Views: 1129  Downloads: 174
Authors: WANG Lu (汪鹭), HU Yangyu (胡阳雨), XU Guoai (徐国爱)
Affiliation: School of Cyberspace Security, Beijing University of Posts and Telecommunications
Abstract: Mobile application analysis and security detection in the era of mobile internet big data depend on comprehensive application data, yet a large amount of application information lies in the Deep Web, hidden behind search forms. To address this problem, this paper proposes a collection method based on application category keyword queries, which uses an incremental crawling strategy to improve the completeness and completion efficiency of information collection from mobile application stores. First, a vertical crawler obtains information on the applications that can be reached by following links through each category's listing pages. Next, the TF-IDF algorithm extracts keywords that represent each category from the application names and descriptions. Finally, keyword-based queries are used to crawl the Deep Web incrementally. Experiments on 10 mobile application stores covering more than 10 categories show that the method achieves high completeness and completion efficiency in collecting application information.
Keywords: computer applications; Deep Web; TF-IDF algorithm; incremental crawling
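Issue: No. 4, February 2018
Citation: WANG Lu, HU Yangyu, XU Guoai. A Deep Web collection method for mobile application store based on category keywords query [J]. 中国科技论文在线精品论文 (Highlights of Sciencepaper Online), 2018, 11(4): 348-356.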
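
The keyword extraction and incremental crawling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the TF-IDF variant, the tokenizer, or the store interfaces, so everything here is an assumption made for illustration. In particular, each category is treated as one document, the whitespace tokenizer is a simplification, and `SEED_APPS`, `HIDDEN_APPS`, and `fake_search` are hypothetical stand-ins for the vertical crawler's output and a store's search form.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification; Chinese app
    text would need a proper word segmenter)."""
    return text.lower().split()


def category_keywords(apps, top_k=3):
    """Rank terms per category by TF-IDF, treating the concatenated
    names and descriptions of each category as one document."""
    term_counts = defaultdict(Counter)
    for app in apps:
        term_counts[app["category"]].update(
            tokenize(app["name"] + " " + app["description"]))
    n_docs = len(term_counts)
    # Document frequency: number of categories that contain each term.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    keywords = {}
    for category, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[category] = ranked[:top_k]
    return keywords


def incremental_crawl(seed_apps, keywords, search):
    """Query the search function with each keyword and add only the
    apps whose IDs have not been collected yet."""
    collected = {app["id"]: app for app in seed_apps}
    for terms in keywords.values():
        for term in terms:
            for app in search(term):
                collected.setdefault(app["id"], app)
    return collected


# Hypothetical apps reachable through category pages (vertical crawl).
SEED_APPS = [
    {"id": 1, "category": "games", "name": "Puzzle Quest",
     "description": "puzzle game with levels"},
    {"id": 2, "category": "games", "name": "Racing Puzzle",
     "description": "racing game"},
    {"id": 3, "category": "finance", "name": "Budget Planner",
     "description": "track monthly budget expenses"},
]

# Hypothetical apps that only the search form exposes.
HIDDEN_APPS = [
    {"id": 4, "category": "games", "name": "Puzzle Mania",
     "description": "another puzzle game"},
    {"id": 5, "category": "finance", "name": "Budget Buddy",
     "description": "simple budget tool"},
]


def fake_search(term):
    """Stand-in for a store's search form: exact token match."""
    return [app for app in SEED_APPS + HIDDEN_APPS
            if term in tokenize(app["name"] + " " + app["description"])]


keywords = category_keywords(SEED_APPS, top_k=1)
collected = incremental_crawl(SEED_APPS, keywords, fake_search)
print(keywords)
print(sorted(collected))
```

In this toy setup the top keyword of each category is enough to surface the two hidden apps. Against a real store, the number of keywords per category and the handling of paginated search results would need tuning, and terms shared by every category score zero under this unsmoothed IDF.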
 