A multi-space-value-based truth discovery algorithm
Published: 2016-11-30 Views: 1761 Downloads: 634
Authors: | LIU Xinqiao, SHEN Derong, YU Ge, NIE Tiezheng, KOU Yue |
Organization: | College of Computer Science and Engineering, Northeastern University |
Abstract: | Data sources of differing reliability may provide conflicting values for the same attribute of an entity. Existing conflict-resolution methods treat each data item value as a whole and ignore the differences and independence between its levels. To address this, we define multi-level space values and propose a truth discovery algorithm, based on Bayesian analysis, that is designed specifically for them. Taking the differences and independence between levels into account, the algorithm processes each data item value level by level and selects the true value based on Vote values computed per level. In addition, we propose for the first time computing each source's accuracy from the similarity between the multi-level information of the values it provides and the true values, and then using that accuracy in a new round of iteration. Experiments on real-world datasets show that the algorithm effectively improves both the accuracy and the computational efficiency of truth discovery. |
Key words: | database; truth discovery; conflicting data; multi-level space |
Issue: | No. 22, November 2016 |
Citation: | LIU Xinqiao, SHEN Derong, YU Ge, et al. A multi-space-value-based truth discovery algorithm[J]. 中国科技论文在线精品论文, 2016, 9(22): 2319-2326. |
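
The abstract describes a loop of level-by-level voting followed by a similarity-based update of source accuracy. The Python sketch below illustrates that loop only; it is not the authors' algorithm, whose formulas are not given on this page. Everything specific in it is an assumption: the function names, the vote weight `ln(n_false * A / (1 - A))` (a common accuracy-weighted vote in Bayesian truth discovery), the rule that only sources agreeing with the levels chosen so far may vote on the next level, the shared-prefix similarity, the clamping of accuracy to [0.05, 0.95], and the sample data.

```python
import math
from collections import defaultdict


def level_similarity(claimed, truth):
    """Fraction of leading levels on which a claimed value matches the truth (assumed measure)."""
    shared = 0
    for c, t in zip(claimed, truth):
        if c != t:
            break
        shared += 1
    return shared / max(len(claimed), len(truth))


def pick_truth(item_claims, accuracy, n_false=10):
    """Choose a value level by level using accuracy-weighted votes."""
    depth = max(len(value) for value in item_claims.values())
    prefix = ()
    for level in range(depth):
        votes = defaultdict(float)
        for source, value in item_claims.items():
            # Assumption: only sources that agree with the levels chosen so far vote.
            if len(value) > level and value[:level] == prefix:
                a = accuracy[source]
                votes[value[level]] += math.log(n_false * a / (1 - a))
        if not votes:
            break
        prefix += (max(votes, key=votes.get),)
    return prefix


def discover_truths(claims, rounds=10, init_accuracy=0.8):
    """Alternate between picking truths and re-estimating source accuracy."""
    sources = {s for item_claims in claims.values() for s in item_claims}
    accuracy = {s: init_accuracy for s in sources}
    truths = {}
    for _ in range(rounds):
        new_truths = {item: pick_truth(c, accuracy) for item, c in claims.items()}
        sims = defaultdict(list)
        for item, item_claims in claims.items():
            for source, value in item_claims.items():
                sims[source].append(level_similarity(value, new_truths[item]))
        # Clamp so the logarithm in pick_truth stays finite.
        accuracy = {s: min(0.95, max(0.05, sum(v) / len(v))) for s, v in sims.items()}
        if new_truths == truths:
            break
        truths = new_truths
    return truths, accuracy


# Hypothetical claims: each value is a tuple of levels (country, province, city/district).
claims = {
    "item_a": {
        "s1": ("China", "Liaoning", "Shenyang"),
        "s2": ("China", "Liaoning", "Shenyang"),
        "s3": ("China", "Liaoning", "Dalian"),
    },
    "item_b": {
        "s1": ("China", "Beijing", "Haidian"),
        "s2": ("China", "Beijing", "Haidian"),
        "s3": ("China", "Beijing", "Chaoyang"),
    },
}
truths, accuracy = discover_truths(claims)
print(truths)
print(accuracy)
```

In this sketch a source that is wrong only at the last level still earns partial credit for the levels it got right, which is the kind of per-level distinction that a whole-value comparison would miss.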
