
Title: Extraction of design factors of Beijing’s central axis based on NLP

Published: 2022-03-31
Authors: WANG Jinlong, SUN Wei
Organization: School of Digital Media & Design Arts, Beijing University of Posts and Telecommunications
Abstract: To better mine the rich design elements embodied in Beijing’s central axis, a design factor extraction method based on natural language processing (NLP) is proposed. First, a web crawler collects corpus data related to the important landmarks on the central axis and its north-south extensions, and jieba is used to segment the corpus into words and remove stop words. From the segmented text, term frequency-inverse document frequency (TF-IDF) is used to extract keywords and their weights for each landmark, and the higher-weighted keywords are selected as that landmark’s topic words. The topic words of each landmark are then clustered by inter-word similarity, and semantic factors are extracted from the clusters in three steps: participants, after thoroughly reading the relevant materials, vote to eliminate meaningless clusters; the remaining clusters are merged by card sorting; and participants name each card-sorting group with an appropriate affective word, which yields the semantic factors of each landmark. A semantic differential questionnaire is compiled from each landmark’s semantic factors, and participants rate the typical colors of each landmark to select the matching color factors. In total, 64 semantic factors and 22 color factors are extracted for 22 landmarks. The extracted semantic and color factors reflect well the connotative meaning of each landmark and its outward stylistic characteristics, and they provide design elements for future design work related to Beijing’s central axis.
Key words: other disciplines of civil and architectural engineering; Beijing’s central axis; design factor; natural language processing; semantic differential method
Issue: No. 1, March 2022
Citation: WANG Jinlong, SUN Wei. Extraction of design factors of Beijing’s central axis based on NLP [J]. 中国科技论文在线精品论文, 2022, 15(1): 66-72.
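
The keyword step described in the abstract (remove stop words, weight each landmark's words by TF-IDF, keep the highest-weighted ones as topic words) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the text has already been segmented into tokens (the paper uses jieba for this), it computes IDF from the toy corpus itself as log(N / document frequency), and the function name `tfidf_keywords`, the landmark names, the tokens, and the stop-word list are all made up for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(docs, stop_words, top_k=3):
    """Return the top_k (word, weight) pairs per document.

    docs: dict mapping a landmark name to its list of tokens
          (assumed to be already segmented).
    stop_words: set of tokens to discard before weighting.
    """
    # Drop stop words from every document.
    filtered = {
        name: [t for t in tokens if t not in stop_words]
        for name, tokens in docs.items()
    }
    n_docs = len(filtered)

    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for tokens in filtered.values():
        df.update(set(tokens))

    result = {}
    for name, tokens in filtered.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights = {}
        for word, count in counts.items():
            tf = count / total                  # relative term frequency
            idf = math.log(n_docs / df[word])   # 0 if the word is in every document
            weights[word] = tf * idf
        # Highest weight first; ties broken alphabetically for determinism.
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        result[name] = ranked[:top_k]
    return result


# Hypothetical toy corpus: one token list per landmark.
docs = {
    "Forbidden City": ["palace", "emperor", "palace", "red", "wall", "the"],
    "Temple of Heaven": ["altar", "heaven", "emperor", "blue", "the"],
    "Drum Tower": ["drum", "time", "tower", "the", "drum"],
}
stop_words = {"the"}

keywords = tfidf_keywords(docs, stop_words, top_k=2)
for landmark, words in keywords.items():
    print(landmark, words)
```

Words that occur in several landmarks' texts (here "emperor") receive a lower weight than words specific to one landmark, which is why the top-ranked words can serve as per-landmark topic words. A production pipeline would more likely rely on an existing TF-IDF implementation than on this hand-rolled version.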
 