Library and Information Science

Library and Information Science ISSN: 2435-8495
Mita Society for Library and Information Science
c/o Library and Information Science Program, Faculty of Letters, Keio University, 2-15-45 Mita, Minato-ku, Tokyo 108-8345, Japan
https://mslis.jp/ E-mail:mita-slis@ml.keio.jp
Library and Information Science 26: 67-88 (1988)
doi:10.46895/lis.26.67

Original Paper

Principles of Word Weighting Based on Occurrence Frequency Information

Doctoral Program, Graduate School of Education, The University of Tokyo ◇ 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

Received: January 21, 1989
Published: March 25, 1989

Characteristics of the occurrence frequency of words in natural-language texts have been used as indicators for selecting significant words in automatic indexing. This paper describes general principles shared by term weighting methods that rely on occurrence frequency measures.
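
Every measure discussed in this paper is computed from a few raw occurrence counts. As a minimal illustration (not code from the paper), the sketch below derives three of them for a toy collection: the within-document frequency tf, the collection frequency cf (total occurrences), and the document frequency df (number of documents containing the word). The function name `occurrence_stats`, the sample documents, and the lowercase whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter

def occurrence_stats(docs):
    """Return per-document term frequencies, collection frequency and document frequency.

    Tokenization is a naive lowercase whitespace split (an assumption for illustration).
    """
    tf = [Counter(doc.lower().split()) for doc in docs]  # tf[j][w]: occurrences of w in doc j
    cf = Counter()  # total occurrences of each word across the collection
    df = Counter()  # number of documents containing each word
    for counts in tf:
        cf.update(counts)         # add this document's counts
        df.update(counts.keys())  # each word counted once per document
    return tf, cf, df

docs = [
    "poisson model of word occurrence",
    "word frequency and word weight",
    "information theory of word occurrence",
]
tf, cf, df = occurrence_stats(docs)
print(cf["word"], df["word"])        # 4 3
print(cf["poisson"], df["poisson"])  # 1 1
```

Note that cf and df can diverge: a word may occur many times in only a few documents, which is the kind of uneven distribution the weighting measures try to capture.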

To this end, nearly sixty weighting formulas published over the past thirty years were collected, and their theoretical characteristics were analyzed and compared with each other. As a result, the formulas were classified into the following five categories: 1) absolute frequency measures, 2) two kinds of relative frequency measures, 3) word dispersion measures, 4) the 2-Poisson model proposed by Harter, and 5) information-theoretic measures similar to those proposed by Shannon.
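
The abstract does not reproduce the formulas themselves. The sketch below implements one familiar textbook representative per category; these are illustrative choices, not necessarily the formulas the paper analyzes. For category 2 we assume, as one plausible reading, relative frequency within a document and inverse document frequency log(N/df); the abstract does not say which two kinds the paper distinguishes. For category 3 we use the variance-to-mean ratio of per-document counts. For category 4 we use Harter's z = (λ1 − λ2)/√(λ1 + λ2) as it is commonly cited, with the two Poisson rates simply assumed rather than estimated. For category 5 we use the Shannon entropy of a word's occurrences across documents. All function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def counts_by_doc(docs, word):
    """Occurrences of `word` in each document (naive lowercase whitespace tokenization)."""
    return [Counter(d.lower().split())[word] for d in docs]

def absolute_frequency(docs, word):
    # 1) Absolute measure: total occurrences of the word in the collection.
    return sum(counts_by_doc(docs, word))

def relative_frequency_in_doc(doc, word):
    # 2a) Relative to the document: occurrences divided by document length.
    tokens = doc.lower().split()
    return tokens.count(word) / len(tokens)

def inverse_document_frequency(docs, word):
    # 2b) Relative to the collection: log(N / df); assumes the word occurs somewhere.
    df = sum(1 for c in counts_by_doc(docs, word) if c > 0)
    return math.log(len(docs) / df)

def variance_to_mean(docs, word):
    # 3) Dispersion: variance / mean of per-document counts. Near 1 for a word
    # scattered at random (Poisson), above 1 for a clumped word. Assumes the word occurs.
    c = counts_by_doc(docs, word)
    mean = sum(c) / len(c)
    var = sum((x - mean) ** 2 for x in c) / len(c)
    return var / mean

def harter_z(lam1, lam2):
    # 4) Harter's z for a 2-Poisson word, given the mean rates lam1 and lam2 of
    # the two document classes (taken as known here, not estimated).
    return (lam1 - lam2) / math.sqrt(lam1 + lam2)

def distribution_entropy(docs, word):
    # 5) Shannon entropy (bits) of the word's occurrences across documents;
    # 0 when all occurrences fall in a single document.
    c = counts_by_doc(docs, word)
    total = sum(c)
    return sum((x / total) * math.log2(total / x) for x in c if x > 0)

docs = [
    "retrieval retrieval retrieval model",
    "the model of the index",
    "the index and the file",
    "the file the record",
]
print(absolute_frequency(docs, "the"), absolute_frequency(docs, "retrieval"))  # 6 3
print(variance_to_mean(docs, "the"), variance_to_mean(docs, "retrieval"))      # 0.5 2.25
print(distribution_entropy(docs, "retrieval"))                                 # 0.0
```

In this toy collection "the" has the higher absolute frequency, yet "retrieval" is the clumped word: its variance-to-mean ratio exceeds 1 and its entropy is 0, whereas "the" is spread evenly over three documents.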

Mathematical relations specific to the formulas of each category were identified. These relations are well explained by a model consisting of two word sets, one subsumed by the other: the significance of a word depends on the degree to which it is maldistributed toward the subsumed word set.
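
The abstract does not state how maldistribution is measured. A minimal sketch under one assumed quantification: treat all tokens of the collection as the enclosing set and the tokens of one document as the subsumed set, then divide the word's relative frequency in the subset by its relative frequency in the whole. The function `maldistribution` and the ratio itself are hypothetical illustrations, not the paper's model. A ratio of 1 means the word is no denser in the subset than in the whole; larger values mean it is concentrated there.

```python
from collections import Counter

def maldistribution(subset_tokens, superset_tokens, word):
    """Hypothetical measure: the word's relative frequency in the subsumed set
    divided by its relative frequency in the enclosing set.

    1.0 means no concentration in the subset; larger values mean the word is
    concentrated there. Assumes `word` occurs in the superset.
    """
    sub, sup = Counter(subset_tokens), Counter(superset_tokens)
    # The two-set model requires subsumption: every subset occurrence is also a superset occurrence.
    if any(sup[w] < n for w, n in sub.items()):
        raise ValueError("subset_tokens is not subsumed by superset_tokens")
    p_sub = sub[word] / len(subset_tokens)
    p_sup = sup[word] / len(superset_tokens)
    return p_sub / p_sup

collection = [
    "retrieval retrieval retrieval model",
    "the model of the index",
    "the index and the file",
    "the file the record",
]
superset = " ".join(collection).split()  # all 18 tokens of the collection
subset = collection[0].split()           # tokens of the first document
print(round(maldistribution(subset, superset, "retrieval"), 2))  # 4.5
print(round(maldistribution(subset, superset, "model"), 2))      # 2.25
```

With the first document as the subsumed set, "retrieval" (all of its occurrences fall in that document) scores higher than "model" (split between two documents). Taking the logarithm of the ratio gives a score of 0 for a word with no concentration in the subset.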
