Techniques of document clustering: A review

岸田和明; Kazuaki Kishida

doi:10.46895/lis.49.33

Library and Information Science ISSN: 2435-8495

Mita Society for Library and Information Science

c/o Library and Information Science Program, Faculty of Letters, Keio University, 2-15-45 Mita, Minato-ku, Tokyo 108-8345, Japan
https://mslis.jp/ E-mail: mita-slis@ml.keio.jp

Library and Information Science 49: 33-75 (2003)

Review article

Faculty of Cultural Information Resources, Surugadai University, Azu 698, Hanno, Saitama 357-8555, Japan

Received: February 25, 2004

Accepted: August 19, 2004

Published: November 15, 2004

Document clustering is widely recognized as a useful technique for information retrieval, the organization of Web documents, text mining, and related tasks. The purpose of this paper is to review various document clustering techniques and to discuss research issues in improving the effectiveness or efficiency of clustering methods. We survey an extensive literature on non-hierarchical methods (including single-pass methods), hierarchical methods (single-link, complete-link, etc.), dimensionality reduction methods (LSI, principal component analysis, etc.), probabilistic methods, data mining techniques, and others. In particular, this paper focuses on representative techniques such as the k-means algorithm, the leader-follower algorithm, the self-organizing map (SOM), the single-link and complete-link methods, bisecting k-means, latent semantic indexing (LSI), and the Gaussian mixture model. After reviewing these techniques and algorithms, we discuss research issues in document clustering: computational complexity, feature extraction (selection of words), methods for defining term weights and similarity, and the evaluation of results.
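As a rough illustration of one of the non-hierarchical techniques named above, here is a minimal sketch of single-pass (leader-follower) clustering of document vectors. It is not the author's formulation. The cosine similarity measure, the fixed similarity threshold, the running-mean centroid update, and the function names `cosine` and `single_pass` are all assumptions made for this example.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def single_pass(docs: List[List[float]], threshold: float) -> List[int]:
    """Hypothetical single-pass (leader-follower) clustering sketch.

    Documents are processed once, in input order. Each document joins the
    existing cluster whose centroid is most similar to it, provided that
    similarity reaches `threshold` (an assumed parameter); otherwise it
    becomes the leader of a new cluster. Returns one cluster label per document.
    """
    centroids: List[List[float]] = []
    sizes: List[int] = []
    labels: List[int] = []
    for d in docs:
        # Find the most similar cluster formed so far.
        best, best_sim = -1, float("-inf")
        for k, c in enumerate(centroids):
            s = cosine(d, c)
            if s > best_sim:
                best, best_sim = k, s
        if best >= 0 and best_sim >= threshold:
            # Follower: add the document and update the centroid as a running mean.
            n = sizes[best]
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], d)]
            sizes[best] = n + 1
            labels.append(best)
        else:
            # Leader: the document starts a new cluster with itself as centroid.
            centroids.append(list(d))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels
```

For example, `single_pass([[1, 1, 0, 0], [1, 0.8, 0, 0], [0, 0, 1, 1], [0, 0, 0.9, 1]], 0.5)` returns `[0, 0, 1, 1]`. Because each document is compared only with the clusters formed before it, the result can change when the same documents are presented in a different order. This order dependence is a commonly noted property of single-pass methods.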
