Library and Information Science

Library and Information Science ISSN: 2435-8495
Mita Society for Library and Information Science
c/o Department of Library and Information Science, Faculty of Letters, Keio University, 2-15-45 Mita, Minato-ku, Tokyo 108-8345, Japan
Library and Information Science 39: 31-45 (1998)

Original Article

An experiment of automatic classification of books using Nippon Decimal Classification

Graduate School of Library and Information Science, Keio University, Mita 2-15-45, Minato-ku, Tokyo 108-8345, Japan

Received: August 9, 1999
Accepted: December 22, 1999
Published: January 30, 2000

In information retrieval, texts are usually retrieved by matching them with queries. This study proposes an approach in which texts are automatically classified into categories and retrieved by matching them with queries classified in the same way. For efficient information retrieval using automatic classification, the methods for extracting words from texts and the matching methods are essential. Several methods for extracting words from Japanese texts have been proposed in natural language processing. However, it is difficult to extract significant words from Japanese texts because they are written without blank spaces separating words. As for matching methods, many weighting methods have been proposed, as well as vector space models and probabilistic models.
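Character n-grams are one way to obtain index terms from text that has no spaces between words, since they require no knowledge of word boundaries. The sketch below is illustrative only: the function name `char_ngrams`, the default n = 2, and the sample title are assumptions and are not taken from the study.

```python
def char_ngrams(text: str, n: int = 2) -> list[str]:
    """Return the overlapping character n-grams of `text`.

    Whitespace is removed first, so the function treats the input as one
    unsegmented string, as is typical for Japanese titles.
    """
    compact = "".join(text.split())
    if not compact:
        return []
    if len(compact) < n:
        # Too short to form a full n-gram; keep the string as a single term.
        return [compact]
    return [compact[i:i + n] for i in range(len(compact) - n + 1)]


# Hypothetical title, used only to show the shape of the output.
print(char_ngrams("情報検索入門"))
# ['情報', '報検', '検索', '索入', '入門']
```

Every bigram becomes a candidate term, including ones that straddle a word boundary (such as 報検), which is the usual trade-off of this approach compared with a morphological analyzer.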

This article reports the results of an experiment in classifying Japanese texts into Nippon Decimal Classification (NDC) categories based on the title information in Japanese MARC records. In this experiment, three extraction methods (juman, MHSA, and n-gram) are tested on a set of 1,000 books. Four weighting methods, including relative term frequency between categories, tf·idf, and tf(max)·idf, are tested. The results indicate that the extraction method using juman performed best and that the best weighting method was the relative term frequency between categories, which selected the correct classification category (upper three digits of NDC) for about 55.9% of the 1,000 books.
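The general idea of weighting title terms per category and choosing the highest-scoring category can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used in the experiment: it uses one common tf·idf variant (term frequency within a category multiplied by the log of the number of categories over the number of categories containing the term), the function names `train` and `classify` are hypothetical, and the training titles are toy data invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labelled_terms):
    """Build tf-idf weights per category.

    labelled_terms: iterable of (category, list_of_terms) pairs, where the
    terms have already been extracted from a title.
    """
    tf = defaultdict(Counter)
    for category, terms in labelled_terms:
        tf[category].update(terms)

    n_categories = len(tf)
    # Number of categories in which each term occurs at least once.
    category_freq = Counter()
    for counts in tf.values():
        category_freq.update(counts.keys())

    return {
        category: {
            term: count * math.log(n_categories / category_freq[term])
            for term, count in counts.items()
        }
        for category, counts in tf.items()
    }


def classify(terms, weights):
    """Return the category whose summed term weights are highest."""
    scores = {
        category: sum(w.get(term, 0.0) for term in terms)
        for category, w in weights.items()
    }
    return max(scores, key=scores.get)


# Toy training data (assumed): three-digit NDC codes with pre-extracted terms.
training = [
    ("007", ["情報", "検索"]),
    ("007", ["情報", "処理"]),
    ("913", ["小説", "集"]),
    ("913", ["日本", "小説"]),
]
weights = train(training)
print(classify(["情報", "検索"], weights))  # 007
```

With this variant, a term that occurs in every category receives a weight of zero, so only terms that discriminate between categories contribute to the score.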
