Library and Information Science 56: 43-63 (2006)

Original Article

Automatic identification of academic articles in Japanese PDF files

Received: May 15, 2006
Accepted: September 4, 2006
Published: January 25, 2007


As open-access policies gain acceptance, a growing number of researchers are posting their papers on publicly accessible web sites (self-archiving). In principle, these papers can be found through standard search engines, but in practice they tend to be buried among other web content. The purpose of this research is to develop a system that automatically detects academic and quasi-academic articles on the web. This paper describes experiments on the performance of various classifiers and compares the results in terms of precision, recall, and F-measure. The classifiers use attributes such as terms appearing in PDF files and empirical rules. The results suggest that a ranked-output system that identifies academic articles in several phases is effective.
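For readers unfamiliar with the three evaluation measures named above, the following is a minimal sketch of how they are conventionally computed for a binary "academic article or not" decision. It is an illustration only, not the authors' evaluation code: the function name, the use of sets of document identifiers, and the choice of the balanced F-measure (equal weight on precision and recall) are all assumptions, since the abstract does not state which F-measure weighting was used.

```python
def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and balanced F-measure.

    predicted: set of document ids the classifier labeled as academic (hypothetical input format)
    actual:    set of document ids that truly are academic articles
    """
    true_positives = len(predicted & actual)

    # Precision: share of documents labeled academic that really are academic.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of truly academic documents that the classifier found.
    recall = true_positives / len(actual) if actual else 0.0

    # Balanced F-measure: harmonic mean of precision and recall (assumed weighting).
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: four files labeled academic, three truly academic, two in common.
    p, r, f = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "e"})
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

A classifier that labels very few files as academic can reach high precision with low recall, and the reverse holds for one that labels almost everything; the F-measure summarizes that trade-off in a single figure, which is why the three values are usually reported together.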

