Library and Information Science

Library and Information Science ISSN: 2435-8495
Mita Society for Library and Information Science
c/o Department of Library and Information Science, Faculty of Letters, Keio University, 2-15-45 Mita, Minato-ku, Tokyo 108-8345, Japan
Library and Information Science 56: 43-63 (2006)

Original Article

Automatic identification of academic articles in Japanese PDF files

1 Asia University, Sakai 5-24-10, Musashino-shi, Tokyo 180-8629, Japan

2 Daito Bunka University, Takashimadaira 1-9-1, Itabashi-ku, Tokyo 175-8571, Japan

3 Surugadai University, Azu 698, Hanno-shi, Saitama 357-8555, Japan

4 Railway Technical Research Institute, Hikari-cho 2-8-38, Kokubunji-shi, Tokyo 185-8540, Japan

5 Sakushingakuin University, Takeshitamachi 908, Utsunomiya-shi, Tochigi 321-3295, Japan

6 Keio University, Mita 2-15-45, Minato-ku, Tokyo 108-8345, Japan

Received: May 15, 2006
Accepted: September 4, 2006
Published: January 25, 2007


As open-access policies gain acceptance, a growing number of researchers are posting their papers on publicly accessible websites, a practice known as self-archiving. In principle, these papers can be found through standard search engines, but in practice they tend to be buried among other content on the web. The purpose of this research is to develop a system that automatically detects academic and quasi-academic articles on the web. This paper describes experiments on the performance of various classifiers and compares the results in terms of precision, recall, and F-measure. The classifiers rely on attributes such as the terms that appear in PDF files and on empirical rules. The results suggest that a ranked-output system that identifies academic articles in several phases is effective.
