Bibliographic Component Extraction Using Support Vector Machines and Hidden Markov Models

Okada, Takashi; Takasu, Atsuhiro; Adachi, Jun

doi:10.1007/978-3-540-30230-8_46

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3232)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

Article citations are composed of subfields such as author, title, journal, and year. It is useful to automatically identify attributes of these subfields, since they are used for linking a citation with the actual cited article. In this article, we employ a Support Vector Machine (SVM), a method of machine learning, to automatically identify subfields. We then employ a Hidden Markov Model (HMM) to improve the identification accuracy. Information from the subfields identified by the SVM, and syntactic information analyzed by the HMM, are integrated to make an accurate identification.
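
The abstract does not say how the SVM output and the HMM are integrated, so the following is only a minimal sketch of one common way to combine them, not the authors' method. It assumes that a per-token classifier (standing in for the SVM) yields a probability for each subfield label, and that an HMM-style transition table encodes the usual ordering of subfields in a citation. The names `TRANSITIONS`, `START`, `TOKEN_PROBS`, and every number below are hypothetical and chosen for illustration; Viterbi decoding then picks the label sequence that best agrees with both sources.

```python
import math

# Label set taken from the subfields named in the abstract.
LABELS = ["author", "title", "journal", "year"]

# Hypothetical transition probabilities P(next label | previous label).
# They favour staying in a subfield or moving on to the next one.
TRANSITIONS = {
    "author":  {"author": 0.5,  "title": 0.4,  "journal": 0.05, "year": 0.05},
    "title":   {"author": 0.05, "title": 0.5,  "journal": 0.4,  "year": 0.05},
    "journal": {"author": 0.05, "title": 0.05, "journal": 0.5,  "year": 0.4},
    "year":    {"author": 0.1,  "title": 0.1,  "journal": 0.1,  "year": 0.7},
}
# Hypothetical probabilities for the label of the first token.
START = {"author": 0.85, "title": 0.05, "journal": 0.05, "year": 0.05}


def scores(*values):
    """Pair one probability with each label, in LABELS order."""
    return dict(zip(LABELS, values))


def to_log(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in probs.items()}


def independent_labels(token_probs):
    """Label each token on its own, ignoring its neighbours."""
    return [max(LABELS, key=lambda lab: probs[lab]) for probs in token_probs]


def viterbi_labels(token_probs):
    """Return the label sequence maximising classifier and transition scores."""
    if not token_probs:
        return []
    start = to_log(START)
    trans = {prev: to_log(row) for prev, row in TRANSITIONS.items()}
    emit = [to_log(probs) for probs in token_probs]

    # best[label] = best log-score of any path ending in `label` so far.
    best = {lab: start[lab] + emit[0][lab] for lab in LABELS}
    backpointers = []
    for token_scores in emit[1:]:
        new_best, pointers = {}, {}
        for lab in LABELS:
            prev = max(LABELS, key=lambda p: best[p] + trans[p][lab])
            new_best[lab] = best[prev] + trans[prev][lab] + token_scores[lab]
            pointers[lab] = prev
        best = new_best
        backpointers.append(pointers)

    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda lab: best[lab])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy citation; the per-token probabilities are invented, not SVM output.
TOKENS = ["Okada,", "Bibliographic", "Component", "Extraction", "LNCS", "2004"]
TOKEN_PROBS = [
    scores(0.90, 0.05, 0.03, 0.02),
    scores(0.05, 0.70, 0.20, 0.05),
    scores(0.05, 0.40, 0.50, 0.05),  # classifier alone prefers "journal"
    scores(0.05, 0.70, 0.20, 0.05),
    scores(0.05, 0.15, 0.75, 0.05),
    scores(0.02, 0.03, 0.05, 0.90),
]

if __name__ == "__main__":
    print(independent_labels(TOKEN_PROBS))
    print(viterbi_labels(TOKEN_PROBS))
```

In this toy input the third token is labelled "journal" when tokens are classified independently, which splits the title in two. With the transition table applied, the decoded sequence is author, title, title, title, journal, year, because a jump from journal back to title is given a low probability. A real system would estimate both tables from labelled citations rather than set them by hand.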

Author information

Authors and Affiliations

Information Science and Technology, Information and Communication Engineering, The University of Tokyo, 7-3-1 Bunkyo-ku, Tokyo, Japan
Takashi Okada
National Institute of Informatics, 2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo, Japan
Atsuhiro Takasu & Jun Adachi

Editor information

Editors and Affiliations

UKOLN, University of Bath, BA2 7AY, Bath, UK
Rachel Heery & Liz Lyon

About this paper

Cite this paper

Okada, T., Takasu, A., Adachi, J. (2004). Bibliographic Component Extraction Using Support Vector Machines and Hidden Markov Models. In: Heery, R., Lyon, L. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2004. Lecture Notes in Computer Science, vol 3232. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30230-8_46

DOI: https://doi.org/10.1007/978-3-540-30230-8_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23013-7
Online ISBN: 978-3-540-30230-8
eBook Packages: Springer Book Archive
