Abstract
Article citations are composed of subfields such as author, title, journal, and year. Identifying the attributes of these subfields automatically is useful because they are used to link a citation to the cited article. In this article, we use a Support Vector Machine (SVM), a machine learning method, to identify the subfields automatically. We then use a Hidden Markov Model (HMM) to improve identification accuracy. The subfield information identified by the SVM and the syntactic information analyzed by the HMM are integrated to produce an accurate identification.
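The abstract does not say how the SVM output and the HMM are combined, so the following is only a minimal sketch of one plausible arrangement, not the authors' method. It assumes that a per-token classifier (standing in for the SVM) yields a probability-like score for each subfield label, and that an HMM-style transition table encodes the typical order of subfields in a citation. Viterbi decoding then picks the label sequence that best agrees with both. All names (`LABELS`, `START`, `TRANS`, `SCORES`, `viterbi`) and all numbers are hypothetical and chosen for illustration.

```python
import math

# Hypothetical subfield labels, taken from the examples in the abstract.
LABELS = ["author", "title", "journal", "year"]

# Assumed probability that a citation starts with each subfield.
START = {"author": 0.7, "title": 0.1, "journal": 0.1, "year": 0.1}

# Assumed transition probabilities TRANS[previous][next] between subfields.
TRANS = {
    "author":  {"author": 0.50, "title": 0.40, "journal": 0.05, "year": 0.05},
    "title":   {"author": 0.02, "title": 0.60, "journal": 0.33, "year": 0.05},
    "journal": {"author": 0.02, "title": 0.03, "journal": 0.55, "year": 0.40},
    "year":    {"author": 0.05, "title": 0.05, "journal": 0.10, "year": 0.80},
}

# Invented per-token scores standing in for normalized classifier outputs.
# The third token is deliberately ambiguous: the classifier leans "journal".
SCORES = [
    {"author": 0.85, "title": 0.05, "journal": 0.05, "year": 0.05},
    {"author": 0.05, "title": 0.80, "journal": 0.10, "year": 0.05},
    {"author": 0.05, "title": 0.40, "journal": 0.50, "year": 0.05},
    {"author": 0.05, "title": 0.80, "journal": 0.10, "year": 0.05},
    {"author": 0.05, "title": 0.10, "journal": 0.80, "year": 0.05},
    {"author": 0.02, "title": 0.02, "journal": 0.06, "year": 0.90},
]


def viterbi(scores, start, trans):
    """Return the most probable label sequence, computed in log space."""
    best = {lab: math.log(start[lab]) + math.log(scores[0][lab]) for lab in LABELS}
    back = []  # back[i][lab] = best previous label for token i + 1
    for tok in scores[1:]:
        prev, best, pointers = best, {}, {}
        for lab in LABELS:
            # Best predecessor for this label given the transition table.
            src = max(LABELS, key=lambda p: prev[p] + math.log(trans[p][lab]))
            best[lab] = prev[src] + math.log(trans[src][lab]) + math.log(tok[lab])
            pointers[lab] = src
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(LABELS, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Baseline: label each token independently by its highest score.
independent = [max(LABELS, key=tok.get) for tok in SCORES]
decoded = viterbi(SCORES, START, TRANS)
print(independent)
print(decoded)
```

With these made-up numbers, labeling each token independently assigns the third token to the journal, whereas the decoded sequence keeps it inside the title, because a journal-to-title transition is assumed to be rare. This illustrates how sequence information could correct an isolated classification error.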
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Okada, T., Takasu, A., Adachi, J. (2004). Bibliographic Component Extraction Using Support Vector Machines and Hidden Markov Models. In: Heery, R., Lyon, L. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2004. Lecture Notes in Computer Science, vol 3232. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30230-8_46
DOI: https://doi.org/10.1007/978-3-540-30230-8_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23013-7
Online ISBN: 978-3-540-30230-8
eBook Packages: Springer Book Archive