Bibliographic Component Extraction Using Support Vector Machines and Hidden Markov Models

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3232)

Abstract

Article citations are composed of subfields such as author, title, journal, and year. Identifying these subfields automatically is useful because they are used to link a citation with the cited article itself. In this paper, we employ a Support Vector Machine (SVM), a machine learning method, to identify the subfields automatically. We then employ a Hidden Markov Model (HMM) to improve the identification accuracy. The subfield information identified by the SVM and the syntactic information analyzed by the HMM are integrated to produce a more accurate identification.
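
The abstract does not say how the SVM output and the HMM are combined, so the following is only a minimal sketch of one common way to do it, not the authors' method. It assumes that a per-token classifier (standing in for the SVM) yields probability-like scores for each subfield, and that an HMM-style transition table encodes the typical subfield order in a citation. All names (`LABELS`, `START_P`, `TRANS_P`, `viterbi`) and every number below are hypothetical and chosen purely for illustration. Viterbi decoding then picks the label sequence that best agrees with both sources of information.

```python
import math

# Hypothetical subfield labels, taken from the examples named in the abstract.
LABELS = ["author", "title", "journal", "year"]

# Assumed start and transition probabilities (illustrative, not from the paper).
START_P = {"author": 0.85, "title": 0.05, "journal": 0.05, "year": 0.05}
TRANS_P = {
    "author":  {"author": 0.50, "title": 0.45, "journal": 0.04, "year": 0.01},
    "title":   {"author": 0.01, "title": 0.60, "journal": 0.35, "year": 0.04},
    "journal": {"author": 0.01, "title": 0.02, "journal": 0.57, "year": 0.40},
    "year":    {"author": 0.05, "title": 0.05, "journal": 0.05, "year": 0.85},
}


def _log(p):
    # Floor the argument so that a zero probability does not raise an error.
    return math.log(max(p, 1e-12))


def viterbi(scores, start_p, trans_p):
    """Return the most likely label sequence.

    scores: one dict per token mapping label -> probability-like score,
            here assumed to come from a per-token classifier.
    """
    n = len(scores)
    best = [{} for _ in range(n)]   # best[t][label]: best log-score ending in label
    back = [{} for _ in range(n)]   # back[t][label]: predecessor on that best path
    for lab in LABELS:
        best[0][lab] = _log(start_p[lab]) + _log(scores[0][lab])
    for t in range(1, n):
        for lab in LABELS:
            prev = max(LABELS, key=lambda p: best[t - 1][p] + _log(trans_p[p][lab]))
            best[t][lab] = (best[t - 1][prev] + _log(trans_p[prev][lab])
                            + _log(scores[t][lab]))
            back[t][lab] = prev
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda lab: best[n - 1][lab])]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Made-up classifier scores for five tokens of a citation.
tokens = ["Lawrence", "Digital", "Libraries", "Computer", "1999"]
scores = [
    {"author": 0.80, "title": 0.10, "journal": 0.05, "year": 0.05},
    {"author": 0.10, "title": 0.40, "journal": 0.45, "year": 0.05},  # ambiguous
    {"author": 0.05, "title": 0.60, "journal": 0.30, "year": 0.05},
    {"author": 0.05, "title": 0.30, "journal": 0.60, "year": 0.05},
    {"author": 0.02, "title": 0.03, "journal": 0.05, "year": 0.90},
]

# Labelling each token independently versus decoding the whole sequence.
independent = [max(LABELS, key=lambda lab: s[lab]) for s in scores]
decoded = viterbi(scores, START_P, TRANS_P)
print(independent)
print(decoded)
```

In this toy input the second token is scored slightly higher as `journal` than as `title`, so labelling tokens one at a time yields an implausible journal-then-title order. The sequence model corrects this because the assumed journal-to-title transition is very unlikely.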

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Okada, T., Takasu, A., Adachi, J. (2004). Bibliographic Component Extraction Using Support Vector Machines and Hidden Markov Models. In: Heery, R., Lyon, L. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2004. Lecture Notes in Computer Science, vol 3232. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30230-8_46

  • DOI: https://doi.org/10.1007/978-3-540-30230-8_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23013-7

  • Online ISBN: 978-3-540-30230-8

  • eBook Packages: Springer Book Archive
