SVM Based Learning System for Information Extraction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3635)

Abstract

This paper presents an SVM-based learning system for information extraction (IE). One distinctive feature of our system is the use of a variant of the SVM, the SVM with uneven margins, which is particularly helpful for small training datasets. In addition, our approach needs fewer SVM classifiers to be trained than other recent SVM-based systems. The paper also compares our approach to several state-of-the-art systems (including rule learning and statistical learning algorithms) on three IE benchmark datasets: CoNLL-2003, CMU seminars, and the software jobs corpus. The experimental results show that our system outperforms a recent SVM-based system on CoNLL-2003, achieves the highest score on eight out of 17 categories on the jobs corpus, and is second best on the remaining nine.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Y., Bontcheva, K., Cunningham, H. (2005). SVM Based Learning System for Information Extraction. In: Winkler, J., Niranjan, M., Lawrence, N. (eds) Deterministic and Statistical Methods in Machine Learning. DSMML 2004. Lecture Notes in Computer Science, vol 3635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11559887_19

  • DOI: https://doi.org/10.1007/11559887_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29073-5

  • Online ISBN: 978-3-540-31728-9

  • eBook Packages: Computer Science, Computer Science (R0)
