
Annotating Text Segments Using a Web-Based Categorization Approach

  • Conference paper
Digital Libraries: Implementing Strategies and Sharing Experiences (ICADL 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3815)


Abstract

Conventional automatic text annotation tools mostly extract named entities from texts and annotate them with entity types such as persons, locations, and dates. This kind of entity-type information, however, is insufficient for machines to understand the context or facts contained in the texts. This paper presents a general text categorization approach that assigns text segments to broader subject categories, for example classifying a text string as a paper title in Mathematics or as a conference name in Computer Science. Experimental results confirm its wide applicability to various digital library applications.




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chiao, HC., Pu, HT., Chien, LF. (2005). Annotating Text Segments Using a Web-Based Categorization Approach. In: Fox, E.A., Neuhold, E.J., Premsmit, P., Wuwongse, V. (eds) Digital Libraries: Implementing Strategies and Sharing Experiences. ICADL 2005. Lecture Notes in Computer Science, vol 3815. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11599517_37


  • DOI: https://doi.org/10.1007/11599517_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30850-8

  • Online ISBN: 978-3-540-32291-7

  • eBook Packages: Computer Science, Computer Science (R0)
