SIGNUM: A Graph Algorithm for Terminology Extraction

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Abstract

Terminology extraction is an essential step in several fields of natural language processing, such as dictionary and ontology extraction. In this paper, we present a novel graph-based approach to terminology extraction. We use SIGNUM, a general-purpose graph-based algorithm for binary clustering, on directed weighted graphs generated using a metric for multi-word extraction. Our approach is entirely knowledge-free and can thus be used on corpora written in any language. Furthermore, it is unsupervised, making it suitable for use by non-experts. We evaluate our approach on the TREC-9 corpus for filtering against the MeSH and UMLS vocabularies.

Editor information

Alexander Gelbukh

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ngonga Ngomo, AC. (2008). SIGNUM: A Graph Algorithm for Terminology Extraction. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_8

  • DOI: https://doi.org/10.1007/978-3-540-78135-6_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78134-9

  • Online ISBN: 978-3-540-78135-6

  • eBook Packages: Computer Science, Computer Science (R0)
