Abstract
Corpus linguistic methods are discussed in the context of the automatic extraction of a candidate terminology for a specialist domain of knowledge. Collocation analysis of the candidate terms provides insight into the ontological commitment of the domain community, or collective. The candidate terminology and ontology can be readily verified and validated, and may subsequently be used in the construction of information extraction systems and knowledge-based systems. The methods are illustrated by an investigation of the ontological commitment of four major collectives: nuclear physics, cell biology, linguistics and anthropology. An analysis of a diachronic corpus provides insight into changes in the basic concepts of a specialism; an analysis of a synchronic corpus, comprising texts published during a short, fixed time period, shows how different sub-specialisms within a collective commit themselves to an ontology.
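The two-stage pipeline outlined above, first extracting candidate terms and then analysing their collocates, can be sketched in Python. This is a minimal illustration under stated assumptions, not a reconstruction of the author's implementation: the regex tokenizer, the add-one smoothing, the minimum-frequency cut-off, the ±5-word window, the simple z-score and all function names are hypothetical choices made for the example. Candidate terms are ranked by the ratio of a word's relative frequency in a specialist corpus to its relative frequency in a general reference corpus (a measure sometimes called "weirdness"). Words that are far more frequent in specialist text than in general language become candidate terms. The collocates of a candidate are the words found near it, scored here against a random-distribution baseline; mutual information, t-scores and log-likelihood are common alternative association measures.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-cased alphabetic tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def weirdness(special, general):
    """Relative frequency in the specialist corpus divided by relative
    frequency in the general reference corpus. Add-one smoothing on the
    general count (an assumption) avoids division by zero for words that
    never occur in the reference corpus."""
    fs, fg = Counter(special), Counter(general)
    ns, ng = len(special), len(general)
    return {w: (c / ns) / ((fg[w] + 1) / ng) for w, c in fs.items()}


def candidate_terms(special, general, min_freq=2, top_n=10):
    """Rank specialist-corpus words by weirdness, ignoring rare words;
    ties are broken alphabetically so the output is deterministic."""
    scores = weirdness(special, general)
    freq = Counter(special)
    ranked = sorted((w for w in scores if freq[w] >= min_freq),
                    key=lambda w: (-scores[w], w))
    return ranked[:top_n]


def collocates(tokens, node, window=5):
    """Count every token within +/- `window` positions of each occurrence
    of `node` (the node occurrence itself is skipped)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def collocation_z(tokens, node, window=5):
    """Simple z-score for each collocate: the observed co-occurrence count
    against the count expected if the collocate were spread at random
    over the window slots actually examined."""
    n = len(tokens)
    freq = Counter(tokens)
    observed = collocates(tokens, node, window)
    slots = sum(observed.values())  # window positions inspected
    scores = {}
    for w, obs in observed.items():
        p = freq[w] / n
        expected = p * slots
        sd = math.sqrt(expected * (1 - p))
        scores[w] = (obs - expected) / sd if sd > 0 else 0.0
    return scores


if __name__ == "__main__":
    # Toy corpora for illustration only; real studies use large corpora.
    special = tokenize("The nucleus contains the proton and the neutron. "
                       "The proton and neutron bind in the nucleus.")
    general = tokenize("The cat sat on the mat and the dog ran in the park.")
    print(candidate_terms(special, general, top_n=3))
    # -> ['neutron', 'nucleus', 'proton']
    z = collocation_z(special, "nucleus")
    for w in sorted(z, key=z.get, reverse=True):
        print(f"{w:10s} {z[w]:+.2f}")
```

On the toy corpora, domain words such as "nucleus" outrank general words such as "the", because they are absent from the reference text. For a diachronic study, the same functions could be run over time-sliced sub-corpora and the ranked term lists compared; for a synchronic study, over sub-corpora drawn from different sub-specialisms.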
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ahmad, K. (2007). Artificial Ontologies and Real Thoughts: Populating the Semantic Web? In: Basili, R., Pazienza, M.T. (eds) AI*IA 2007: Artificial Intelligence and Human-Oriented Computing. AI*IA 2007. Lecture Notes in Computer Science, vol 4733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74782-6_3
DOI: https://doi.org/10.1007/978-3-540-74782-6_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74781-9
Online ISBN: 978-3-540-74782-6
eBook Packages: Computer Science (R0)