Document Identification by Shallow Semantic Analysis

Bouchachia, Abdelhamid; Mittermeir, Roland T.; Pozewaunig, Heinz

doi:10.1007/3-540-45399-7_16


Conference paper
First Online: 01 January 2001


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1959)

Abstract

Identifying a matching component is a recurring problem in software engineering, specifically in software reuse. Properly generalized, it can be seen as an information retrieval problem. In the context of defining the architecture of a comprehensive software archive, we are designing a two-level retrieval structure. In this paper we report on the first level, a quick search facility based on analyzing texts written in natural language. Based on textual and structural properties of the documents contained in the repository, the universe is reduced to a moderately sized set of candidates to be further analyzed by more focussed mechanisms.
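The abstract does not say how the first-level search scores documents, so the following is only a minimal sketch of the general idea of reducing a repository to a small candidate set. It assumes plain term overlap between a query and each document description; the names `tokenize`, `first_level_candidates`, `max_candidates`, and the sample repository are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens.

    This is a crude stand-in for real linguistic analysis (assumption).
    """
    return re.findall(r"[a-z]+", text.lower())


def first_level_candidates(query, repository, max_candidates=3):
    """Return up to max_candidates document ids ranked by term overlap.

    repository maps a document id to its natural-language description.
    Documents that share no term with the query are dropped entirely.
    """
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, text in repository.items():
        counts = Counter(tokenize(text))
        # Score = how often the query terms occur in the description.
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken alphabetically by id.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return ranked[:max_candidates]


# Hypothetical toy repository of component descriptions.
repository = {
    "stack": "A stack component with push and pop operations",
    "queue": "A queue component supporting enqueue and dequeue",
    "sort": "Sorting routine for integer arrays",
}
candidates = first_level_candidates("push pop stack", repository)
```

In a two-level design such as the one the abstract outlines, a list like `candidates` would then be passed on to the slower, more focussed second-level analysis.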



Author information

Authors and Affiliations

Institut für Informatik-Systeme, Universität Klagenfurt, Universitätsstrasse 65-67, A-9020, Klagenfurt, Austria
Abdelhamid Bouchachia, Roland T. Mittermeir & Heinz Pozewaunig


Editor information

Editors and Affiliations

PRiSM Laboratory, University of Versailles, 45 av.des Etats-Unis, 78035, Paris, France
Mokrane Bouzeghoub & Zoubida Kedad
CNAM, CEDRIC Laboratory, 292 rue Saint-Martin, 75003, Paris
Elisabeth Métais


About this paper

Cite this paper

Bouchachia, A., Mittermeir, R.T., Pozewaunig, H. (2001). Document Identification by Shallow Semantic Analysis. In: Bouzeghoub, M., Kedad, Z., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2000. Lecture Notes in Computer Science, vol 1959. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45399-7_16


DOI: https://doi.org/10.1007/3-540-45399-7_16
Published: 11 May 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41943-3
Online ISBN: 978-3-540-45399-4
eBook Packages: Springer Book Archive
