RELFIN – Topic Discovery for Ontology Enhancement and Annotation

  • Markus Schaal
  • Roland M Müller
  • Marko Brunzel
  • Myra Spiliopoulou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3532)


While classic information retrieval methods return whole documents as the result of a query, many information demands would be better satisfied by fine-grained access inside the documents. One way to support this goal is to make the semantics of small document regions explicit, e.g. as XML labels, so that query engines can exploit them. To this end, the topics of the small document regions must be discovered from the texts; unlike in document labelling applications, fine-grained topics cannot be listed in advance for arbitrary collections. Text-understanding approaches can derive the topic of a document region but are less appropriate for constructing a small set of topics that can be used in queries.

To address this challenge, we propose coupling text mining, prior knowledge explicated in ontologies, and human expertise. We present the system RELFIN, which is designed to assist the human expert in discovering topics appropriate for (i) ontology enhancement with additional concepts or relationships and (ii) semantic characterization and tagging of document regions. RELFIN performs data mining upon linguistically preprocessed corpora to group document regions by topic and to construct topic labels for them, so that the labels are characteristic of the regions and thus helpful in ontology-based search. We show our first results of applying RELFIN in a case study of text analysis and retrieval.
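
The idea of grouping document regions by topic and deriving characteristic labels can be illustrated with a minimal sketch. This is not RELFIN's actual algorithm, which the abstract does not specify: the sketch assumes TF-IDF weighting in the spirit of [SB88] and a simple k-means-style clustering with cosine similarity. All function names and the toy regions are hypothetical, and the regions are assumed to be tokenised and lemmatised already.

```python
import math
from collections import Counter


def tfidf_vectors(regions):
    """Map each tokenised region to a length-normalised sparse TF-IDF vector."""
    n = len(regions)
    # Document frequency: in how many regions does each term occur?
    df = Counter(term for tokens in regions for term in set(tokens))
    vectors = []
    for tokens in regions:
        tf = Counter(tokens)
        vec = {t: c * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two length-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(members):
    """Length-normalised sum of the member vectors."""
    total = Counter()
    for vec in members:
        total.update(vec)  # adds the weights term by term
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def cluster_regions(vectors, k, iterations=10):
    """k-means-style clustering; seeded with the first k regions (an assumption)."""
    centroids = [dict(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment, centroids


def topic_label(centroid_vec, size=2):
    """Label a cluster with its highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(centroid_vec.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:size]]


# Toy, already-preprocessed document regions (hypothetical data).
regions = [
    ["company", "merger", "announce", "merger"],
    ["court", "appoint", "director"],
    ["merger", "company", "approve"],
    ["director", "court", "resign"],
]

assignment, centroids = cluster_regions(tfidf_vectors(regions), k=2)
labels = [topic_label(c) for c in centroids]
```

In such a sketch, the candidate labels would be shown to a human expert, who decides whether a label should become an ontology concept or an annotation tag; the clustering only proposes candidates.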


Keywords: Topic Discovery, Label Construction, Ontology Enhancement, Text Clustering



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Markus Schaal (1)
  • Roland M Müller (1)
  • Marko Brunzel (1)
  • Myra Spiliopoulou (1)
  1. Otto-von-Guericke-University Magdeburg
