Abstract
An increasing amount of information is available on the web, usually expressed as text. The semantic information in these texts is implicit, since they are mainly intended for human consumption and interpretation. Because unstructured information is not easily handled automatically, an information extraction process is needed to identify concepts and establish relations among them. Ontologies are an appropriate way to represent structured knowledge bases, enabling sharing, reuse and inference. In this paper, an information extraction process is used to populate a domain ontology. It targets Brazilian Portuguese texts from a biographical dictionary of music, which requires specific tools due to aspects unique to the language. An unsupervised rule-based method is proposed. Through this process, latent concepts and relations expressed in natural language can be extracted and represented as an ontology, allowing new uses and visualizations of the content, such as semantic browsing and the inference of new knowledge.
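As a rough illustration of what rule-based ontology population can look like, the sketch below applies a few hand-written patterns to a Portuguese sentence and emits (subject, property, value) triples. It is not the method from the paper: the rules, the property names (`birthYear`, `birthPlace`, `composed`), the function `extract_triples` and the sample sentence about a fictional musician are all assumptions made up for this example.

```python
import re

# Hypothetical rules: (property name, regex with a named group "value").
# Neither the patterns nor the property names come from the paper.
RULES = [
    ("birthYear", re.compile(r"nasceu em (?P<value>\d{4})")),
    ("birthPlace", re.compile(r"nasceu em \d{4} n[oa] (?P<value>[^.,]+)")),
    ("composed", re.compile(r"compôs a canção (?P<value>[^.,]+)")),
]


def extract_triples(subject, text):
    """Apply every rule to the text and return (subject, property, value) triples."""
    triples = []
    for prop, pattern in RULES:
        for match in pattern.finditer(text):
            triples.append((subject, prop, match.group("value").strip()))
    return triples


# Invented sentence about a fictional musician, used only for illustration.
text = "Fulano de Tal nasceu em 1930 no Recife. Em 1955 compôs a canção Exemplo."
triples = extract_triples("Fulano_de_Tal", text)
for triple in triples:
    print(triple)
```

Each triple could then be asserted as a property of an individual in the domain ontology. No training data is involved, which is the sense in which an approach like this is unsupervised; a realistic system would also need tokenization, named entity recognition and temporal expression handling for Portuguese.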
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Motta, E., Siqueira, S., Andreatta, A. (2010). An Unsupervised Rule-Based Method to Populate Ontologies from Text. In: Cordeiro, J., Filipe, J. (eds) Web Information Systems and Technologies. WEBIST 2009. Lecture Notes in Business Information Processing, vol 45. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12436-5_12
DOI: https://doi.org/10.1007/978-3-642-12436-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12435-8
Online ISBN: 978-3-642-12436-5
eBook Packages: Computer Science, Computer Science (R0)