Abstract
Open Information Extraction (OIE) aims to automatically identify all the possible assertions within a sentence. The result of this task is usually a set of triples (subject, predicate, object). In this paper, we first present what OIE is and how it can be improved when working within a given domain of knowledge. Using a corpus of sentences from building engineering construction, we obtain an improvement of more than 18%. Next, we show how OIE can serve as the basis of a high-level semantic web task. Here we apply OIE to the formalisation of natural language definitions. We test this formalisation task on a corpus of sentences defining concepts found in the pizza ontology. At this stage, 70.27% of our 37-sentence corpus is fully rewritten in OWL DL.
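To make the two ideas in the abstract concrete, the following minimal sketch shows what OIE triples for a definitional sentence might look like, and how such a definition could be written as an OWL class expression in a Manchester-style notation. It is an illustration only and not the method described in the chapter: the example sentence, the `Triple` class, the `to_class_expression` helper and the names `CheesyPizza`, `hasTopping` and `CheeseTopping` are all assumptions, and the mapping from triples to ontology names is written by hand here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Triple:
    """One OIE assertion: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


def to_class_expression(genus: str, restrictions: List[Tuple[str, str]]) -> str:
    """Render a genus plus (property, filler) pairs as a class expression.

    Every restriction is rendered as an existential ("some") restriction,
    which is a simplifying assumption of this sketch.
    """
    parts = [genus] + [f"({prop} some {filler})" for prop, filler in restrictions]
    return " and ".join(parts)


# Hypothetical triples for the illustrative sentence
# "A cheesy pizza is a pizza that has a cheese topping."
triples = [
    Triple("cheesy pizza", "is", "pizza"),
    Triple("cheesy pizza", "has", "cheese topping"),
]

# Hand-written mapping from the triples to assumed ontology names.
expression = to_class_expression("Pizza", [("hasTopping", "CheeseTopping")])
print(f"CheesyPizza EquivalentTo: {expression}")
```

The first triple supplies the parent class and the second supplies a restriction. A real pipeline would still need to link each phrase to an entity of the target ontology before any axiom can be produced.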
Notes
- 1.
- 2. An exhaustive list of labels for phrases is available in the Penn Treebank [6].
- 3.
- 4. With a large ontology, such comparison must take advantage of an index for the sake of scalability.
- 5.
- 6. \(r_i\) is the subsumption or the set of elements of a more complex restriction (URI of the restriction property, OWL keywords for the type of the restriction, etc.) as explained in the introduction of Sect. 4.3.
- 7. Only for better understanding. The choice of or would not have changed anything.
- 8.
- 9.
- 10. Concepts’ tokens are usually surrounded by adjectives, adverbs, prepositions, etc.
References
OWL Web Ontology Language Guide, February 2004. http://www.w3.org/TR/owl-guide/
Americans with Disabilities Act (ADA): 2010 ADA Standards for Accessible Design, September 2010. http://www.fire.tas.gov.au/userfiles/stuartp/file/Publications/FireSafetyInBuildings.pdf
Bast, H., Haussmann, E.: Open information extraction via contextual sentence decomposition. In: 2013 IEEE Seventh International Conference on Semantic Computing (ICSC), pp. 154–159. IEEE Computer Society (2013)
Bast, H., Haussmann, E.: More informative open information extraction via simple inference. In: de Rijke, M., Kenter, T., de Vries, A.P., Zhai, C.X., de Jong, F., Radinsky, K., Hofmann, K. (eds.) ECIR 2014. LNCS, vol. 8416, pp. 585–590. Springer, Heidelberg (2014)
Berg, J.: Aristotle’s theory of definition. In: ATTI del Convegno Internazionale di Storia della Logica, pp. 19–30 (1982)
Bies, A., Ferguson, M., Katz, K., MacIntyre, R., Tredinnick, V., Kim, G., Marcinkiewicz, M.A., Schasberger, B.: Bracketing guidelines for treebank II Style Penn Treebank project. University of Pennsylvania 97 (1995)
Bühmann, L., Fleischhacker, D., Lehmann, J., Melo, A., Völker, J.: Inductive lexical learning of class expressions. In: Janowicz, K., Schlobach, S., Lambrix, P., Hyvönen, E. (eds.) EKAW 2014. LNCS, vol. 8876, pp. 42–53. Springer, Heidelberg (2014)
Building Safety Unit Tasmania Fire Service: Fire Safety in Buildings, obligations of owners and occupiers, August 2002. http://www.fire.tas.gov.au/userfiles/stuartp/file/Publications/FireSafetyInBuildings.pdf
California Energy Commission: 2008 Building Energy Efficiency Standards, for residential and nonresidential buildings (2008). http://www.energy.ca.gov/2008publications/CEC-400-2008-001/CEC-400-2008-001-CMF.PDF
Del Corro, L., Gemulla, R.: ClausIE: clause-based open information extraction. In: Proceedings of the 22nd International Conference on World Wide Web, WWW 2013, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp. 355–366 (2013)
Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 1535–1545 (2011)
Hadjieleftheriou, M., Srivastava, D.: Weighted set-based string similarity. IEEE Data Eng. Bull. 33(1), 25–36 (2010)
Horridge, M., Jupp, S., Moulton, G., Rector, A., Stevens, R., Wroe, C.: A Practical Guide To Building OWL Ontologies Using Protégé 4 and CO-ODE Tools, Edition 1.2. The University of Manchester, Manchester (2009)
Kacfah Emani, C.H., Ferreira Da Silva, C., Fiès, B., Ghodous, P.: Improving open information extraction using domain knowledge. In: Surfacing the Deep and the Social Web (SDSW), Co-Located with The 13th ISWC, October 2014
Kacfah Emani, C.H., Ferreira Da Silva, C., Fiès, B., Ghodous, P., Khosrowshahi, F.: Structural sentence decomposition via open information extraction. In: 18th International Conference Information Visualisation (IV2014), July 2014
Lehmann, J., Auer, S., Bühmann, L., Tramp, S.: Class expression learning for ontology engineering. Web Semant. Sci. Serv. Agents World Wide Web 9(1), 71–81 (2011)
Mausam, Schmitz, M., Bart, R., Soderland, S., Etzioni, O.: Open language learning for information extraction. In: EMNLP-CoNLL, pp. 523–534. Association for Computational Linguistics (2012)
Nguyen, V.T., Sallaberry, C., Gaio, M.: Mesure de la similarité entre termes et labels de concepts ontologiques. In: Conférence en Recherche D’information et Applications, pp. 415–430 (2013)
Sayah, K.: Automated Norm Extraction from Legal Texts. Master’s thesis, Utrecht University, August 2004
Tsatsaronis, G., Petrova, A., Kissa, M., Ma, Y., Distel, F., Baader, F., Schroeder, M.: Learning formal definitions for biomedical concepts. In: OWLED (2013)
Unger, C., Bühmann, L., Lehmann, J., Ngonga Ngomo, A.C., Gerber, D., Cimiano, P.: Template-based question answering over RDF data. In: Proceedings of the 21st International Conference on World Wide Web, WWW 2012, pp. 639–648. ACM, New York (2012)
Unger, C., Cimiano, P.: Pythia: compositional meaning construction for ontology-based question answering on the semantic web. In: Muñoz, R., Montoyo, A., Métais, E. (eds.) NLDB 2011. LNCS, vol. 6716, pp. 153–160. Springer, Heidelberg (2011)
Völker, J., Hitzler, P., Cimiano, P.: Acquisition of OWL DL axioms from lexical resources. In: Franconi, E., Kifer, M., May, W. (eds.) ESWC 2007. LNCS, vol. 4519, pp. 670–685. Springer, Heidelberg (2007)
Völker, J., Rudolph, S.: Lexico-logical acquisition of OWL DL axioms. In: Medina, R., Obiedkov, S. (eds.) ICFCA 2008. LNCS (LNAI), vol. 4933, pp. 62–77. Springer, Heidelberg (2008)
Wächter, T., Schroeder, M.: Semi-automated ontology generation within OBO-Edit. Bioinformatics 26(12), i88–i96 (2010)
Winkler, W.E.: The state of record linkage and current research problems. Technical report, Statistical Research Division, U.S. Census Bureau (1999)
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Emani, C.K., Da Silva, C.F., Fiès, B., Ghodous, P. (2016). Improving Open Information Extraction for Semantic Web Tasks. In: Nguyen, N.T., Kowalczyk, R., Rupino da Cunha, P. (eds) Transactions on Computational Collective Intelligence XXI. Lecture Notes in Computer Science, vol 9630. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49521-6_6
DOI: https://doi.org/10.1007/978-3-662-49521-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49520-9
Online ISBN: 978-3-662-49521-6
eBook Packages: Computer Science, Computer Science (R0)