Abstract
Homogeneous unstructured data (HUD) are collections of unstructured documents that share common properties, such as a similar layout, a common file format, or a common domain of values. Building on such properties, it is desirable to process HUD automatically and to access their main information through a semantic layer, typically an ontology, called a semantic view. To this end, we propose an ontology-based approach for extracting semantically rich information from HUD, which integrates and extends recent technologies and results from the fields of classical information extraction, table recognition, ontologies, text annotation, and logic programming. Moreover, we design and implement a system, named KnowRex, that has been successfully applied to curricula vitae in the Europass style: it offers a semantic view of them and makes it possible, for example, to select those that exhibit required skills.
Acknowledgements
The work has been supported by Regione Calabria, programme POR Calabria FESR 2007–2013, within project “KnowRex: Un sistema per il riconoscimento e l’estrazione di conoscenza”.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Adrian, W.T., Leone, N., Manna, M. (2015). Semantic Views of Homogeneous Unstructured Data. In: ten Cate, B., Mileo, A. (eds) Web Reasoning and Rule Systems. RR 2015. Lecture Notes in Computer Science, vol 9209. Springer, Cham. https://doi.org/10.1007/978-3-319-22002-4_3
DOI: https://doi.org/10.1007/978-3-319-22002-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22001-7
Online ISBN: 978-3-319-22002-4
eBook Packages: Computer Science, Computer Science (R0)