Semantic Views of Homogeneous Unstructured Data

Adrian, Weronika T.; Leone, Nicola; Manna, Marco

doi:10.1007/978-3-319-22002-4_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9209)

Included in the following conference series:

International Conference on Web Reasoning and Rule Systems

Abstract

Homogeneous unstructured data (HUD) are collections of unstructured documents that share common properties, such as a similar layout, a common file format, or a common domain of values. Building on such properties, it would be desirable to process HUD automatically and to access their main information through a semantic layer, typically an ontology, called a semantic view. Hence, we propose an ontology-based approach for extracting semantically rich information from HUD, by integrating and extending recent technologies and results from the fields of classical information extraction, table recognition, ontologies, text annotation, and logic programming. Moreover, we design and implement a system, named KnowRex, that has been successfully applied to curricula vitae in the Europass style to offer a semantic view of them and, for example, to select those that exhibit required skills.
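To make the closing use case concrete, the sketch below shows one way a semantic view of curricula vitae could be queried for required skills. It is only an illustration under stated assumptions: the triple layout, the `hasSkill` predicate, the `cv1`..`cv3` identifiers, and the `select_cvs` helper are all hypothetical and are not taken from KnowRex, which builds on ontologies and logic programming rather than on Python.

```python
from collections import defaultdict

# Hypothetical semantic view: facts that an extraction step might have
# produced from three CVs. Identifiers and predicate names are assumptions.
FACTS = [
    ("cv1", "hasSkill", "java"),
    ("cv1", "hasSkill", "sql"),
    ("cv2", "hasSkill", "java"),
    ("cv3", "hasSkill", "sql"),
    ("cv3", "hasSkill", "java"),
    ("cv3", "hasSkill", "python"),
]


def skills_by_cv(facts):
    """Group the objects of every hasSkill fact by the CV they belong to."""
    index = defaultdict(set)
    for subject, predicate, obj in facts:
        if predicate == "hasSkill":
            index[subject].add(obj)
    return index


def select_cvs(facts, required):
    """Return, in sorted order, the CVs that exhibit all required skills."""
    required = set(required)
    return sorted(
        cv for cv, skills in skills_by_cv(facts).items() if required <= skills
    )


print(select_cvs(FACTS, {"java", "sql"}))
```

The selection is a plain subset test over the extracted facts; in a rule-based setting the same condition would more naturally be written as a query over the ontology instances.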

Acknowledgements

The work has been supported by Regione Calabria, programme POR Calabria FESR 2007–2013, within project “KnowRex: Un sistema per il riconoscimento e l’estrazione di conoscenza”.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Calabria, Cosenza, Italy
Weronika T. Adrian, Nicola Leone & Marco Manna
AGH University of Science and Technology, Al. A. Mickiewicza 30, Krakow, Poland
Weronika T. Adrian

Authors

Weronika T. Adrian
Nicola Leone
Marco Manna

Corresponding author

Correspondence to Marco Manna.

Editor information

Editors and Affiliations

University of California, Santa Cruz, USA
Balder ten Cate
National University of Ireland, Galway, Ireland
Alessandra Mileo

About this paper

Cite this paper

Adrian, W.T., Leone, N., Manna, M. (2015). Semantic Views of Homogeneous Unstructured Data. In: ten Cate, B., Mileo, A. (eds) Web Reasoning and Rule Systems. RR 2015. Lecture Notes in Computer Science, vol 9209. Springer, Cham. https://doi.org/10.1007/978-3-319-22002-4_3

DOI: https://doi.org/10.1007/978-3-319-22002-4_3
Published: 22 July 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22001-7
Online ISBN: 978-3-319-22002-4
eBook Packages: Computer Science, Computer Science (R0)
