Kizomba: An Unsupervised Heuristic-Based Web Information Extractor

Roldán, Juan C.

doi:10.1007/978-3-319-40159-1_35

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 473)

Included in the following conference series:

International Conference on Practical Applications of Agents and Multi-Agent Systems

Abstract

The Web is an ever-growing repository of valuable information. That information lacks semantics because it is buried in web documents represented in HTML. Information extractors are software components that help software engineers extract structured information from web documents.

This work was supported by the European Commission (FEDER) and the Spanish R&D&I programme by means of grant TIN2013-40848-R.

Author information

Authors and Affiliations

ETSI Informática, University of Sevilla, Avda. Reina Mercedes, s/n, 41012, Sevilla, Spain
Juan C. Roldán

Corresponding author

Correspondence to Juan C. Roldán.

Editor information

Editors and Affiliations

Dept de Infor y Auto Facultad de Ciencia, University of Salamanca, Salamanca, Spain
Fernando de la Prieta
ETS Ingeniería Informática, University of Sevilla, Sevilla, Spain
María J. Escalona
ETSI Informática, Universidad de Sevilla, Sevilla, Spain
Rafael Corchuelo
Lille University of Science and Technology, Villeneuve d'Ascq Cédex, France
Philippe Mathieu
GECAD, Porto, Portugal
Zita Vale
School of Computer Science, Dartmouth College, Hanover, USA
Andrew T. Campbell
Dept of Electrical Engineering and IT, University of Naples Federico II, Naples, Italy
Silvia Rossi
LAMIH (UMR CNRS 8530), Universite de Valenciennes, Valenciennes, France
Emmanuel Adam
Campus Catalunya, Universitat Rovira i Virgili, Tarragona, Spain
María D. Jiménez-López
Departamento de Sistemas Informáticos, University of Castilla-La Mancha, Albacete, Spain
Elena M. Navarro
Departamento de Informática y Automática, University of Salamanca, Salamanca, Spain
María N. Moreno

About this paper

Cite this paper

Roldán, J.C. (2016). Kizomba: An Unsupervised Heuristic-Based Web Information Extractor. In: de la Prieta, F., et al. Trends in Practical Applications of Scalable Multi-Agent Systems, the PAAMS Collection. PAAMS 2016. Advances in Intelligent Systems and Computing, vol 473. Springer, Cham. https://doi.org/10.1007/978-3-319-40159-1_35

DOI: https://doi.org/10.1007/978-3-319-40159-1_35
Published: 07 June 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40158-4
Online ISBN: 978-3-319-40159-1
eBook Packages: Engineering, Engineering (R0)
