Abstract
The Web is an ever-growing repository of valuable information. That information lacks semantics because it is buried in web documents represented in HTML. Information extractors are software components that help software engineers extract structured information from web documents.
This work was supported by the European Commission (FEDER) and the Spanish R&D&I programme by means of grant TIN2013-40848-R.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Roldán, J.C. (2016). Kizomba: An Unsupervised Heuristic-Based Web Information Extractor. In: de la Prieta, F., et al. Trends in Practical Applications of Scalable Multi-Agent Systems, the PAAMS Collection. PAAMS 2016. Advances in Intelligent Systems and Computing, vol 473. Springer, Cham. https://doi.org/10.1007/978-3-319-40159-1_35
DOI: https://doi.org/10.1007/978-3-319-40159-1_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40158-4
Online ISBN: 978-3-319-40159-1
eBook Packages: Engineering, Engineering (R0)