Abstract
The Web is an ever-growing repository of valuable information. That information lacks semantics because it is buried in web documents represented in HTML. Information extractors are software components that help software engineers extract structured information from web documents.
This work was supported by the European Commission (FEDER) and the Spanish R&D&I programme by means of grant TIN2013-40848-R.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Roldán, J.C. (2016). Kizomba: An Unsupervised Heuristic-Based Web Information Extractor. In: de la Prieta, F., et al. Trends in Practical Applications of Scalable Multi-Agent Systems, the PAAMS Collection. PAAMS 2016. Advances in Intelligent Systems and Computing, vol 473. Springer, Cham. https://doi.org/10.1007/978-3-319-40159-1_35
DOI: https://doi.org/10.1007/978-3-319-40159-1_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40158-4
Online ISBN: 978-3-319-40159-1
eBook Packages: Engineering, Engineering (R0)