Learning (k,l)-Contextual Tree Languages for Information Extraction

Raeymaekers, Stefan; Bruynooghe, Maurice; Van den Bussche, Jan

doi:10.1007/11564096_31

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3720)

Included in the following conference series: European Conference on Machine Learning

Abstract

This paper introduces a novel method for learning a wrapper for extraction of text nodes from web pages based upon (k,l)-contextual tree languages. It also introduces a method to learn good values of k and l based on a few positive and negative examples. Finally, it describes how the algorithm can be integrated in a tool for information extraction.
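The abstract names the technique without spelling it out. The Python sketch below illustrates the general idea under explicitly stated assumptions. A tree is a `(label, children)` pair. A (k,l)-fork is taken to be a depth-k piece of a subtree that keeps one window of at most l consecutive children at each node. The text node to extract is relabelled with a hypothetical marker `x`. The learned language is the set of trees whose forks all occurred in the positive examples. This fork definition is simplified. The parameter search is a hypothetical stand-in and not necessarily the authors' procedure: it picks the smallest k + l whose hypothesis rejects every negative example. All function names are illustrative.

```python
from itertools import product
from typing import FrozenSet, Iterable, Optional, Set, Tuple

# A tree is (label, children), where children is a tuple of trees.
Tree = Tuple[str, tuple]


def forks_at(node: Tree, k: int, l: int) -> Set[Tree]:
    """Forks rooted at `node` under a SIMPLIFIED, assumed definition:
    keep at most k levels and, at every kept node, one window of at most
    l consecutive children (all window positions are enumerated)."""
    label, children = node
    if k == 1 or not children:
        return {(label, ())}
    w = min(l, len(children))
    result = set()
    for start in range(len(children) - w + 1):
        window = children[start:start + w]
        # Combine every sub-fork choice for the children in this window.
        options = [forks_at(child, k - 1, l) for child in window]
        for combo in product(*options):
            result.add((label, tuple(combo)))
    return result


def all_nodes(tree: Tree) -> Iterable[Tree]:
    yield tree
    for child in tree[1]:
        yield from all_nodes(child)


def forks(tree: Tree, k: int, l: int) -> Set[Tree]:
    """Union of the forks rooted at every node of `tree`."""
    out: Set[Tree] = set()
    for node in all_nodes(tree):
        out |= forks_at(node, k, l)
    return out


def learn(positives: Iterable[Tree], k: int, l: int) -> FrozenSet[Tree]:
    """Hypothesis: the set of all forks seen in the positive examples."""
    seen: Set[Tree] = set()
    for tree in positives:
        seen |= forks(tree, k, l)
    return frozenset(seen)


def accepts(hypothesis: FrozenSet[Tree], tree: Tree, k: int, l: int) -> bool:
    """A tree is in the learned language iff all of its forks were seen."""
    return forks(tree, k, l) <= hypothesis


def choose_k_l(positives, negatives, max_k: int = 4,
               max_l: int = 4) -> Optional[Tuple[int, int]]:
    """Hypothetical search: the (k, l) with the smallest k + l (ties broken
    by k) whose hypothesis rejects every negative example."""
    pairs = sorted(product(range(1, max_k + 1), range(1, max_l + 1)),
                   key=lambda p: (p[0] + p[1], p))
    for k, l in pairs:
        h = learn(positives, k, l)
        if not any(accepts(h, neg, k, l) for neg in negatives):
            return k, l
    return None


def row(a: str, b: str) -> Tree:
    """Toy table row with two cells holding text nodes labelled a and b."""
    return ("tr", (("td", ((a, ()),)), ("td", ((b, ()),))))


# "x" marks the text node to extract (here: a cell in the first column).
pos = [("table", (row("x", "text"), row("text", "text"))),
       ("table", (row("text", "text"), row("x", "text")))]
neg = [("table", (row("text", "x"), row("text", "text")))]

k, l = choose_k_l(pos, neg)
h = learn(pos, k, l)
unseen = ("table", (row("text", "text"), row("text", "text"), row("x", "text")))
print(k, l, accepts(h, unseen, k, l))  # 3 2 True
```

On these toy trees, every pair with k of 1 or 2 accepts the negative example, because parent/child context alone cannot tell the two columns apart. The search therefore settles on k = 3, l = 2. That hypothesis also accepts a three-row table that never appeared among the examples. Smaller k and l give more general languages, so the search tries the most general candidates first.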

Author information

Authors and Affiliations

Dept. of Computer Science, K.U.Leuven, Celestijnenlaan 200A, B-3001, Leuven
Stefan Raeymaekers & Maurice Bruynooghe
Dept. Theoretical Computer Science, Universiteit Hasselt, Agoralaan D, B-3590, Diepenbeek
Jan Van den Bussche

Editor information

Editors and Affiliations

Faculty of Economics of the University of Porto, Portugal
João Gama
Faculdade de Engenharia & LIAAD, Universidade do Porto, Portugal
Rui Camacho
LIAAD-INESC Porto L.A./Faculty of Economics, University of Porto, Rua de Ceuta, 118-6, 4050-190, Porto, Portugal
Pavel B. Brazdil
LIACC/FEP, Universidade do Porto, Portugal
Alípio Mário Jorge
LIAAD-INESC Porto LA / FEP, University of Porto, R. de Ceuta, 118, 6., 4050-190, Porto, Portugal
Luís Torgo

About this paper

Cite this paper

Raeymaekers, S., Bruynooghe, M., Van den Bussche, J. (2005). Learning (k,l)-Contextual Tree Languages for Information Extraction. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. ECML 2005. Lecture Notes in Computer Science, vol 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_31

DOI: https://doi.org/10.1007/11564096_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29243-2
Online ISBN: 978-3-540-31692-3
eBook Packages: Computer Science, Computer Science (R0)
