Machine learning for information extraction

  • Filippo Neri
  • Lorenza Saitta
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1299)

Abstract

This paper presents a brief overview of the history and trends of Machine Learning, organized according to its goal and principal methodologies. More details are given for the concept learning task, one of the most mature in the field. Applications to Information Extraction tasks are discussed.

Keywords

Reinforcement Learning, Information Extraction, Text Categorization, Concept Learning, Inductive Inference
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Filippo Neri (1)
  • Lorenza Saitta (1)
  1. Dipartimento di Informatica, Università di Torino, Torino, Italy