Intelligent Web Agents that Learn to Retrieve and Extract Information

  • Chapter
Intelligent Exploration of the Web

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 111)

Abstract

We describe systems that use machine learning methods to retrieve and/or extract textual information from the Web. In particular, we present our Wisconsin Adaptive Web Assistant (Wawa), which constructs a Web agent by accepting user preferences in the form of instructions and adapting the agent's behavior as it encounters new information. Our approach enables Wawa to rapidly build instructable and self-adaptive Web agents for both information retrieval (IR) and information extraction (IE) tasks. Wawa uses two neural networks, which provide adaptive capabilities for its agents. User-provided instructions are compiled into these neural networks and are modified via training examples. Users can create these training examples by rating pages that Wawa retrieves but, more importantly, our system uses techniques from reinforcement learning to internally create its own examples. Users can also provide additional instructions throughout the life of an agent. Empirical results on several domains show the advantages of our approach.
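As a rough illustration of what "compiling instructions into a neural network and modifying them via training examples" can look like, the sketch below maps one conjunctive rule onto a single sigmoid unit and then nudges its weights with one rated page. It is written in the spirit of knowledge-based neural networks and is not Wawa's actual implementation: the rule format, the weight constant `OMEGA`, the learning rate, and the function names `compile_rule`, `score`, and `refine` are all assumptions made for this example.

```python
import math

OMEGA = 4.0  # hypothetical weight given to each antecedent of a compiled rule


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def compile_rule(keywords):
    """Compile 'IF all keywords appear THEN page is relevant' into one sigmoid unit."""
    weights = {word: OMEGA for word in keywords}
    # The bias puts the threshold between "all antecedents true" and "one missing".
    bias = -(len(keywords) - 0.5) * OMEGA
    return weights, bias


def score(unit, page_words):
    """Return the unit's activation (0..1) for a page given as a set of words."""
    weights, bias = unit
    total = bias + sum(w for word, w in weights.items() if word in page_words)
    return sigmoid(total)


def refine(unit, page_words, rating, lr=0.5):
    """Take one gradient step on squared error toward a rating in [0, 1]."""
    weights, bias = unit
    out = score(unit, page_words)
    delta = (rating - out) * out * (1.0 - out)
    new_weights = dict(weights)
    for word in page_words:
        # Words absent from the original rule start at weight 0 and can be learned.
        new_weights[word] = new_weights.get(word, 0.0) + lr * delta
    return new_weights, bias + lr * delta


if __name__ == "__main__":
    unit = compile_rule(["machine", "learning"])
    page = {"machine", "tutorial"}
    print(score(unit, page))                      # low: one antecedent is missing
    print(score(refine(unit, page, 1.0), page))   # slightly higher after one rating
```

The point of the design is that the instruction gives the unit a sensible starting behavior before any training data exists, while the gradient step lets a rated page adjust or extend that behavior. In the chapter's setting the rating could come from a user or be generated internally; here it is simply passed in as a number.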

This work was done while the first author was at the Computer Sciences Department of the University of Wisconsin-Madison.

This research was supported in part by NLM Grant 1 R01 LM07050-01, NSF Grant IRI-9502990, and UW Vilas Trust.



Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Eliassi-Rad, T., Shavlik, J. (2003). Intelligent Web Agents that Learn to Retrieve and Extract Information. In: Szczepaniak, P.S., Segovia, J., Kacprzyk, J., Zadeh, L.A. (eds) Intelligent Exploration of the Web. Studies in Fuzziness and Soft Computing, vol 111. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1772-0_16

  • DOI: https://doi.org/10.1007/978-3-7908-1772-0_16

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-7908-2519-0

  • Online ISBN: 978-3-7908-1772-0

  • eBook Packages: Springer Book Archive
