Intelligent Web Agents that Learn to Retrieve and Extract Information

Eliassi-Rad, Tina; Shavlik, Jude

doi:10.1007/978-3-7908-1772-0_16

Tina Eliassi-Rad & Jude Shavlik

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 111)

Abstract

We describe systems that use machine learning methods to retrieve and/or extract textual information from the Web. In particular, we present our Wisconsin Adaptive Web Assistant (Wawa), which constructs a Web agent by accepting user preferences in the form of instructions and adapting the agent’s behavior as it encounters new information. Our approach enables Wawa to rapidly build instructable and self-adaptive Web agents for both information retrieval (IR) and information extraction (IE) tasks. Wawa uses two neural networks, which provide adaptive capabilities for its agents. User-provided instructions are compiled into these neural networks and are then modified via training examples. Users can create these training examples by rating pages that Wawa retrieves, but, more importantly, our system uses techniques from reinforcement learning to create its own examples internally. Users can also provide additional instructions throughout the life of an agent. Empirical results on several domains show the advantages of our approach.
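The idea of compiling an instruction into a network and then refining it with training examples can be illustrated with a minimal sketch. The abstract does not give Wawa's actual compilation scheme, so everything below is an assumption: a single sigmoid unit, a conjunctive keyword rule, the constant `OMEGA`, and the function names `compile_rule`, `score`, and `refine` are all hypothetical and chosen only for illustration. The target value stands in for a user rating; the examples that Wawa generates internally via reinforcement learning are not modeled here.

```python
import math

OMEGA = 4.0  # assumed weight given to each rule antecedent; not taken from the chapter


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def compile_rule(antecedents, vocabulary):
    """Map a conjunctive rule (all antecedent words present) onto one sigmoid unit."""
    weights = {w: (OMEGA if w in antecedents else 0.0) for w in vocabulary}
    # The bias is set so the unit exceeds 0.5 only when every antecedent is present.
    bias = -(len(antecedents) - 0.5) * OMEGA
    return weights, bias


def score(weights, bias, page_words):
    """Score a page, represented here simply as a set of words."""
    total = bias + sum(wt for w, wt in weights.items() if w in page_words)
    return sigmoid(total)


def refine(weights, bias, page_words, target, lr=0.5):
    """One gradient step on squared error toward a rating in [0, 1]."""
    out = score(weights, bias, page_words)
    delta = (out - target) * out * (1.0 - out)
    new_weights = {
        w: (wt - lr * delta) if w in page_words else wt
        for w, wt in weights.items()
    }
    return new_weights, bias - lr * delta


# Hypothetical instruction: "pages mentioning both 'machine' and 'learning' are relevant".
vocabulary = ["machine", "learning", "recipe"]
weights, bias = compile_rule({"machine", "learning"}, vocabulary)

relevant = score(weights, bias, {"machine", "learning"})
partial = score(weights, bias, {"machine"})

# A user rates a page that matches the rule but is about recipes as irrelevant (0.0).
page = {"machine", "learning", "recipe"}
before = score(weights, bias, page)
for _ in range(20):
    weights, bias = refine(weights, bias, page, target=0.0)
after = score(weights, bias, page)
```

In this sketch the compiled rule gives the agent sensible behavior before any training, and the ratings then adjust the same weights, which is the general pattern the abstract describes.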

This work was done while the first author was at the Computer Sciences Department of the University of Wisconsin-Madison.

This research was supported in part by NLM Grant 1 R01 LM07050-01, NSF Grant IRI-9502990, and UW Vilas Trust.

Author information

Authors and Affiliations

Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Box 808, L-560, Livermore, CA, 94551, USA
Tina Eliassi-Rad
Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI, 53717, USA
Jude Shavlik

Editor information

Editors and Affiliations

Institute of Computer Science, Technical University of Lodz, ul. Sterlinga 16/18, 90-217, Lodz, Poland
Piotr S. Szczepaniak
Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447, Warsaw, Poland
Piotr S. Szczepaniak & Janusz Kacprzyk
Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo, 28660, Madrid, Spain
Javier Segovia
Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh

About this chapter

Cite this chapter

Eliassi-Rad, T., Shavlik, J. (2003). Intelligent Web Agents that Learn to Retrieve and Extract Information. In: Szczepaniak, P.S., Segovia, J., Kacprzyk, J., Zadeh, L.A. (eds) Intelligent Exploration of the Web. Studies in Fuzziness and Soft Computing, vol 111. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1772-0_16

DOI: https://doi.org/10.1007/978-3-7908-1772-0_16
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-2519-0
Online ISBN: 978-3-7908-1772-0
eBook Packages: Springer Book Archive