
The State of the Art in Text Filtering

User Modeling and User-Adapted Interaction

Abstract

This paper develops a conceptual framework for text filtering practice and research, and reviews present practice in the field. Text filtering is an information seeking process in which documents are selected from a dynamic text stream to satisfy a relatively stable and specific information need. A model of the information seeking process is introduced and specialized to define text filtering. The historical development of text filtering is then reviewed and case studies of recent work are used to highlight important design characteristics of modern text filtering systems. User modeling techniques drawn from information retrieval, recommender systems, machine learning and other fields are described. The paper concludes with observations on the present state of the art and implications for future research on text filtering.
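To make the definition of text filtering concrete, here is a minimal sketch of content-based filtering in Python. It assumes a bag-of-words profile built from a short description of the information need, cosine similarity as the match score, and a fixed acceptance threshold. The function names and the 0.2 threshold are hypothetical choices for illustration, not the method of the paper or of any system it reviews.

```python
import math
from collections import Counter
from typing import Iterable, Iterator


def vectorize(text: str) -> Counter:
    # Hypothetical helper: lowercase + whitespace tokenization is a
    # simplifying assumption; real systems use stemming, stopwords, weighting.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    # Indexing a Counter with a missing term returns 0 without inserting it.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_stream(profile_text: str, stream: Iterable[str],
                  threshold: float = 0.2) -> Iterator[str]:
    # The profile (a stable information need) is built once; each document
    # arriving from the dynamic stream is accepted if it scores at or above
    # the (assumed) threshold.
    profile = vectorize(profile_text)
    for doc in stream:
        if cosine(profile, vectorize(doc)) >= threshold:
            yield doc


if __name__ == "__main__":
    incoming = [
        "new results in information retrieval and text filtering",
        "stock market report for tuesday",
    ]
    print(list(filter_stream("text filtering information retrieval", incoming)))
    # prints ['new results in information retrieval and text filtering']
```

The profile stays fixed while documents arrive one at a time, mirroring the contrast in the abstract between a relatively stable information need and a dynamic text stream. An adaptive filter would additionally update the profile from user feedback on the documents it delivers.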




Cite this article

Oard, D.W. The State of the Art in Text Filtering. User Modeling and User-Adapted Interaction 7, 141–178 (1997). https://doi.org/10.1023/A:1008287121180
