Abstract
This paper develops a conceptual framework for text filtering practice and research, and reviews present practice in the field. Text filtering is an information seeking process in which documents are selected from a dynamic text stream to satisfy a relatively stable and specific information need. A model of the information seeking process is introduced and specialized to define text filtering. The historical development of text filtering is then reviewed and case studies of recent work are used to highlight important design characteristics of modern text filtering systems. User modeling techniques drawn from information retrieval, recommender systems, machine learning and other fields are described. The paper concludes with observations on the present state of the art and implications for future research on text filtering.
Cite this article
Oard, D.W. The State of the Art in Text Filtering. User Modeling and User-Adapted Interaction 7, 141–178 (1997). https://doi.org/10.1023/A:1008287121180