Information Filtering and Retrieval from Web Sources

  • Witold Abramowicz
  • Paweł Kalczyński
  • Krzysztof Węcel

Abstract

This chapter first briefly presents the concepts of information retrieval systems (IRSs) and information filtering systems (IFSs). It then describes the key characteristics of business information sources on the Web and highlights the main problems in applying existing filtering and retrieval techniques to Internet sources. In response to these problems, the last part of the chapter proposes a new model of an information filtering system. This model is the starting point for the later considerations in this book.

Keywords

  • Information Retrieval
  • Relevance Feedback
  • Collaborative Filter
  • Content Provider
  • Vector Space Model

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag London 2002

Authors and Affiliations

  • Witold Abramowicz (1)
  • Paweł Kalczyński (1)
  • Krzysztof Węcel (1)

  1. Department of Computer Science, The Poznań University of Economics, Poznań, Poland