Information Filtering and Retrieval from Web Sources

  • Chapter
Filtering the Web to Feed Data Warehouses

Abstract

The first part of this chapter briefly presents the concepts of information retrieval systems (IRSs) and information filtering systems (IFSs). It then describes the key characteristics of business information sources on the Web and highlights the main problems with applying existing filtering and retrieval techniques to Internet sources. In response to these problems, the last part of the chapter proposes a new model of an information filtering system, which serves as the starting point for later considerations in this book.

Copyright information

© 2002 Springer-Verlag London

About this chapter

Cite this chapter

Abramowicz, W., Kalczyński, P., Węcel, K. (2002). Information Filtering and Retrieval from Web Sources. In: Filtering the Web to Feed Data Warehouses. Springer, London. https://doi.org/10.1007/978-1-4471-0137-6_4

  • DOI: https://doi.org/10.1007/978-1-4471-0137-6_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-1107-8

  • Online ISBN: 978-1-4471-0137-6

  • eBook Packages: Springer Book Archive
