Abstract
Most search engines return a large amount of unwanted information. A more thorough filtering process can be applied to these results to pick out the relevant documents. We propose a new method, Frequency Domain Scoring (FDS), which is based on the Fourier transform. FDS filters documents by examining the locality of the query keywords throughout each document. We compare FDS with two well-known techniques, Latent Semantic Indexing (LSI) and the cosine measure. We found that FDS gives a better estimate of how relevant a document is to the query. The other two methods (cosine measure and LSI) do not perform as well, mainly because they need a wider variety of documents to determine the topic.
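The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea of keyword locality in the frequency domain, not the authors' method. It assumes that each query term is turned into a signal of per-segment occurrence counts, that each signal is Fourier transformed, and that a document scores higher when the terms' frequency components are large and share similar phase (that is, when the terms occur in the same parts of the document). The names `term_signal`, `locality_score`, and `n_bins` are hypothetical.

```python
import numpy as np


def term_signal(tokens, term, n_bins=8):
    """Count occurrences of `term` in each of `n_bins` equal-width document segments."""
    signal = np.zeros(n_bins)
    n = len(tokens)
    for pos, tok in enumerate(tokens):
        if tok == term:
            signal[pos * n_bins // n] += 1
    return signal


def locality_score(tokens, query_terms, n_bins=8):
    """Illustrative frequency-domain score (an assumption, not the published formula).

    For every frequency component, the summed magnitude over all query terms
    is weighted by how closely the terms' phases agree.
    """
    if not tokens or not query_terms:
        return 0.0
    spectra = np.array(
        [np.fft.rfft(term_signal(tokens, t, n_bins)) for t in query_terms]
    )
    magnitudes = np.abs(spectra)
    # Unit-length phasors; components with zero magnitude contribute nothing.
    phasors = np.divide(
        spectra, magnitudes, out=np.zeros_like(spectra), where=magnitudes > 0
    )
    # 1.0 when all terms share the same phase, near 0.0 when phases cancel.
    phase_agreement = np.abs(phasors.sum(axis=0)) / len(query_terms)
    return float((magnitudes.sum(axis=0) * phase_agreement).sum())


if __name__ == "__main__":
    query = ["fourier", "scoring"]
    # Same term counts in both documents; only the positions differ.
    near = ["fourier", "scoring"] + ["x"] * 14
    far = ["fourier"] + ["x"] * 7 + ["scoring"] + ["x"] * 7
    print("terms adjacent:", locality_score(near, query))
    print("terms far apart:", locality_score(far, query))
```

Under these assumptions, two documents with identical term counts can receive different scores depending on where the terms fall, which is information a bag-of-words measure such as the cosine measure does not use.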
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Park, L.A.F., Palaniswami, M., Kotagiri, R. (2001). Internet Document Filtering Using Fourier Domain Scoring. In: De Raedt, L., Siebes, A. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 2001. Lecture Notes in Computer Science, vol 2168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44794-6_30
DOI: https://doi.org/10.1007/3-540-44794-6_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42534-2
Online ISBN: 978-3-540-44794-8
eBook Packages: Springer Book Archive