Journal of Intelligent Information Systems

Volume 24, Issue 1, pp 61–85

A New Term Significance Weighting Approach

  • Jin Zhang
  • Tien N. Nguyen

Abstract

The authors present a new term significance measure that integrates term frequency retrieval characteristics, term frequency, document collection characteristics, and both the term depth and width distribution characteristics. A new concept, the term depth distribution, is introduced and its impact on term significance is analyzed. The authors examine the features of the new measure in two ways: by analyzing the impact of its variables (parameters) and by iso-significance contour analysis. An experimental study compared the newly developed approach with two other popular approaches in terms of both efficiency and effectiveness. The results show that the newly developed approach achieves satisfactory performance. Issues for further research on this topic are suggested.

Keywords

term significance, automatic term weighting, term weighting evaluation

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. University of Wisconsin-Milwaukee, Milwaukee
  2. University of Wisconsin-Milwaukee, Milwaukee
