A new indexing method based on word proximity for Chinese text retrieval

Du, Lin; Sun, Yufang

doi:10.1007/BF02948815

Published: May 2000

Volume 15, pages 280–286, (2000)

Journal of Computer Science and Technology

Du Lin¹ &
Sun Yufang¹

Abstract

This paper proposes a novel text representation and matching scheme for Chinese text retrieval. At present, the indexing methods of Chinese retrieval systems are either character-based or word-based. Character-based indexing methods, such as bi-gram or tri-gram indexing, produce many false drops because of mismatches between queries and documents. Word-based indexing systems, on the other hand, have difficulty efficiently identifying all proper nouns, domain-specific terminology, and phrases. The new indexing method uses both the proximity and the mutual information of word pairs to represent the text content, so as to overcome the high false-drop, new-word, and phrase problems of character-based and word-based systems. The evaluation results indicate that the average query precision of proximity-based indexing is 5.2% higher than the best results of TREC-5.

Author information

Authors and Affiliations

Open System & Chinese Information Processing Center, Institute of Software, Chinese Academy of Sciences, 100080, Beijing, P.R. China
Du Lin & Sun Yufang

Additional information

This work was supported by the National ‘863’ High-Tech Programme of China under Grant No.863-306-ZD-10-21, and the National Natural Science Foundation of China under Grant No.69983009.

DU Lin was born in 1965. He received the B.S. degree from Chongqing University in 1990 and the Ph.D. degree in computer science from the Institute of Software, Chinese Academy of Sciences, in 1999. Since 1995, he has been working on Chinese information retrieval.

SUN Yufang was born in 1947. He received the M.S. degree from the Institute of Software, CAS, in 1983. Since 1985, he has been working on Chinese information processing.

About this article

Cite this article

Du, L., Sun, Y. A new indexing method based on word proximity for Chinese text retrieval. J. Comput. Sci. & Technol. 15, 280–286 (2000). https://doi.org/10.1007/BF02948815

Received: 14 December 1998
Revised: 10 July 1999
Issue Date: May 2000
DOI: https://doi.org/10.1007/BF02948815
