A New Approach for Improving Field Association Term Dictionary Using Passage Retrieval

  • Kazuhiro Morita
  • El-Sayed Atlam
  • Elmarhomy Ghada
  • Masao Fuketa
  • Jun-ichi Aoe
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4252)

Abstract

Large collections of full-text documents are now commonly used in automated information retrieval. Readers generally identify the subject of a text when they notice specific terms, called Field Association (FA) terms, in that text. Previous research has shown that evidence from passages can improve retrieval results by dividing documents into coherent units, each corresponding to a subtopic. Moreover, much current research extracts FA term candidates from whole documents to build an FA term dictionary automatically. This paper proposes a method for automatically building a new FA term dictionary from documents after applying passage retrieval. A WWW search engine is used to extract FA term candidates from passage document corpora. Then, the new FA term candidates in each field are automatically compared with a previously determined FA term dictionary. Finally, new FA terms selected from the extracted candidates are automatically appended to the existing FA term dictionary. Experimental results show that the new technique using passage documents automatically appends about 15% more FA terms from the term candidates to the existing FA term dictionary than the old method. Moreover, recall and precision improved significantly, by 20% and 32% respectively, over the traditional method. The proposed method was applied to 38,372 articles from a large tagged corpus.
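
The abstract describes the dictionary-update pipeline only at a high level. The Python sketch below illustrates its general shape under loudly stated assumptions; it is not the authors' implementation. Documents are split into fixed-size word windows as a stand-in for passage retrieval, candidates are chosen by a simple frequency threshold in place of the WWW search engine used in the paper, and the function names `split_into_passages`, `extract_candidates`, `append_new_terms` and `precision_recall`, as well as the parameters `window` and `min_count`, are hypothetical. Recall and precision use their standard set-based definitions.

```python
from collections import Counter


def split_into_passages(text, window=50):
    """Assumption: fixed-size word windows stand in for real passage retrieval."""
    words = text.split()
    return [" ".join(words[i:i + window]) for i in range(0, len(words), window)]


def extract_candidates(passages_by_field, min_count=2):
    """Hypothetical candidate extraction: a term becomes a candidate for a field
    if it occurs at least min_count times in that field's passages.
    (The paper uses a WWW search engine instead; this is only a stand-in.)"""
    candidates = {}
    for field, passages in passages_by_field.items():
        counts = Counter(w.lower() for p in passages for w in p.split())
        candidates[field] = {t for t, c in counts.items() if c >= min_count}
    return candidates


def append_new_terms(dictionary, candidates):
    """Compare each field's candidates with the existing dictionary and
    append only the terms not already present. Returns the newly added terms."""
    added = {}
    for field, terms in candidates.items():
        existing = dictionary.setdefault(field, set())  # same set object, mutated in place
        new_terms = terms - existing
        existing |= new_terms
        added[field] = new_terms
    return added


def precision_recall(predicted, gold):
    """Standard set-based precision and recall of predicted terms against a gold set."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```

For example, with an existing dictionary `{"sports": {"goal"}}` and candidates `{"goal", "match", "referee"}` for the same field, only `match` and `referee` are appended. Representing each field's dictionary as a set reduces the "compare, then append" step to a set difference followed by an in-place union.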

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kazuhiro Morita (1)
  • El-Sayed Atlam (1)
  • Elmarhomy Ghada (1)
  • Masao Fuketa (1)
  • Jun-ichi Aoe (1)
  1. Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima, Japan

Personalised recommendations