Building New Field Association Term Candidates Automatically by Search Engine

  • Masao Fuketa
  • El-Sayed Atlam
  • Elmarhomy Ghada
  • Kazuhiro Morita
  • Jun-ichi Aoe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4252)

Abstract

With the increasing popularity of the Internet and the tremendous amount of on-line text, automatic document classification is important for organizing huge amounts of data. Readers can recognize the subject field of a document by reading only a few specific Field Association (FA) words. Document fields can be decided efficiently if there are many FA words and if their frequency rate is high. This paper proposes a method for automatically building new FA words. A WWW search engine is used to extract FA word candidates from document corpora. New FA word candidates in each field are automatically compared with previously determined FA words, and the new FA words are then appended to an FA word dictionary. According to the experimental results, the new system can automatically append around 44% of new FA words to the existing FA word dictionary. Moreover, a concentration ratio of 0.9 is effective for extracting the relevant FA words needed by a system designed to build FA words automatically.
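
The pipeline outlined in the abstract (extract candidates, compare them with known FA words, append the new ones) can be illustrated with a minimal sketch. The abstract does not define the concentration ratio, so the sketch assumes it is the share of a term's occurrences that fall in its single most frequent field. The function names, the field labels, and the per-field counts (a stand-in for search engine hit counts) are all hypothetical and are not taken from the paper.

```python
def concentration_ratio(field_counts):
    """Return (dominant_field, ratio) for one candidate term.

    ASSUMPTION: the ratio is the share of the term's occurrences that fall
    in its most frequent field; the abstract does not give the definition.
    """
    total = sum(field_counts.values())
    if total == 0:
        return None, 0.0
    field, top = max(field_counts.items(), key=lambda kv: kv[1])
    return field, top / total


def append_new_fa_words(candidates, dictionary, threshold=0.9):
    """Append candidates that pass the threshold and are not yet known.

    candidates: term -> {field: count} (hypothetical counts per field)
    dictionary: field -> set of known FA words, updated in place
    Returns the list of (term, field) pairs that were newly appended.
    """
    added = []
    for term, field_counts in candidates.items():
        field, ratio = concentration_ratio(field_counts)
        if field is None or ratio < threshold:
            continue  # not concentrated enough in a single field
        known = dictionary.setdefault(field, set())
        if term not in known:  # compare with previously determined FA words
            known.add(term)
            added.append((term, field))
    return added


# Hypothetical data, for illustration only.
candidates = {
    "home run": {"sports": 95, "politics": 5},  # ratio 0.95, new
    "election": {"politics": 48, "sports": 2},  # ratio 0.96, new
    "record": {"sports": 50, "music": 50},      # ratio 0.50, rejected
    "pitcher": {"sports": 99, "cooking": 1},    # ratio 0.99, already known
}
dictionary = {"sports": {"pitcher"}}
added = append_new_fa_words(candidates, dictionary)
print(added)
```

Keeping the threshold as a parameter makes it easy to compare the 0.9 setting mentioned in the abstract against other values on the same candidate set.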

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Masao Fuketa (1)
  • El-Sayed Atlam (1)
  • Elmarhomy Ghada (1)
  • Kazuhiro Morita (1)
  • Jun-ichi Aoe (1)

  1. Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima, Japan