A New Approach for Improving Field Association Term Dictionary Using Passage Retrieval

Morita, Kazuhiro; Atlam, El-Sayed; Ghada, Elmarhomy; Fuketa, Masao; Aoe, Jun-ichi

doi:10.1007/11893004_39

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4252)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

Large collections of full-text documents are now commonly used in automated information retrieval. Readers generally identify the subject of a text when they notice specific terms, called Field Association (FA) terms, in that text. Previous research showed that evidence from passages can improve retrieval results by dividing documents into coherent units, with each unit corresponding to a subtopic. Moreover, many current researchers extract FA term candidates from whole documents to build an FA term dictionary automatically. This paper proposes a method for automatically building a new FA term dictionary from documents after applying passage retrieval. A WWW search engine is used to extract FA term candidates from passage document corpora. Then, the new FA term candidates in each field are automatically compared with a previously determined FA term dictionary. Finally, new FA terms from the extracted term candidates are appended automatically to the existing FA term dictionary. In the experimental results, the new technique using passage documents can automatically append about 15% of FA terms from the term candidates to the existing FA term dictionary over the old method. Moreover, recall and precision improved significantly, by 20% and 32% respectively, over the traditional method. The proposed methods are applied to 38,372 articles from a large tagged corpus.
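The compare-and-append step described in the abstract can be illustrated with a minimal sketch. It assumes the FA term dictionary is a mapping from field name to a set of terms, and that candidates arrive already grouped by field. The function name `append_new_fa_terms`, the lowercase normalization, and the sample terms are hypothetical; the abstract does not state the criteria the authors actually use to accept a candidate.

```python
from typing import Dict, Iterable, Set, Tuple


def append_new_fa_terms(
    dictionary: Dict[str, Set[str]],
    candidates: Dict[str, Iterable[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Compare candidates with the existing dictionary, field by field.

    Returns the updated dictionary and the terms newly appended per field.
    Assumption: a candidate is "new" if its lowercased form is not already
    listed for that field. The real method may apply further filtering.
    """
    # Work on a normalized copy so the caller's dictionary is not modified.
    updated = {
        field: {term.lower() for term in terms}
        for field, terms in dictionary.items()
    }
    appended: Dict[str, Set[str]] = {}
    for field, terms in candidates.items():
        known = updated.setdefault(field, set())
        new_terms = {term.lower() for term in terms} - known
        known |= new_terms  # append the new terms to the existing entry
        appended[field] = new_terms
    return updated, appended


# Hypothetical data, for illustration only.
existing = {"sports": {"baseball", "pitcher"}}
found = {"sports": ["Pitcher", "home run"], "politics": ["election"]}
updated, appended = append_new_fa_terms(existing, found)
```

Here `appended` holds only the candidates missing from the existing dictionary, so "Pitcher" is skipped as already known while "home run" and "election" are added.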

Author information

Authors and Affiliations

Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima, 770-8506, Japan
Kazuhiro Morita, El-Sayed Atlam, Elmarhomy Ghada, Masao Fuketa & Jun-ichi Aoe

Editor information

Editors and Affiliations

School of Design, Engineering and Computing, Bournemouth University, UK
Bogdan Gabrys
Centre for SMART Systems, School of Environment and Technology, University of Brighton, BN2 4GJ, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, SA, 5095, Mawson Lakes, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Morita, K., Atlam, ES., Ghada, E., Fuketa, M., Aoe, Ji. (2006). A New Approach for Improving Field Association Term Dictionary Using Passage Retrieval. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4252. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893004_39

DOI: https://doi.org/10.1007/11893004_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46537-9
Online ISBN: 978-3-540-46539-3
eBook Packages: Computer Science, Computer Science (R0)
