Abstract
Query expansion in a knowledge-based information retrieval system requires a knowledge base that takes semantic relations between words into account. Because the Apriori algorithm extracts association words without taking user preference into account, recall improves but accuracy declines. This paper shows how to build an optimized association word knowledge base with improved accuracy by including only the association words that users prefer, chosen from among association words that reflect semantic relations between words. To this end, computer-related web documents are classified into eight classes, and nouns are extracted from the web documents of each class. Association words are extracted from the nouns with the Apriori algorithm, and association words that users do not favor are excluded from the knowledge base by a genetic algorithm.
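The two-stage pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the support threshold, the chromosome encoding, or the fitness function, so `min_support`, the bit-mask encoding, the `preference` scores in [0, 1], and the function names `apriori` and `optimize` are all hypothetical choices made for the example.

```python
import random
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(nouns): support} for frequent noun sets of size >= 2."""
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]
    items = {i for t in tx for i in t}
    # Frequent single nouns seed the level-wise search.
    current = {frozenset([i]) for i in items
               if sum(i in t for t in tx) / n >= min_support}
    result = {}
    k = 2
    while current:
        # Join step: merge frequent (k-1)-sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = set()
        for c in candidates:
            support = sum(c <= t for t in tx) / n
            if support >= min_support:
                result[c] = support
                current.add(c)
        k += 1
    return result


def optimize(words, preference, generations=50, pop_size=20,
             mutation_rate=0.05, seed=0):
    """Evolve a bit mask over `words`; a 1 bit keeps that association word."""
    rng = random.Random(seed)
    n = len(words)

    def fitness(mask):
        # Assumed scoring: words with preference above 0.5 add to fitness,
        # words below 0.5 subtract from it.
        return sum(preference[w] - 0.5 for w, bit in zip(words, mask) if bit)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [w for w, bit in zip(words, best) if bit]


# Hypothetical noun sets extracted from documents of one class.
docs = [
    {"game", "graphic", "sound"},
    {"game", "graphic"},
    {"game", "sound"},
    {"network", "protocol"},
    {"network", "protocol", "game"},
]
mined = apriori(docs, min_support=0.4)
words = sorted("&".join(sorted(s)) for s in mined)
# Hypothetical user preference scores for each mined association word.
preference = {w: (0.2 if "sound" in w else 0.9) for w in words}
print(optimize(words, preference))
```

Here each document is treated as one transaction of nouns, and the genetic algorithm searches over subsets of the mined association words. In practice the preference scores would have to come from user feedback, which the abstract does not detail.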
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ko, SJ., Lee, JH. (2002). Optimization of Association Word Knowledge Base through Genetic Algorithm. In: Kambayashi, Y., Winiwarter, W., Arikawa, M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2002. Lecture Notes in Computer Science, vol 2454. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46145-0_21
DOI: https://doi.org/10.1007/3-540-46145-0_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44123-6
Online ISBN: 978-3-540-46145-6
eBook Packages: Springer Book Archive