Abstract
We present an approach to text retrieval that incorporates data mining of a controlled vocabulary, i.e., vocabulary mining, in order to improve retrieval performance. In general, formal queries presented to a retrieval system are not optimized for retrieval efficiency or effectiveness. Vocabulary mining allows us to transform the query via operations such as generalization or specialization. We offer a new framework for vocabulary mining that combines rough sets and fuzzy sets, allowing us to use rough set approximations when documents and queries are described using weighted, i.e., fuzzy, representations. We also explore generalized rough sets, variable precision models, and the coordination of multiple vocabulary views. Finally, we present a preliminary analysis of the application of our proposed framework to a modern controlled vocabulary, the Unified Medical Language System (UMLS). The proposed framework supports the systematic study and application of different vocabulary views within the textual information retrieval model.
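The combination of rough and fuzzy sets described above can be illustrated with a minimal sketch. It is not necessarily the chapter's own set of operators. The sketch makes three assumptions:

* The controlled vocabulary induces a partition of terms into equivalence classes. The classes used here are hypothetical synonym groups, not real UMLS content.
* A query is a fuzzy set of weighted terms.
* Following the rough-fuzzy set construction of Dubois and Prade, the lower approximation gives each class the minimum weight of its members, and the upper approximation gives each class the maximum weight.

Reading the lower approximation as a query specialization and the upper approximation as a generalization is an illustrative assumption. The `vp_approx` function is a variable precision variant in the style of Ziarko's model. It relaxes crisp inclusion with an admissible error `beta`. All function and variable names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

FuzzySet = Dict[str, float]   # term -> membership weight in [0, 1]
Partition = List[List[str]]   # disjoint blocks assumed to cover the vocabulary


def fuzzy_lower(fs: FuzzySet, partition: Partition) -> FuzzySet:
    """Lower approximation: every term in a block receives the block's minimum
    membership (terms absent from fs count as 0); zero-weight blocks are dropped."""
    result: FuzzySet = {}
    for block in partition:
        weight = min(fs.get(t, 0.0) for t in block)
        if weight > 0.0:
            for t in block:
                result[t] = weight
    return result


def fuzzy_upper(fs: FuzzySet, partition: Partition) -> FuzzySet:
    """Upper approximation: every term in a block receives the block's maximum
    membership, so weight spreads to all members of the class."""
    result: FuzzySet = {}
    for block in partition:
        weight = max(fs.get(t, 0.0) for t in block)
        if weight > 0.0:
            for t in block:
                result[t] = weight
    return result


def vp_approx(xs: Set[str], partition: Partition,
              beta: float) -> Tuple[Set[str], Set[str]]:
    """Variable precision approximations of a crisp term set (0 <= beta < 0.5).
    A block's classification error is 1 - |block & xs| / |block|; the block joins
    the beta-lower approximation if error <= beta and the beta-upper
    approximation if error < 1 - beta. beta = 0 gives Pawlak's classical case."""
    lower: Set[str] = set()
    upper: Set[str] = set()
    for block in partition:
        error = 1 - len(xs.intersection(block)) / len(block)
        if error <= beta:
            lower.update(block)
        if error < 1 - beta:
            upper.update(block)
    return lower, upper


# Illustrative vocabulary: hypothetical synonym groups, not actual UMLS data.
vocab: Partition = [
    ["neoplasm", "tumor", "cancer"],
    ["heart attack", "myocardial infarction"],
    ["aspirin"],
]
query: FuzzySet = {"neoplasm": 0.9, "tumor": 0.8, "cancer": 0.6,
                   "heart attack": 0.7, "aspirin": 0.5}

specialized = fuzzy_lower(query, vocab)  # keeps only blocks fully present in the query
generalized = fuzzy_upper(query, vocab)  # adds "myocardial infarction" via its synonym
print(specialized)
print(generalized)
print(vp_approx({"neoplasm", "tumor"}, vocab, beta=0.0))
print(vp_approx({"neoplasm", "tumor"}, vocab, beta=0.4))
```

With `beta = 0` the variable precision call reduces to the classical rough set approximations. In that case the partially covered synonym block appears only in the upper approximation. Raising `beta` to 0.4 tolerates one missing synonym out of three, so the block is admitted to the lower approximation as well.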
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Srinivasan, P., Kraft, D., Chen, J. (2000). Rough and Fuzzy Sets for Data Mining of a Controlled Vocabulary for Textual Retrieval. In: Crestani, F., Pasi, G. (eds) Soft Computing in Information Retrieval. Studies in Fuzziness and Soft Computing, vol 50. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1849-9_15
DOI: https://doi.org/10.1007/978-3-7908-1849-9_15
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-2473-5
Online ISBN: 978-3-7908-1849-9
eBook Packages: Springer Book Archive