Rough and Fuzzy Sets for Data Mining of a Controlled Vocabulary for Textual Retrieval

  • Padmini Srinivasan
  • Donald Kraft
  • Jianhua Chen
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 50)


We present an approach to text retrieval that incorporates data mining of a controlled vocabulary, i.e., vocabulary mining, in order to improve retrieval performance. In general, formal queries presented to a retrieval system are not optimized for retrieval efficiency or effectiveness. Vocabulary mining allows us to transform the query via operations such as generalization or specialization. We offer a new framework for vocabulary mining, combining rough sets and fuzzy sets, allowing us to use rough set approximations when the documents and queries are described using weighted, i.e., fuzzy, representations. We also explore generalized rough sets, variable precision models, and coordinating multiple vocabulary views. Finally, we present a preliminary analysis of the application of our proposed framework to a modern controlled vocabulary, the Unified Medical Language System. The proposed framework supports the systematic study and application of different vocabulary views within the textual information retrieval model.
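As a rough illustration of what it means to take rough set approximations of a weighted query, the sketch below applies the standard rough fuzzy set construction: within each equivalence class of the vocabulary, the lower approximation takes the minimum term weight and the upper approximation takes the maximum. This is not the chapter's implementation. The function name, the synonym classes, and the query weights are all hypothetical and chosen only for illustration, and treating the upper approximation as query generalization and the lower as specialization is an assumed reading of the abstract.

```python
from typing import Dict, Iterable, Set, Tuple


def rough_fuzzy_approximations(
    classes: Iterable[Set[str]],
    weights: Dict[str, float],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (lower, upper) fuzzy memberships for every vocabulary term.

    classes: a partition of the vocabulary into equivalence classes
             (hypothetical here, e.g. induced by a synonymy relation).
    weights: fuzzy membership of each term in the query; absent terms are 0.0.
    """
    lower: Dict[str, float] = {}
    upper: Dict[str, float] = {}
    for cls in classes:
        values = [weights.get(term, 0.0) for term in cls]
        lo, hi = min(values), max(values)
        for term in cls:
            lower[term] = lo  # weight shared by every term in the class
            upper[term] = hi  # weight reached by at least one term in the class
    return lower, upper


# Hypothetical vocabulary view: terms grouped by synonymy.
synonym_classes = [
    {"heart attack", "myocardial infarction"},
    {"aspirin"},
    {"stroke", "cerebrovascular accident"},
]

# Hypothetical weighted (fuzzy) query.
query = {"heart attack": 0.8, "aspirin": 0.5}

lower, upper = rough_fuzzy_approximations(synonym_classes, query)
```

In this toy case the upper approximation extends the weight 0.8 to "myocardial infarction", which broadens the query, while the lower approximation drops "heart attack" to 0.0 because its synonym was not in the query, which narrows it. A different vocabulary view, i.e. a different partition, would yield different approximations of the same query.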


Information Retrieval; Unify Medical Language System; Information Retrieval Model; Query Refinement; Colfosceril Palmitate
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Padmini Srinivasan (1)
  • Donald Kraft (2)
  • Jianhua Chen (2)
  1. School of Library and Information Science, The University of Iowa, Iowa City, USA
  2. Department of Computer Science, Louisiana State University, Baton Rouge, USA
