Journal of Intelligent Information Systems, Volume 4, Issue 1, pp 7–25

Automated analysis and exploration of image databases: Results, progress, and challenges

  • Usama M. Fayyad
  • Padhraic Smyth
  • Nicholas Weir
  • S. Djorgovski


In areas as diverse as earth remote sensing, astronomy, and medical imaging, image acquisition technology has undergone tremendous improvements in recent years. The vast amounts of scientific data that result are potential treasure-troves for scientific investigation and analysis. Unfortunately, advances in our ability to deal with this volume of data effectively have not kept pace with the hardware gains. While special-purpose tools for particular applications exist, there is a dearth of useful general-purpose software tools and algorithms that can assist a scientist in exploring large scientific image databases. This paper presents our recent progress in developing interactive, semi-automated image database exploration tools based on pattern recognition and machine learning technology. We first present a completed and successful application that illustrates the basic approach: the SKICAT system, used for the reduction and analysis of a 3-terabyte astronomical data set. SKICAT integrates techniques from image processing, data classification, and database management. It is a system in which machine learning played a powerful, enabling role and solved a difficult, scientifically significant problem. We then discuss the general problem of automated image database exploration, the particular aspects of image databases that distinguish them from other databases, and how these aspects affect the application of off-the-shelf learning algorithms to problems of this nature. A second large image database is used to ground this discussion: Magellan's images of the surface of the planet Venus. The paper concludes with a discussion of current and future challenges.


Machine Learning, Pattern Recognition, Automated Data Analysis, Astronomy, Sky Surveys, Image Processing, Large Image Databases





Copyright information

© Kluwer Academic Publishers 1995

Authors and Affiliations

  • Usama M. Fayyad (1)
  • Padhraic Smyth (1)
  • Nicholas Weir (2)
  • S. Djorgovski (2)

  1. Jet Propulsion Laboratory, California Institute of Technology, Pasadena
  2. Astronomy Department, California Institute of Technology, Pasadena
