Towards Real-Time Cue Integration by Using Partial Results

  • Doug DeCarlo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2353)


Typical cue integration techniques work by combining estimates produced by computations associated with each visual cue. Most of these computations are iterative, yielding partial results at each iteration and culminating in complete results when the algorithm terminates. Combining partial results at each iteration would be the preferred strategy for cue integration, as early cue integration strategies are inherently more stable and more efficient. Surprisingly, existing cue integration techniques cannot correctly use partial results, but must wait for all of the cue computations to finish. This is because the intrinsic error in a partial result, which arises entirely from the fact that the algorithm has not yet terminated, is not represented. While cue integration methods that attempt to use partial results do exist (such as one based on an iterated extended Kalman filter), they make critical errors.
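As a point of reference, the standard strategy described above can be sketched as inverse-variance (precision-weighted) fusion of independent scalar cue estimates. This is a generic illustration of combining estimates with represented uncertainty, not the paper's specific method; the function name is hypothetical.

```python
import numpy as np

def fuse_cues(estimates, variances):
    """Fuse independent cue estimates by inverse-variance weighting.

    Each cue contributes an estimate whose weight is its precision
    (1/variance); the fused variance is the inverse of the total precision.
    """
    est = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)  # precisions
    fused = np.sum(w * est) / np.sum(w)
    fused_var = 1.0 / np.sum(w)
    return fused, fused_var
```

Note that this fusion is only correct when each variance faithfully reflects the error in its estimate; a partial result whose variance omits the not-yet-terminated error would be overweighted, which is precisely the failure the paper identifies.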

I address this limitation with the development of a probabilistic model of errors in estimates from partial results, which represents the error that remains in iterative algorithms prior to their completion. This enables existing cue integration frameworks to draw upon partial results correctly. Results are presented on using such a model for tracking faces using feature alignment, contours, and optical flow. They indicate that this framework improves accuracy, efficiency, and robustness over one that uses complete results.
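The idea of representing the error remaining before termination can be illustrated with a toy model. Assuming (hypothetically, for this sketch only) that the iterative cue computation is a contraction that converges geometrically with rate `rho`, the variance attached to a partial result is the cue's intrinsic variance plus a non-termination term that decays with iteration count. The paper's actual error model is more sophisticated; this only conveys the shape of the idea.

```python
def partial_result_variance(base_var, iteration, rho=0.5, init_err_var=1.0):
    """Toy variance model for a partial result.

    base_var:     intrinsic variance of the cue's complete result
    iteration:    number of iterations completed so far
    rho:          assumed geometric contraction rate of the iteration
    init_err_var: assumed variance of the error at iteration 0

    The error due to non-termination has standard deviation proportional
    to rho**iteration, so its variance decays as rho**(2*iteration).
    """
    remaining = init_err_var * rho ** (2 * iteration)
    return base_var + remaining
```

Fed into a precision-weighted fusion step, this inflated variance makes early partial results contribute less, and the contribution grows smoothly toward that of a complete result as the iteration proceeds.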

The eventual goal of this line of research is the creation of a decision-theoretic meta-reasoning framework for cue integration—a vital mechanism for any system with real-time deadlines and variable computational demands. This framework will provide a means to decide how to best spend computational resources on each cue, based on how much it reduces the uncertainty of the combined result.
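The allocation decision described above can be sketched greedily: among the cues, advance the one whose next iteration would most reduce the variance of the fused result. This is a minimal illustration of the decision criterion, not the proposed decision-theoretic framework; the function names are hypothetical.

```python
def fused_variance(variances):
    """Variance of the precision-weighted combination of independent cues."""
    return 1.0 / sum(1.0 / v for v in variances)

def best_cue_to_refine(variances, refined_variances):
    """Pick the cue whose one extra iteration most shrinks the fused variance.

    variances:          current variance of each cue's partial result
    refined_variances:  predicted variance of each cue after one more iteration
    """
    base = fused_variance(variances)
    gains = []
    for i, rv in enumerate(refined_variances):
        trial = list(variances)
        trial[i] = rv
        gains.append(base - fused_variance(trial))
    return max(range(len(gains)), key=gains.__getitem__)
```

A full meta-reasoning framework would also weigh the computational cost of each cue's iteration against these gains, trading accuracy for time under a real-time deadline.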


Keywords: cue integration · real-time vision · meta-reasoning



Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Doug DeCarlo
  1. Department of Computer Science and Center for Cognitive Science, Rutgers University, Piscataway, USA
