Development of an audio-visual database system for human identification

  • Systems and Applications
  • Conference paper
Audio- and Video-based Biometric Person Authentication (AVBPA 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1206)

Abstract

Database systems dealing with textual content have been in use for a long time. A database management system (DBMS) allows convenient and efficient storage and retrieval of large amounts of data. Traditional databases are designed to handle alphanumeric data efficiently, but they fail to manage complex data such as audio and video. One-dimensional audio data and two-dimensional image data can be stored as binary large objects (BLOBs), with no regard for their contents. Textual information can be attached to BLOBs for retrieval, but textual information alone is insufficient to describe the rich contents of such data. There is therefore a need to extend the capabilities of such information management systems to handle both audio and visual data. The contents of such data items can be extracted in the form of features, which can then be used to distinguish between instances of these data types.
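The contrast between opaque BLOB storage and content features can be illustrated with a minimal sketch. It uses Python's standard `sqlite3` module; the table name `face`, the columns `nose_width` and `lip_width`, the sample values, and the helper `nearest_faces` are all hypothetical choices made for illustration and are not taken from the paper.

```python
import sqlite3

# Hypothetical schema (an assumption, not the paper's design): the BLOB column
# holds raw image bytes that the DBMS cannot interpret, while the REAL columns
# hold features extracted from the image content.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE face ("
    " person_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " image BLOB,"
    " nose_width REAL,"
    " lip_width REAL)"
)

# Made-up sample rows; the byte strings stand in for real image data.
rows = [
    (1, "A", b"raw-bytes-1", 3.1, 5.0),
    (2, "B", b"raw-bytes-2", 3.9, 5.6),
    (3, "C", b"raw-bytes-3", 3.2, 4.9),
]
conn.executemany("INSERT INTO face VALUES (?, ?, ?, ?, ?)", rows)


def nearest_faces(conn, nose_width, lip_width, limit=2):
    """Rank stored faces by squared Euclidean distance in feature space."""
    sql = (
        "SELECT person_id, name,"
        " (nose_width - ?) * (nose_width - ?)"
        " + (lip_width - ?) * (lip_width - ?) AS dist"
        " FROM face ORDER BY dist LIMIT ?"
    )
    params = (nose_width, nose_width, lip_width, lip_width, limit)
    return conn.execute(sql, params).fetchall()


# The query is a feature vector, not a keyword and not the raw bytes.
matches = nearest_faces(conn, 3.15, 5.0)
```

The BLOB column is never inspected by the query; only the extracted features take part in the comparison, which is the kind of content-based distinction the abstract argues for.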

This paper describes how the relational data model can be extended to retrieve face images and audio data in the form of utterances of letters of the alphabet. Face images are characterized by the sizes of different objects (e.g. nose, lips) and by the inter-object distances. The audio data is characterized by pitch, formants and LPC coefficients. The purpose of the paper is to develop an automated system for human identification based on audio-visual querying. The system allows a query to be partly audio, partly visual and partly textual.
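A query that is partly audio, partly visual and partly textual could be sketched as follows. This is only an illustration under stated assumptions: the `Person` record, the `city` attribute, the weights `w_face` and `w_voice`, the use of Euclidean distance, and the sample values are all hypothetical, and the feature vectors are assumed to be already normalized to comparable scales. The abstract does not specify the similarity measure actually used.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Person:
    name: str
    city: str               # hypothetical textual attribute
    face: Sequence[float]   # e.g. nose size, lip size, nose-lip distance
    voice: Sequence[float]  # e.g. pitch and two formants (normalized)


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query(db, face=None, voice=None, city=None, w_face=0.5, w_voice=0.5):
    """Return names ranked by combined distance; omitted query parts are skipped."""
    results = []
    for p in db:
        # Textual part: exact-match filter, as in an ordinary relational query.
        if city is not None and p.city != city:
            continue
        score = 0.0
        # Visual and audio parts: weighted distances in feature space.
        if face is not None:
            score += w_face * euclidean(face, p.face)
        if voice is not None:
            score += w_voice * euclidean(voice, p.voice)
        results.append((score, p.name))
    return [name for _, name in sorted(results)]


# Made-up records for illustration only.
db = [
    Person("A", "X", (0.30, 0.50, 0.20), (0.40, 0.55, 0.60)),
    Person("B", "X", (0.35, 0.45, 0.25), (0.70, 0.50, 0.65)),
    Person("C", "Y", (0.30, 0.50, 0.21), (0.41, 0.55, 0.60)),
]

visual_and_text = query(db, face=(0.30, 0.50, 0.20), city="X")
audio_only = query(db, voice=(0.41, 0.55, 0.60))
```

Any part of the query can be left out, so the same function serves purely textual, purely visual, purely audio, or mixed queries.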

Financial assistance from the Alexander von Humboldt Foundation is gratefully acknowledged.



Editor information

Josef Bigün, Gérard Chollet, Gunilla Borgefors

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bargale, C.B., Chaudhuri, S., Bhattacharyya, P. (1997). Development of an audio-visual database system for human identification. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0016014

  • DOI: https://doi.org/10.1007/BFb0016014

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62660-2

  • Online ISBN: 978-3-540-68425-1

  • eBook Packages: Springer Book Archive
