Abstract
Database systems for textual content have been in use for a long time. A database management system (DBMS) allows convenient and efficient storage and retrieval of large amounts of data. Traditional databases are designed to handle alphanumeric data efficiently, but they fail to manage complex data such as audio and video. One-dimensional audio data and two-dimensional image data can be stored as binary large objects (BLOBs), with no regard for their contents. Textual information can be attached to BLOBs for retrieval, but textual information alone is insufficient to describe the rich contents of such data. There is therefore a need to extend the capabilities of such information management systems to handle both audio and visual data. The contents of such data items can be extracted in the form of features, which can then be used to distinguish among instances of these data types.
This paper describes how the relational data model can be extended to retrieve face images and audio data in the form of utterances of letters of the alphabet. Face images are characterized by the sizes of facial objects, e.g. the nose and lips, and by the inter-object distances. The audio data is characterized by pitch, formants and LPC (linear predictive coding) coefficients. The purpose of the paper is to develop an automated system for human identification based on audio-visual querying. The system allows a query to be partly audio, partly visual and partly textual.
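As an illustration only, the following minimal sketch shows one way a query that is partly textual, partly visual and partly audio could be evaluated over a relational table. It is not the system described in the paper: the schema, the column names (`nose_width`, `lip_width`, `eye_distance`, `pitch_hz`), the weights, the sample rows and the use of a weighted Euclidean distance are all assumptions made for the example.

```python
import math
import sqlite3

# Hypothetical schema: textual attributes stored next to numeric features.
SCHEMA = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT,
    city TEXT,
    nose_width REAL,
    lip_width REAL,
    eye_distance REAL,
    pitch_hz REAL
)
"""
TEXT_COLS = ("name", "city")
FACE_COLS = ("nose_width", "lip_width", "eye_distance")
AUDIO_COLS = ("pitch_hz",)


def build_demo_db():
    """Create an in-memory database with three made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    rows = [
        (1, "A", "Mumbai", 3.0, 5.0, 6.0, 120.0),
        (2, "B", "Mumbai", 3.5, 5.5, 6.5, 210.0),
        (3, "C", "Pune", 3.0, 5.0, 6.0, 120.0),
    ]
    conn.executemany("INSERT INTO person VALUES (?,?,?,?,?,?,?)", rows)
    return conn


def distance(row, query, cols):
    """Euclidean distance over the feature columns present in the query."""
    used = [c for c in cols if c in query]
    return math.sqrt(sum((row[c] - query[c]) ** 2 for c in used))


def identify(conn, text=None, face=None, audio=None,
             w_face=1.0, w_audio=0.01, k=3):
    """Filter by exact text match, then rank by weighted feature distance."""
    text, face, audio = text or {}, face or {}, audio or {}
    if not set(text) <= set(TEXT_COLS):
        raise ValueError("unknown text column in query")
    # The textual part of the query becomes an ordinary SQL predicate.
    where = " AND ".join(f"{col} = ?" for col in text)
    sql = "SELECT * FROM person" + (f" WHERE {where}" if where else "")
    candidates = conn.execute(sql, list(text.values())).fetchall()
    # The visual and audio parts are scored outside SQL on the feature columns.
    scored = [
        (w_face * distance(r, face, FACE_COLS)
         + w_audio * distance(r, audio, AUDIO_COLS), r["id"])
        for r in candidates
    ]
    scored.sort()  # ties are broken by the smaller id
    return [person_id for _, person_id in scored[:k]]


if __name__ == "__main__":
    db = build_demo_db()
    print(identify(db, text={"city": "Mumbai"},
                   face={"nose_width": 3.1}, audio={"pitch_hz": 125.0}))
```

Any part of the query may be omitted: a missing textual part removes the SQL filter, and a missing feature group contributes zero distance. The weights compensate for the different scales of the face and audio features and would need tuning in practice.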
Financial assistance from the Alexander von Humboldt Foundation is gratefully acknowledged.
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bargale, C.B., Chaudhuri, S., Bhattacharyya, P. (1997). Development of an audio-visual database system for human identification. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0016014
DOI: https://doi.org/10.1007/BFb0016014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62660-2
Online ISBN: 978-3-540-68425-1
eBook Packages: Springer Book Archive