Indexing and Retrieval of Speech Documents

  • Piyush Kumar P. Singh
  • K. E. Manjunath
  • R. Ravi Kiran
  • Jainath Yadav
  • K. Sreenivasa Rao
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 27)


This paper proposes a speech document indexing system and a similarity-based document retrieval method. A k-d tree is used as the index structure, and codebooks derived from the speech documents in the database are used during retrieval of the desired document. Each document is represented as a sequence of codebook indices. A longest common subsequence based approach is proposed for retrieving the documents. The proposed retrieval method is evaluated using a 3-hour speech database recorded by a male speaker, with speech queries from 5 male and 5 female speakers. The retrieval accuracy is found to be about 88% for queries given by male speakers.
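
The retrieval idea summarized above can be illustrated with a minimal sketch. It assumes each document and each query has already been converted to a sequence of integer codebook indices; feature extraction, codebook training, and the k-d tree lookup are not shown. The function names `lcs_length` and `rank_documents`, the toy index sequences, and the normalization of the score by query length are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def lcs_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Length of the longest common subsequence of a and b.

    Standard dynamic program that keeps only two rows of the table.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rank_documents(
    query: Sequence[int], documents: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Rank documents by LCS length with the query, best match first.

    Assumption: the score is the LCS length divided by the query length,
    so a score of 1.0 means the whole query appears as a subsequence.
    """
    scores = []
    for doc_id, indices in documents.items():
        score = lcs_length(query, indices) / len(query) if query else 0.0
        scores.append((doc_id, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical codebook-index sequences for two documents and one query.
documents = {
    "doc_a": [3, 7, 7, 2, 9, 4],
    "doc_b": [1, 1, 5, 8, 6, 0],
}
query = [7, 2, 4]
ranking = rank_documents(query, documents)
```

In this toy example `doc_a` ranks first because every query index occurs in it in order. A subsequence match tolerates extra indices between matching ones, which is one plausible reason such a measure suits queries spoken at a different rate or by a different speaker than the stored document.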


Indexing and Retrieval, codebook, MFCC, k-d tree, retrieval, longest common subsequence





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Piyush Kumar P. Singh (1)
  • K. E. Manjunath (1)
  • R. Ravi Kiran (1)
  • Jainath Yadav (1)
  • K. Sreenivasa Rao (1)
  1. Indian Institute of Technology Kharagpur, Kharagpur, India
