VPCID—A VoIP Phone Call Identification Database

  • Yuankun Huang
  • Shunquan Tan
  • Bin Li
  • Jiwu Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11378)


Audio forensics plays an important role in information security by addressing disputes over the authenticity and originality of audio. However, some audio forensics methods in the literature were evaluated on either non-forensic-oriented databases or private databases that are not publicly available, which makes it difficult for researchers to compare different methods. In this paper we establish VPCID, a VoIP phone call identification database for audio forensic purposes. Phone scams and voice phishing via VoIP, through which the caller’s identity can easily be hidden or forged, are on the rise, so there is a need to identify VoIP phone calls. The VPCID database comprises 1152 VoIP call recordings and 1152 mobile phone call recordings, each longer than two minutes. The recordings were collected from 48 different speakers using different smartphones and under various recording conditions, such as VoIP software and location. We used MFCC (Mel-Frequency Cepstral Coefficients) and ACV (Amplitude Co-occurrence Vector) based features, each paired with an SVM (Support Vector Machine) classifier, to perform classification on the database. We also evaluated a CNN (convolutional neural network) on the database, but its performance was not satisfactory. The VoIP phone call identification problem is therefore challenging and calls for more effective solutions. We hope the proposed database will convey more than this paper and inspire future studies. The database is openly available, and we welcome its use.


Audio forensics, VoIP, Call recording, Identification



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Yuankun Huang (1)
  • Shunquan Tan (2)
  • Bin Li (1), corresponding author
  • Jiwu Huang (1)

  1. Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, College of Information Engineering, Shenzhen University, Shenzhen, China
  2. National Engineering Laboratory for Big Data System Computing Technology, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
