Effective pattern recognition and find-density-peaks clustering based blind identification for underdetermined speech mixing systems

Multimedia Tools and Applications

Abstract

To achieve high-efficiency blind identification (BI) for underdetermined speech mixing systems without recovery degradation, this paper proposes a novel BI scheme based on effective pattern recognition and the find-density-peaks (FDP) clustering algorithm. To lower the computational complexity of BI, a 3-step effective pattern recognition procedure is proposed, consisting of voiced-sound pattern sifting, spectrum-correction-based harmonic representation, and phase-uniformity-based single-active-source (SAS) pattern recognition. Furthermore, a 5-step FDP clustering procedure is summarized and used to determine the source number and estimate all the columns of the mixing matrix. Experimental results show that the proposed 3-step effective pattern recognition procedure condenses the original 56383 TF patterns into only 194 effective SAS patterns, which considerably alleviates the computational burden of BI. Moreover, by means of FDP clustering, not only can the source number be intuitively and readily determined, but the mixing matrix can also be estimated with a higher recovery SNR than with existing BI schemes. Because harmonic-like components arise in many applications, the proposed BI scheme has considerable potential in other harmonics-related blind-signal-separation (BSS) fields, such as mechanical vibration analysis and channel estimation in communications.
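The paper's own 5-step FDP procedure is not reproduced in this preview, so the following is only a minimal sketch of the generic density-peaks rule of Rodriguez and Laio that FDP clustering builds on. The function name `density_peaks`, the cutoff `d_c`, and the toy data are hypothetical and chosen for illustration. Each point gets a local density and a distance to the nearest denser point; points where both are large stand out as cluster centres, so their count suggests the source number and their directions suggest the mixing-matrix columns.

```python
import numpy as np

def density_peaks(points, d_c):
    """Score each point as a candidate cluster centre (generic density-peaks rule).

    rho[i]   : number of other points closer than the cutoff d_c
    delta[i] : distance to the nearest point ranked denser than point i
    """
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    rho = (dist < d_c).sum(axis=1) - 1           # exclude the point itself
    order = np.argsort(-rho, kind="stable")      # densest first; ties broken by index
    delta = np.empty(len(points))
    delta[order[0]] = dist[order[0]].max()       # convention for the densest point
    for rank in range(1, len(order)):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()
    return rho, delta

# Toy data (hypothetical): noisy 2-D patterns scattered around three unit-norm
# directions, standing in for SAS patterns of a 2-mixture, 3-source system.
rng = np.random.default_rng(0)
angles = np.deg2rad([20.0, 75.0, 130.0])
columns = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # one row per direction
points = np.concatenate([c + 0.02 * rng.standard_normal((50, 2)) for c in columns])

rho, delta = density_peaks(points, d_c=0.05)
gamma = rho * delta                              # large only for density peaks
ranked = np.argsort(gamma)[::-1]                 # candidate centres, best first
centers = points[ranked[:3]]
estimated_columns = centers / np.linalg.norm(centers, axis=1, keepdims=True)
```

In this toy setting the three largest values of `gamma` are far above the rest, which is how the number of clusters is read off. In practice the cutoff `d_c` and the threshold separating outstanding points from the bulk would have to be tuned to the data.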


Acknowledgements

This work was financially supported by the Qingdao National Laboratory for Marine Science and Technology under Grant No. QNLM2016OPR0411.

Author information

Corresponding author

Correspondence to Xiangdong Huang.

Appendix A: Dataflow of ratio-based spectrum correction

Given the Hanning-windowed STFT spectrograms of the M mixtures Xm(t0,kΔf), m = 1,...,M, with Δf = fs/L (written Xm(t0,k) for simplicity), at some moment t = t0, the results of spectrum correction (i.e., the outputs \(\hat{f}_{m}, \hat{d}_{m}, \hat{\phi}_{m}\)) are acquired by the following steps:

  • Step 1 Collect all the large-amplitude peak indices \(k^{*}\) of Xm(t0,k). For each peak index \(k^{*}\), calculate the amplitude ratio \(v\) between \(X_{m}(t_{0}, k^{*})\) and its sub-peak neighbor (the larger of the two adjacent bins), i.e.,

    $$\begin{array}{@{}rcl@{}} v =\frac{ |X_{m}(t_{0}, k^{*}) |}{\max \{ | X_{m}(t_{0}, k^{*}-1)|, | X_{m}(t_{0}, k^{*}+ 1) |\} }. \end{array} $$
    (16)

    Further, a variable u can be calculated as

    $$\begin{array}{@{}rcl@{}} u=(2-v )/(1+v ). \end{array} $$
    (17)
  • Step 2 Estimate the aforementioned frequency offset \(\hat {\delta }\) as

    $$ \hat{\delta} = \left\{ \begin{array}{rl} u, ~&\text{if} \ |X_{m}(t_{0}, k^{*}+ 1)|> |X_{m}(t_{0}, k^{*}-1)|\\ -u,~&\text{else} \end{array}\right., $$
    (18)

    then, the frequency estimate is \(\hat f_{m}=(k^{*}+\hat {\delta }) f_{s}/L\).

  • Step 3 Acquire the amplitude estimate \(\hat {d}_{m} \) and phase estimate \(\hat {\phi }_{m}\) as

    $$\begin{array}{@{}rcl@{}} \hat{d}_{m}= 2 \pi \hat{\delta}(1-\hat{\delta}^{2}) |X_{m}(t_{0}, k^{*})| / \sin (\pi\hat{\delta}), \end{array} $$
    (19)
    $$\begin{array}{@{}rcl@{}} \hat\phi_{m}=\text{ang}[X_{m}(t_{0}, k^{*})]- \pi\hat{\delta}(L-1)/L, \end{array} $$
    (20)

    where ang(⋅) refers to the angle operation.
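Steps 1 to 3 can be sketched for a single peak as follows. This is a minimal illustration, not the authors' implementation. It assumes that the spectrum is the DFT of one Hann-windowed frame divided by the frame length L (the scaling under which Eq. (19) returns the amplitude of a complex exponential), and that the periodic form of the Hann window is used. The function name `ratio_spectrum_correction` and the synthetic test signal are hypothetical.

```python
import numpy as np

def ratio_spectrum_correction(X, k_star, fs):
    """Apply Steps 1-3 to one spectral peak.

    X      : complex DFT of one Hann-windowed frame, divided by its length L
             (assumed scaling, see the note above)
    k_star : index of a large-amplitude peak of |X|, with 1 <= k_star <= L - 2
    fs     : sampling rate in Hz
    """
    L = len(X)
    left, right = abs(X[k_star - 1]), abs(X[k_star + 1])
    v = abs(X[k_star]) / max(left, right)        # Eq. (16)
    u = (2.0 - v) / (1.0 + v)                    # Eq. (17)
    delta = u if right > left else -u            # Eq. (18)
    f_hat = (k_star + delta) * fs / L
    # Eq. (19), written with np.sinc(x) = sin(pi x)/(pi x) to avoid 0/0 at delta == 0
    d_hat = 2.0 * (1.0 - delta**2) * abs(X[k_star]) / np.sinc(delta)
    phi_hat = np.angle(X[k_star]) - np.pi * delta * (L - 1) / L   # Eq. (20)
    return f_hat, d_hat, phi_hat

# Synthetic check: one complex exponential lying 0.3 bins above bin 100.
fs, L = 8000.0, 1024
n = np.arange(L)
f_true, d_true, phi_true = 100.3 * fs / L, 1.5, 0.7
x = d_true * np.exp(1j * (2 * np.pi * f_true * n / fs + phi_true))
window = 0.5 - 0.5 * np.cos(2 * np.pi * n / L)   # periodic Hann window (assumed variant)
X = np.fft.fft(x * window) / L
k_star = int(np.argmax(np.abs(X)))
f_hat, d_hat, phi_hat = ratio_spectrum_correction(X, k_star, fs)
```

Without correction, the peak bin alone would place the frequency at bin 100, an error of 0.3 bins. The ratio-based offset recovers the fractional part from just the peak and its larger neighbor. The phase estimate is not wrapped here, so a caller that needs values in (-π, π] would have to wrap it.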

About this article

Cite this article

Huang, X., Yang, L., Song, R. et al. Effective pattern recognition and find-density-peaks clustering based blind identification for underdetermined speech mixing systems. Multimed Tools Appl 77, 22115–22129 (2018). https://doi.org/10.1007/s11042-018-5619-z

  • DOI: https://doi.org/10.1007/s11042-018-5619-z
