Instrumental Dimension-Based Speech Quality Modeling

Dimension-based Quality Modeling of Transmitted Speech

Part of the book series: T-Labs Series in Telecommunication Services (TLABS)

Abstract

In this chapter, the experimental data obtained from the application of the new scaling method is employed to develop a parametric model for diagnostic speech quality prediction: Dimension estimators are developed that estimate dimension impairment factors, whereas the total impairment is estimated following a distance model. The resulting model is compared to an extended version of the wideband E-Model. Moreover, related signal-based diagnostic models are presented.
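As an illustration of the distance-model idea mentioned in the abstract, the following minimal Python sketch combines dimension impairment factors (DIFs) into a total impairment using a weighted Minkowski distance from the origin. The chapter's actual model (Eq. (5.8)) is not reproduced on this page, so the function name `total_impairment`, the optional `weights`, and all numeric inputs are hypothetical. Only the exponent cases p = 2 (Euclidean) and p = 1 (city-block) and the three dimension names are taken from the notes.

```python
from typing import Mapping, Optional


def total_impairment(
    difs: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
    p: float = 2.0,
) -> float:
    """Weighted Minkowski distance of the DIF vector from the origin.

    Hypothetical sketch: the weights and the exact functional form are
    assumptions, not the fitted model from the chapter.
    """
    if p < 1.0:
        raise ValueError("p must be >= 1 for a Minkowski metric")
    total = 0.0
    for dim, value in difs.items():
        # Unit weight for every dimension unless weights are supplied.
        w = 1.0 if weights is None else weights[dim]
        total += w * abs(value) ** p
    return total ** (1.0 / p)


# Illustrative (made-up) DIF values for one degraded condition.
example = {"discontinuity": 3.0, "noisiness": 4.0, "coloration": 0.0}
euclidean = total_impairment(example, p=2.0)   # p = 2 case
city_block = total_impairment(example, p=1.0)  # p = 1 case
```

Under this assumed form, a condition with all DIFs equal to zero (the direct channel) yields a total impairment of zero, which matches the constraint stated in Note 4.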

Notes

  1.

    The application of ITU-T Rec. P.833.1 (2009) takes place within limits here, since the experimental conditions do not match the recommended reference conditions exactly.

  2.

    An informal expert listening test showed that there was no audible difference between these particular conditions and the clean reference.

  3.

    Note that in the following, \(\{\overline{\beta}_{dim}\}\) represent the median values rather than the mean values, because the medians yield slightly better goodness-of-fit measures for the model developed in this chapter. However, the median values deviate only slightly from the mean values.

  4.

    In Sect. 2.4.5 it is discussed whether a logistic or a log-logistic function is the appropriate means for this transformation. A log-logistic is preferred here, because it is required that the curve goes through the origin: For the direct channel, it is required that \(I_\mathrm{tot} = I_{dim} = 0 \,\forall \, dim\). This constraint cannot be achieved with a logistic function, see Appendix A.

  5.

    Whereas in Allnatt (1983, pp. 137–138), see also Sect. 2.5.3, no clear definition of the term “unrelated” could be given, the “unrelatedness” can here be assured due to the proven orthogonality of the dimensions “discontinuity”, “noisiness”, and “coloration” in Chap. 3 and the uncorrelatedness of the scales employed in the direct dimension scaling approach shown in Sect. 4.5.1.

  6.

    More precisely, Eq. (5.8) is closely related to Carroll’s Model II, see Sect. 2.5.2, with \(p\equiv 2\) in Eq. (5.8). For \(p \equiv 1\), that is, in case of the City-block metric, a special case of Carroll’s vector model (Model IV, Eq. (2.16)) emerges, which was shown to be a special case of ideal-point models.

  7.

    As a side effect, the “coloration” DIF can be estimated by the bandwidth impairment factor \(I_\mathrm{bw}\), which was developed for estimating \(I_\mathrm{tot}\) for linear distortions. Further details are discussed in Sect. 5.3.4.

  8.

    As data are available only for the G.722 codec, the masking factor is, strictly speaking, valid only for this codec.

  9.

    The “unit” dBm is a logarithmic measure of the magnitude of a signal commonly used in telephonometry. Here, it is referred to the 0 dBr point of the network, a virtual reference point in the middle of the connection that serves as a reference to line levels (Möller 2000, p. 36). The letter ‘p’ suggests a psophometric weighting of the noise, see ITU-T Rec. O.41 (1994). Note that psophometric weighting according to ITU-T Rec. O.41 (1994) is only valid up to 6 kHz. In the experiments on which the model given in Eq. (5.12) is based, the weighting curve was linearly extrapolated towards 8 kHz.

  10.

    Note that the noise floor, describing noise induction on the receive part of subscriber lines, can be expressed in terms of [dBmp] if not referred to the 0 dBr point. In this case, it is denoted by \(N_\mathrm{for}\). It is \(N_\mathrm{fo}=N_\mathrm{for}+RLR\), with \(RLR\) the receive loudness rating, see Appendix B.

  11.

    For integral quality, the type of noise does not have an influence, see Table C.7.

  12.

    Note that \(c=100\) instead of \(c=97\) was included in Raake et al. (2010). The correct formula, however, is given in ITU-T Rec. G.107.1 (2011).

  13.

    In fact, \(P_\mathrm{s}\) can be derived by replacing the effective room noise \(P_{re}\) in formula (B.6) by \(P_\mathrm{s}-OLR-D_\mathrm{s}+21\).

  14.

    Exceptions are the high-level pub noise conditions included in the test, which are perceived equally in “noisiness”. Obviously, for these conditions, the listeners completely separated out the ambient noise in their noisiness ratings, regardless of the level. A more in-depth inspection of the rating distributions reveals that the histograms for pub and cafeteria noises tend to have a bimodal structure, with peaks at the scale extremes (not shown here). Thus, similar to the MNRU conditions discussed above, there are apparently listeners with a different “point-of-view” regarding the assessment of the “noisiness” of noise carrying information. Including non-stationary noises in the training (see Sect. 4.3.2.2) might help to overcome this problem. The given average values are nevertheless used. Since the modeling approach is what matters here, the data can be replaced by statistically more reliable data in the future, while the modeling approach remains the same.

  15.

    Note that on the one hand, this noise level is clearly audible in auditory experiments (Möller 2000, p. 99) and thus detrimental to quality. On the other hand, it influences the overall model behavior to some degree (Möller 2000, p. 160).

  16.

    Note that as \(RLR=0\) for the given experiments, it follows \(N_\mathrm{fo}=N_\mathrm{for}\) in Eq. (5.15).

  17.

    The extracted parameters \(z_\mathrm{bw}\) and \(f_\mathrm{c}\), on which the bandwidth impairment factor \(I_\mathrm{bw}\) is based, were averaged across four different speakers, each uttering two different phrases lasting between 5 and 10 s.

  18.

    Different kinds of combinations were tried, without finding an optimal rule for all of the tested codec combinations. The reason might be that the combination of codecs results in perceptual effects that are not consistently predictable from the perceptual dimensions of the single codecs, due to the highly non-linear functioning of most codecs.

  19.

    Note, however, that the data of Experiment 1 and Experiment 2 were employed to train the DNC-model. These data are unknown to the extended WB E-model.

  20.

    Experiments conducted according to the direct scaling method presented in Chap. 4 as well as the method itself were not available at the time the signal-based models presented here were developed.

  21.

    The labels “continuity”, “noisiness”, and “directness/frequency content” were used in Scholz (2008), based on the terminology used in Wältermann et al. (2006b). The dimensions were renamed in Wältermann et al. (2010c), see the discussion in Chap. 3.

  22.

    This parameter, however, is not included in the final overall model.

  23.

    In fact, substantial parts of the estimation technique developed in Scholz et al. (2006) were used in order to estimate the “coloration” DIF in Sect. 5.3.4.
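Note 4 states that a log-logistic curve can pass through the origin, whereas a logistic curve cannot. The following Python sketch illustrates this with generic textbook forms of both functions. The parametrization used in Sect. 2.4.5 and Appendix A is not reproduced on this page, so the function names and the parameters `a` (upper asymptote), `b` (steepness), and `c` (midpoint) are assumptions chosen only for illustration.

```python
import math


def logistic(x: float, a: float = 100.0, b: float = 1.0, c: float = 5.0) -> float:
    """Generic logistic curve a / (1 + exp(-b * (x - c))).

    For a > 0 and finite b, c the value at x = 0 is a / (1 + exp(b * c)),
    which is strictly positive, so the curve cannot pass through the origin.
    """
    return a / (1.0 + math.exp(-b * (x - c)))


def log_logistic(x: float, a: float = 100.0, b: float = 2.0, c: float = 5.0) -> float:
    """Generic log-logistic curve a * x**b / (c**b + x**b) for x >= 0.

    This is algebraically the same as a / (1 + (x / c)**(-b)) but avoids a
    division by zero at x = 0, where the value is exactly 0 for b > 0.
    """
    if x < 0.0:
        raise ValueError("log-logistic is defined here for x >= 0 only")
    xb = x ** b
    return a * xb / (c ** b + xb)


# Hypothetical parameter values; only the behaviour at x = 0 matters here.
at_origin_logistic = logistic(0.0)          # small but strictly positive
at_origin_log_logistic = log_logistic(0.0)  # exactly zero
```

With these assumed forms, an input of zero maps to a zero impairment only under the log-logistic curve, which is the property required for the direct channel.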

Author information

Correspondence to Marcel Wältermann.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wältermann, M. (2013). Instrumental Dimension-Based Speech Quality Modeling. In: Dimension-based Quality Modeling of Transmitted Speech. T-Labs Series in Telecommunication Services. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35019-1_5

  • DOI: https://doi.org/10.1007/978-3-642-35019-1_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35018-4

  • Online ISBN: 978-3-642-35019-1

  • eBook Packages: Engineering, Engineering (R0)