Compressing Audio Signals with Inpainting-Based Sparsification

Peter, Pascal; Contelly, Jan; Weickert, Joachim

doi:10.1007/978-3-030-22368-7_8

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11603)

Included in the following conference series:

International Conference on Scale Space and Variational Methods in Computer Vision

Abstract

Inpainting techniques are becoming increasingly important for lossy image compression. In this paper, we investigate whether successful ideas from inpainting-based codecs for images can be transferred to lossy audio compression. To this end, we propose a framework that creates a sparse representation of the audio signal directly in the sample domain. We select samples with a greedy sparsification approach and store this optimised data with entropy coding. Decoding restores the missing samples with well-known 1-D interpolation techniques. Our evaluation on music pieces in a stereo format suggests that the lossy compression of our proof-of-concept framework is quantitatively competitive with transform-based audio codecs such as mp3, AAC, and Vorbis.

1 Introduction

Inpainting [23] originates from image restoration, where missing or corrupted image parts need to be filled in. This concept can also be applied to compression: Inpainting-based codecs [13] represent an image directly by a sparse set of known pixels, a so-called inpainting mask. This mask is selected and stored during encoding, and decoding involves a reconstruction of the missing image parts with a suitable interpolation algorithm. Such codecs [26, 27] can reach a quality competitive with JPEG [25] and JPEG2000 [31], which create sparsity indirectly via cosine or wavelet transforms.

In lossy audio compression, all state-of-the-art codecs use a time-frequency representation of the signal and are thereby also transform-based. This applies to mp3 (MPEG layer-III) [17], advanced audio coding (AAC) [18], and the open source alternative Vorbis [34]. They resemble the classic image codecs, whereas inpainting-based compression has so far not been explored for audio data. Therefore, we propose to select and store samples directly for sparse audio representations that act as known data for inpainting.

The transition from images to audio creates some unique challenges, since visual and audio data differ in many regards. As has been shown for 3-D medical images [27], the effectiveness of inpainting-based codecs increases with the dimensionality of the input data. Audio signals only have a single time dimension, but feature a high dynamic range compared to the 8-bit standard in images. Moreover, more high-frequency changes can be expected in audio files. So far it is unknown how these differences affect the performance of interpolation and data selection strategies. In the following, we want to investigate the potential of inpainting-based audio compression.

Our Contribution. We propose a framework for lossy audio compression that is designed to transfer successful ideas from inpainting-based image compression to the audio setting. Based on this framework, we implement two proof-of-concept codecs that rely on different 1-D inpainting techniques: linear and cubic Hermite spline interpolation. Moreover, we integrate two core concepts from inpainting-based compression: sparsification [22] for the selection of known data locations, and tonal optimisation [22, 27] of the corresponding values. Our input data, music pieces in a stereo format, contain significantly more data than standard test images. Therefore, we need to adapt the optimisation techniques to the audio setting. Localised inpainting allows us to decrease computation time significantly without affecting quality. Moreover, we propose a greedy sparsification approach with global error computation instead of the stochastic, local methods common in image compression. A combination of quantisation, run-length encoding (RLE) and context-mixing for storage of the known audio data complements these optimisation strategies. We compare our new codecs to mp3, AAC, and Vorbis w.r.t. the signal-to-noise ratio.

Related Work. The reconstruction of missing image areas was first investigated by Masnou and Morel [23], who referred to this as a disocclusion problem. Later, Bertalmío et al. [4] coined the term inpainting for this application of interpolation in image restoration. Many successful contemporary inpainting operators rely on partial differential equations (PDEs), for instance homogeneous diffusion [16] or edge-enhancing anisotropic diffusion (EED) [33] inpainting. An overview of methods can be found in the monograph of Schönlieb [28]. These methods achieve a filling-in effect based on physical propagation models. Another popular approach to inpainting is the class of exemplar-based strategies that restore missing values by nonlocal copy-paste according to neighbourhood similarities [12]. For audio, the term inpainting is rarely used. However, interpolation is an important tool for signal restoration (e.g. [20]) or synthesis (e.g. [24]). Adler et al. [2] were the first to apply the core ideas of inpainting to audio signals: They presented a framework for filling in missing audio data from a sparse representation in the time domain. Their dictionary approach relies on discrete cosine and Gabor bases. There is also a vast number of publications that deal with specific scenarios such as removal of artefacts from damaged records [20] or noise in voice recognition applications [14] that rely on interpolation and signal reconstruction in a broader sense. Since a complete review is beyond the scope of this paper, we refer to the overview by Adler et al. [2].

It should be noted that while interpolation is not widely used for audio compression, linear prediction has been applied successfully, for instance by Schuller et al. [29]. However, the core techniques behind common codecs are transforms. For instance, MPEG layer-III [17] (mp3) uses a modified discrete cosine transform (MDCT) on a segmented audio signal. The sophisticated non-uniform quantisation strategies and subsequent entropy coding are augmented with psychoacoustic analysis of the signal’s Fourier transform. Advanced audio coding (AAC) [18], the successor of mp3, and the open source codec Vorbis [34] rely on the same basic principles. They also combine the MDCT with psychoacoustic modelling, but can achieve a better quality due to an increased flexibility of the encoder. A more detailed discussion of these codecs and a broader overview of the field can be found in the monograph by Spanias et al. [30].

A major competitor to transform-based audio compression arose from the so-called sinusoidal model [20, 24]. It represents an audio signal as the weighted sum of wave functions that have time-adaptive parameters such as amplitude and frequency. Sinusoidal synthesis approaches have also been applied for compression. These use interpolation, but only in the domain of synthesis parameters. Such parametric audio compression [11] also forms the foundation of the MPEG4 HILN (harmonic and individual lines plus noise) standard [19]. HILN belongs to the class of object-based audio codecs: It is able to model audio files as a composition of semantic parts (e.g. chords). Vincent and Plumbley [32] transferred these ideas to a Bayesian framework for decomposing signals into objects.

Since we use ideas from image compression to build our framework, some inpainting-based codecs are closely related to our audio approach. The basic structure of the codec is inspired by the so-called exact mask approach by Peter et al. [26]. It is one of the few codecs that allow choosing and storing known data without any positional restrictions. However, we do not use optimal control to find points as in [26]. Instead, we rely on sparsification techniques that resemble the probabilistic approach of Mainberger et al. [22]. Our greedy global sparsification differs from probabilistic sparsification in a few key points: In accordance with the findings of Adam et al. [1], we use global error computation instead of localised errors. Moreover, our approach no longer relies on randomisation for spatial optimisation, but is a deterministic, greedy process that always yields the same known data positions. Our choice of PAQ [21] for the storage of the sample data is motivated by several works that have evaluated different entropy coding techniques for inpainting-based compression [26, 27].

Organisation of the Paper. We introduce our framework for inpainting-based audio compression in Sect. 2. Details on inpainting and data optimisation for two codecs follow in Sect. 3. These codecs are evaluated in Sect. 4, and we conclude our paper with a summary and outlook in Sect. 5.

2 A Framework for Inpainting-Based Audio Compression

Our proof-of-concept framework follows the common structure of current inpainting-based compression methods [26, 27]. An input audio file $\varvec{f}\!\!: \varOmega \subset \mathbb {N} \rightarrow \mathbb {Z}^c$ maps time coordinates $\varOmega = \{1,...,n\}$ to samples of the waveforms from $c \ge 1$ audio channels. Encoding aims to select and store a subset $K \subset \varOmega $ of known data. During decoding, inpainting uses these data for a lossy restoration of the missing samples on $\varOmega \setminus K$.

In the following, we describe the individual steps of the encoding pipeline: This includes sample selection and optimisation, as well as efficient storage with prediction and entropy coding. Our optimisation steps are very flexible w.r.t. the inpainting method: We only assume that a deterministic inpainting algorithm computes a reconstruction $r(K,\varvec{g})\!\!: \varOmega \rightarrow \mathbb {Z}$ from samples $\varvec{g}\!\!: K \rightarrow \mathbb {Z}$ on the set K. In Sect. 3 we discuss the actual inpainting methods for our experiments in Sect. 4. Our codec is designed to be easily extendable with other inpainting techniques, such as the dictionary-based approach of Adler et al. [2].

Step 1: Sample Quantisation. First, we apply a coarse quantisation that reduces the number of sample values to $q \ge 2$. In order to adapt to the coding pipeline from image processing, this involves a global shift to a non-negative sample range $\{0,\ldots ,p-1\}$ which is reversed again during decoding. A uniform quantisation partitions this sample range into q subintervals of length p / q, mapping to quantised values $\{0,\ldots ,q-1\}$. For inpainting, we assign the quantisation index k to the corresponding quantised value $\ell $ from the original range:

$$\begin{aligned} \ell = \Bigg \lfloor \frac{k p}{q}+\frac{1}{2} \Bigg \rfloor . \end{aligned}$$

(1)

All following optimisation steps use quantised values for inpainting and coding.
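The quantisation step can be illustrated with a minimal Python sketch. It assumes 16-bit signed input, so that $p = 2^{16}$ and the global shift is $2^{15}$; the function names `quantise` and `dequantise` are hypothetical and not taken from our implementation.

```python
import numpy as np

def quantise(samples, q, p=2**16, shift=2**15):
    """Map signed samples to quantisation indices in {0, ..., q-1}."""
    # Global shift to the non-negative range {0, ..., p-1} (assumed 16-bit input).
    shifted = np.asarray(samples, dtype=np.int64) + shift
    # Uniform partition into q subintervals of length p/q.
    return (shifted * q) // p

def dequantise(indices, q, p=2**16, shift=2**15):
    """Eq. (1): l = floor(k*p/q + 1/2), followed by the reversed shift."""
    k = np.asarray(indices, dtype=np.int64)
    # Integer form of floor(k*p/q + 1/2) that avoids floating-point rounding.
    level = (2 * k * p + q) // (2 * q)
    return level - shift
```

For example, with $q = 32$ the samples $-32768$, $0$ and $32767$ fall into the indices 0, 16 and 31.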

Step 2: Greedy Global Sparsification. A popular method for selecting the spatial location of known pixels in inpainting-based compression is probabilistic sparsification [22]: It starts with a full pixel mask, removes a set of randomly selected candidate points, and performs inpainting. A subset of candidates with the lowest local error is then permanently removed, since these points are considered easy to reconstruct. These steps are iterated until the desired number of mask points, the target density, is reached. This method is easy to implement and supports all inpainting techniques. However, a recent analysis [1] revealed that the local error computation in this approach yields a suboptimal point selection. Therefore, we use a different sparsification strategy that relies on global error computation as proposed by Adam et al. [1]. Moreover, we remove the random component for candidate selection and obtain a greedy global sparsification that is described in Algorithm 1.

For each audio sample in the mask we compute the increase in the reconstruction error that would result from its removal. With this global reconstruction error, we sort the samples in a heap. Then we iteratively remove in every step the sample on top of the heap (i.e. with the lowest effect on the error) permanently from the mask. Afterwards, all mask samples that are affected by this removal are updated and reinserted into the heap. Which mask samples need to be updated depends on the inpainting approach (see Sect. 3). Note that, in order to avoid a costly purging of the heap in each iteration, the unmodified heap elements remain. If the sample at the top has been already removed or its error is not up-to-date, the algorithm moves on to the next one. For image compression, the global impact of individual changes in the mask cannot be considered due to runtime issues. In Sect. 3 we explain how we can reduce the computational load with an update strategy for the audio setting.
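The heap-based procedure described above can be sketched as follows. This is an illustrative interpretation, not a transcription of Algorithm 1: it handles a single channel, uses linear interpolation (for which only the two direct mask neighbours of a removed sample need an update), always keeps the first and last sample, and marks outdated heap entries with version counters. All names are hypothetical.

```python
import heapq
import numpy as np

def segment_error(f, a, b):
    """Squared error of linear interpolation between known samples a < b."""
    x = np.arange(a, b + 1)
    u = np.rint(f[a] + (f[b] - f[a]) * (x - a) / (b - a))
    return float(np.sum((u - f[a:b + 1]) ** 2))

def removal_cost(f, left, i, right):
    """Increase of the global error if mask sample i is removed."""
    return (segment_error(f, left, right)
            - segment_error(f, left, i) - segment_error(f, i, right))

def greedy_sparsify(f, target_count):
    """Return a boolean mask with (at most) target_count known samples."""
    f = np.asarray(f, dtype=np.float64)
    n = len(f)
    prev = list(range(-1, n - 1))   # left mask neighbour of each sample
    nxt = list(range(1, n + 1))     # right mask neighbour of each sample
    version = [0] * n               # detects outdated heap entries
    in_mask = np.ones(n, dtype=bool)
    heap = [(removal_cost(f, i - 1, i, i + 1), i, 0) for i in range(1, n - 1)]
    heapq.heapify(heap)
    count = n
    while count > target_count and heap:
        cost, i, ver = heapq.heappop(heap)
        if not in_mask[i] or ver != version[i]:
            continue                # stale entry: skip instead of purging the heap
        left, right = prev[i], nxt[i]
        in_mask[i] = False
        nxt[left], prev[right] = right, left
        count -= 1
        for j in (left, right):     # only these costs change for linear interpolation
            if 0 < j < n - 1:       # endpoints are never removal candidates
                version[j] += 1
                heapq.heappush(heap, (removal_cost(f, prev[j], j, nxt[j]), j, version[j]))
    return in_mask
```

On the triangular signal `[0, 1, 2, 3, 4, 3, 2, 1, 0]` with a target of three samples, the sketch keeps the two endpoints and the peak, since all other samples can be removed without any increase in the error.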

Step 3: Sample Optimisation. It is well-known in inpainting-based image compression that optimising not only the location of known data, but also the function value in the stored pixels can yield large improvements [7, 15, 22, 27]. Since we aim for a flexible framework, we use the technique from [27], as it does not require a specific inpainting technique. It performs a random walk over all mask samples: If a change to the next higher or lower quantisation level improves the reconstruction, it is kept, otherwise it is reverted. As for sparsification, we address runtime questions in Sect. 3.

Step 4: Location Encoding. Current state-of-the-art codecs [26] employ block coding in 2-D to store exact masks with unrestricted placement of known points. A natural substitute for this in 1-D is run-length encoding (RLE) [5]. We represent the mask as a sequence of ones (known samples) and zeroes (unknown samples). In sparse masks, we expect isolated ones with long runs of zeroes in-between. Therefore, we only encode runs of zeroes together with a terminating one. This allows us to store the mask as a sequence of 8-bit symbols. Runs up to length 254 require only one symbol, while longer runs are split accordingly (e.g. 300 is represented by 255, 45).
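A minimal sketch of such a run-length scheme is given below. It assumes that the symbol 255 acts as a continuation marker (255 zeroes without a terminating one), which is consistent with the example 300 → 255, 45, and that the mask ends with a known sample. The function names are hypothetical.

```python
def rle_encode(mask):
    """Encode a 0/1 mask as 8-bit symbols.

    A symbol v < 255 means 'v zeroes followed by a one';
    the symbol 255 means '255 zeroes, the run continues' (assumed convention).
    """
    symbols, run = [], 0
    for bit in mask:
        if bit:
            while run >= 255:        # split long runs
                symbols.append(255)
                run -= 255
            symbols.append(run)      # remaining zeroes plus the terminating one
            run = 0
        else:
            run += 1
    if run:
        raise ValueError("mask is assumed to end with a known sample")
    return bytes(symbols)

def rle_decode(data):
    """Invert rle_encode and return the mask as a list of 0/1 values."""
    mask = []
    for v in data:
        if v == 255:
            mask.extend([0] * 255)
        else:
            mask.extend([0] * v + [1])
    return mask
```

With this convention, a run of exactly 255 zeroes is stored as the two symbols 255, 0.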

Step 5: Prediction and Entropy Encoding. Due to recurring patterns in audio files (in particular for music recordings), prediction can be used to achieve higher compression ratios. To this end, many publications on inpainting-based compression (e.g. [26, 27]) apply the context-mixing algorithm PAQ [21]. It predicts the next bit in a stream containing different data types according to numerous predefined and learned contexts. The weighting of these contexts adapts to the local file content with a gradient descent on the coding cost. We use PAQ for an additional joint encoding of the output data from Steps 3–4.

3 Localised Sample Optimisation with 1-D Inpainting

So far, we have not specified concrete inpainting operators for our general framework. In the following, we transfer popular inpainting techniques from image compression to the audio setting. For these inpainting operators, we develop new techniques for the acceleration of the corresponding data optimisation.

Inpainting Techniques. For our first proof-of-concept implementation of the framework, we explore the potential of successful inpainting approaches from image compression. So far, three operators have shown convincing performance [7, 13, 27]: homogeneous diffusion [16], biharmonic [10], and edge-enhancing anisotropic diffusion (EED) inpainting [33]. EED has been particularly successful, since its direction-dependent propagation can reconstruct image edges. However, due to the 1-D nature of audio data, EED is not an option.

Homogeneous diffusion inpainting keeps all of the known data points on $K \subset \varOmega $ unchanged, while the unknown data on $\varOmega \setminus K$ must fulfil the Laplace equation $\varDelta u = 0$ with $\varDelta u = \partial _{xx} u + \partial _{yy} u$. In 1-D, this implies a vanishing second order derivative, which leads to a straightforward linear interpolation between the known data points. This comes down to a minimisation of the energy

$$\begin{aligned} E_L(u) = \int _{\varOmega \setminus K} \left( u'(x)\right) ^2 dx. \end{aligned}$$

(2)

In the following sections we benefit from the compact support of the corresponding interpolation function: For the reconstruction u(x) at a location $x \in \varOmega \setminus K$ in the inpainting domain, we need a small amount of neighbouring known values $(x_{k}, u_{k})$. In the following, the indices $k=\pm 1,\pm 2,...$ denote the respective closest known samples in positive/negative x-direction. For linear interpolation, we only require the two known samples $(x_{-1}, u_{-1})$ and $(x_{1}, u_{1})$ to obtain the reconstruction

$$\begin{aligned} u(x) = \frac{x-x_{-1}}{x_{1}-x_{-1}} \, u_1 + \left( 1-\frac{x-x_{-1}}{x_{1}-x_{-1}}\right) u_{-1}. \end{aligned}$$

(3)
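Eq. (3) translates directly into a few lines of NumPy. The sketch below assumes 0-based time coordinates, strictly increasing mask positions, and that the first and last sample of the signal belong to the mask; the function name is hypothetical.

```python
import numpy as np

def inpaint_linear(positions, values, n):
    """Reconstruct n samples from known samples with Eq. (3)."""
    xk = np.asarray(positions)
    uk = np.asarray(values, dtype=np.float64)
    x = np.arange(n)
    # Index of the closest known sample to the right (x_1) and to the left (x_{-1}).
    right = np.clip(np.searchsorted(xk, x, side="right"), 1, len(xk) - 1)
    left = right - 1
    alpha = (x - xk[left]) / (xk[right] - xk[left])
    u = alpha * uk[right] + (1.0 - alpha) * uk[left]
    return np.rint(u).astype(np.int64)   # integer sample values
```

Known samples are reproduced exactly, since the weight `alpha` is 0 or 1 at mask positions.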

Biharmonic inpainting is a higher-order approach that imposes the constraint $-\varDelta ^2 u = 0$ on the inpainted data, thereby providing a smoother reconstruction compared to the homogeneous case. Cubic splines are a natural 1-D counterpart to this approach. They were originally motivated by a physical elasticity model for draftsman’s splines [9] and minimise the energy

$$\begin{aligned} E_{CS}(u) = \int _{\varOmega \setminus K} \left( u''(x)\right) ^2 dx. \end{aligned}$$

(4)

However, since we aim to reach a similar locality as for the linear interpolation, we consider a specific variant of cubic splines, the cubic Hermite spline interpolation [6] (Catmull-Rom spline). It yields an interpolant with $C^1$-smoothness using a finite support. Since it does not require equidistant sampling, it is compatible with sparsification. With $\alpha := \frac{x-x_{-1}}{x_{1}-x_{-1}}$, the interpolant of cubic Hermite spline interpolation is

$$\begin{aligned} u(x)&= (2\alpha ^3 - 3 \alpha ^2 + 1) u_{-1} + (\alpha ^3-2\alpha ^2+\alpha ) \frac{u_1-u_{-2}}{x_1-x_{-2}} \nonumber \\&+ (-2\alpha ^3 + 3\alpha ^2) u_1 + (\alpha ^3 - \alpha ^2) \frac{u_{2}-u_{-1}}{x_{2}-x_{-1}}. \end{aligned}$$

(5)

For the interpolation techniques above, we round to the nearest 16-bit integer. Note that this rounding is explicitly not restricted to the quantisation levels according to Step 1 of our compression pipeline from Sect. 2. From a very small set of quantised values, the inpainting can potentially recover a much broader sample range, if the known data is chosen appropriately. In the following, we discuss how the locality of our inpainting methods can accelerate the data optimisation.
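The cubic Hermite interpolant with finite-difference slopes can be sketched as below. Two points are assumptions of this sketch rather than statements of the text: the slopes are multiplied by the interval length $x_1 - x_{-1}$, as in the standard Hermite basis on a non-unit interval (Eq. (5) as printed does not show this factor), and one-sided differences are used at both ends of the signal, where $x_{-2}$ or $x_2$ does not exist. Time coordinates are 0-based and the function name is hypothetical.

```python
import numpy as np

def inpaint_cubic_hermite(positions, values, n):
    """Cubic Hermite reconstruction of n samples from known samples."""
    xk = np.asarray(positions, dtype=np.float64)
    uk = np.asarray(values, dtype=np.float64)
    # Slope at each known sample from its two mask neighbours;
    # endpoints are padded, which yields one-sided differences (assumption).
    xp = np.concatenate(([xk[0]], xk, [xk[-1]]))
    up = np.concatenate(([uk[0]], uk, [uk[-1]]))
    m = (up[2:] - up[:-2]) / (xp[2:] - xp[:-2])
    x = np.arange(n)
    right = np.clip(np.searchsorted(xk, x, side="right"), 1, len(xk) - 1)
    left = right - 1
    h = xk[right] - xk[left]            # interval length x_1 - x_{-1}
    a = (x - xk[left]) / h              # alpha from the text
    u = ((2 * a**3 - 3 * a**2 + 1) * uk[left]
         + (a**3 - 2 * a**2 + a) * h * m[left]      # slope scaled by h (assumption)
         + (-2 * a**3 + 3 * a**2) * uk[right]
         + (a**3 - a**2) * h * m[right])
    # Round to integers and restrict to the 16-bit sample range.
    return np.clip(np.rint(u), -32768, 32767).astype(np.int16)
```

With the scaled slopes, the sketch reproduces linear signals exactly, which provides a simple sanity check.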

Local Interpolation Updates. Both the greedy sparsification from Algorithm 1 and the sample optimisation from Step 3 require a global reconstruction error. Recomputing the whole reconstruction after a change of a single mask point is a significant drawback of these approaches in 2-D. However, in our 1-D audio signal setting using interpolation methods with finite support, the influence of each mask sample is limited. In the following, we always assume a sample $x_0$ is changed by an optimisation algorithm, and $x_{- 1}, x_{- 2}, ...$ denote its left mask neighbours while $x_1, x_2, ...$ are its right mask neighbours.

In a sparsification step, we remove the known sample value $y_0$ with time coordinate $x_0$ from the mask. For linear interpolation, this removal affects exactly the reconstruction of the samples $x \in (x_{-1},x_1)$, which are now reconstructed with the known data $x_{-1}$ and $x_{1}$. For the cubic Hermite spline, the situation is similar, but due to the larger support, the interval $(x_{-2},x_{2})$ is affected now. Moreover, it has to be split into three subintervals that are inpainted with different combinations of the known data $x_{-3},...,x_3$ (see Fig. 1(a)).

The update strategy for sample optimisation follows the same principle, but more subintervals need to be considered, since we now change the value $y_0$ at location $x_0$ instead of removing the sample completely. Thus, for linear interpolation, the optimisation algorithm needs to recompute the intervals $(x_{-1},x_0)$ and $(x_0,x_{1})$ with the new known sample $(x_0,y_0)$. Since the cubic Hermite spline relies on four samples, the sample change affects four intervals: $(x_{-2},x_{-1})$, $(x_{-1},x_{0})$, $(x_{0},x_{1})$, and $(x_{1},x_{2})$. Figure 1(b) illustrates the associated samples. Note that we also need the samples $x_{-3}$ and $x_3$ to compute these reconstructions.
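For linear interpolation, the local update during sample optimisation can be sketched as follows. The sketch combines the random walk from Step 3 with the restriction to the two intervals $(x_{-1},x_0)$ and $(x_0,x_1)$. It assumes that quantised values are stored as indices into a sorted table of levels; the number of sweeps, the random seed and all names are hypothetical.

```python
import numpy as np

def local_error(f, xk, uk, j):
    """Squared error on [x_{j-1}, x_{j+1}], the only span that sample j influences."""
    lo, hi = max(j - 1, 0), min(j + 1, len(xk) - 1)
    x = np.arange(xk[lo], xk[hi] + 1)
    u = np.rint(np.interp(x, xk[lo:hi + 1], uk[lo:hi + 1]))
    return float(np.sum((u - f[x]) ** 2))

def optimise_values(f, positions, level_idx, levels, sweeps=5, seed=0):
    """Random walk over mask samples that tries the next higher/lower level."""
    f = np.asarray(f, dtype=np.float64)
    xk = np.asarray(positions)
    idx = np.array(level_idx)
    levels = np.asarray(levels, dtype=np.float64)
    uk = levels[idx]                       # current values at the mask positions
    rng = np.random.default_rng(seed)
    for _ in range(sweeps):
        for j in rng.permutation(len(xk)):
            for step in (1, -1):
                cand = idx[j] + step
                if not 0 <= cand < len(levels):
                    continue
                before = local_error(f, xk, uk, j)
                uk[j] = levels[cand]
                if local_error(f, xk, uk, j) < before:
                    idx[j] = cand          # improvement: keep the change
                else:
                    uk[j] = levels[idx[j]] # otherwise revert
    return idx
```

Since the error outside of $[x_{-1},x_1]$ is unaffected by the change, comparing the local errors yields the same decision as comparing the global reconstruction errors.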

4 Experiments

Experimental Setup. We present detailed results for a royalty-free sound file of the song Exploring the Forest [8] (linear 16-bit pulse-code modulation (PCM) with a 44.1 kHz sampling rate and two channels). Results for additional music pieces from a variety of genres as well as playable sound files are available online as supplementary material (see Note 1). As a quantitative measure, we use the signal-to-noise ratio (SNR) that is defined by

$$\begin{aligned} SNR (f,g) = 10 \log _{10} \left( \frac{ \sum _{i=1}^n f_i^2}{\sum _{i=1}^n (f_i-g_i)^2}\right) \,. \end{aligned}$$

(6)
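Eq. (6) can be evaluated with a short helper. The sketch assumes that for multi-channel signals both sums run over the samples of all channels; the function name is hypothetical.

```python
import numpy as np

def snr_db(f, g):
    """Signal-to-noise ratio of a reconstruction g w.r.t. the original f, in dB."""
    f = np.asarray(f, dtype=np.float64)
    g = np.asarray(g, dtype=np.float64)
    noise = np.sum((f - g) ** 2)
    if noise == 0:
        return float("inf")        # perfect reconstruction
    return float(10.0 * np.log10(np.sum(f ** 2) / noise))
```

A reconstruction whose squared error amounts to one hundredth of the signal energy thus yields an SNR of 20 dB.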

Comparison of Inpainting Methods. In a first experiment, we compare the performance of the two inpainting methods from Sect. 3. A quantitative SNR comparison in Fig. 2(a) reveals that linear interpolation yields almost the same SNR and outperforms the Hermite cubic spline for small compressed file sizes. The increased smoothness of the cubic spline comes at the cost of over- and undershoots close to the known samples. These can only be compensated adequately if most of the samples are known. Moreover, for linear interpolation, the best density and quantisation parameters increase proportionally to the file size. Hermite cubic spline interpolation is more sensitive in this regard. Therefore, we choose linear interpolation for our comparison to established codecs.

Comparison to Established Codecs. Our second series of experiments compares the compression performance of our best inpainting codec (with linear interpolation) with the established codecs mp3, Vorbis, and AAC. Our evaluation in Fig. 2(b) yields a surprising result: For medium to high compression ratios, our codec surpasses all three transform-based approaches w.r.t. the SNR. This demonstrates that concepts from inpainting-based compression can be viable in an audio setting, even with simple inpainting methods. For small compression ratios, our codec falls slightly below the SNR of AAC. Inpainting-based methods show similar behaviour for near-lossless coding of images (see e.g. [27]). This is natural, since the impact of inpainting diminishes for dense masks.

Figure 3 provides a visualisation of the inpainting results for the lowest and highest compression ratios from Fig. 2(a) with linear interpolation. At first glance, even with a low density of 10 % and a very coarse quantisation ($q=32$), the reconstructed waveform in Fig. 3(b) looks similar to the original in Fig. 3(a). However, some of the peaks are flattened (especially apparent at the end of the signal). On a temporal average, this is still close to the original signal in terms of SNR, but there are some audible artefacts like background noise. A higher density and finer quantisation lead to increasingly improved results in Fig. 3(c) and (d). Simple linear interpolation can reproduce the original waveform from carefully optimised known samples with surprising accuracy.

5 Conclusions and Outlook

Our modular framework for audio compression demonstrates the potential of inpainting with data optimisation for the sparse representation of sample data. Even with fairly simple ingredients, our proof-of-concept codecs are able to compete with established audio codecs w.r.t. quantitative analysis. In particular, this discovery is relevant for recent approaches in inpainting-based video compression [3]: Inpainting-based audio codecs would augment them in a natural way by offering a consistent way of encoding the corresponding audio tracks.

In our future work, we plan to investigate more sophisticated inpainting techniques that have been designed specifically for the audio setting (e.g. [2]), and address practical issues such as random access. Moreover, we will incorporate the psychoacoustic modelling used by transform-based codecs: Prefiltering the signal to eliminate frequencies that are unimportant for human perception might further improve the performance of inpainting-based audio compression.

Notes

1. https://www.mia.uni-saarland.de/Publications/peter-ssvm19-supplement.zip

References

1. Adam, R.D., Peter, P., Weickert, J.: Denoising by inpainting. In: Lauze, F., Dong, Y., Dahl, A.B. (eds.) SSVM 2017. LNCS, vol. 10302, pp. 121–132. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58771-4_10
2. Adler, A., Emiya, V., Jafari, M.G., Elad, M., Gribonval, R., Plumbley, M.D.: Audio inpainting. IEEE Trans. Audio Speech Lang. Process. 20(3), 922–932 (2012)
3. Andris, S., Peter, P., Weickert, J.: A proof-of-concept framework for PDE-based video compression. In: Proceedings 32nd Picture Coding Symposium (PCS 2016), Nuremberg, Germany, pp. 1–5, December 2016
4. Bertalmío, M., Sapiro, G., Caselles, V., Ballester, C.: Image inpainting. In: Proceedings SIGGRAPH 2000, New Orleans, LA, pp. 417–424, July 2000
5. Capon, J.: A probabilistic model for run-length coding of pictures. IRE Trans. Inf. Theor. 5(4), 157–163 (1959)
6. Catmull, E., Rom, R.: A class of local interpolating splines. In: Barnhill, R.E., Riesenfeld, R.F. (eds.) Computer Aided Geometric Design, pp. 317–326. Academic Press, New York (1974)
7. Chen, Y., Ranftl, R., Pock, T.: A bi-level view of inpainting-based image compression. In: Proceedings 19th Computer Vision Winter Workshop, Křtiny, Czech Republic, pp. 19–26, February 2014
8. Crowley, P.: Exploring the Forest (2013). Audio file, available under http://petercrowleyfantasydream.jimdo.com
9. de Boor, C.: A Practical Guide to Splines, Applied Mathematical Sciences, vol. 27. Springer, New York (1978)
10. Duchon, J.: Interpolation des fonctions de deux variables suivant le principe de la flexion des plaques minces. RAIRO Analyse Numérique 10(3), 5–12 (1976)
11. Edler, B., Purnhagen, H.: Parametric audio coding. In: Proceedings International Conference on Communication Technology Proceedings (WCC-ICCT 2000), vol. 1, Beijing, China, pp. 614–617, August 2000
12. Efros, A.A., Leung, T.K.: Texture synthesis by non-parametric sampling. In: Proceedings Seventh IEEE International Conference on Computer Vision, vol. 2, Corfu, Greece, pp. 1033–1038, September 1999
13. Galić, I., Weickert, J., Welk, M., Bruhn, A., Belyaev, A., Seidel, H.P.: Image compression with anisotropic diffusion. J. Math. Imaging Vis. 31(2–3), 255–269 (2008)
14. Gemmeke, J.F., Van Hamme, H., Cranen, B., Boves, L.: Compressive sensing for missing data imputation in noise robust speech recognition. IEEE J. Sel. Top. Sign. Process. 4(2), 272–287 (2010)
15. Hoeltgen, L., et al.: Optimising spatial and tonal data for PDE-based inpainting. In: Bergounioux, M., Peyré, G., Schnörr, C., Caillau, J.P., Haberkorn, T. (eds.) Variational Methods in Image Analysis, De Gruyter, Berlin, pp. 35–83 (2017)
16. Iijima, T.: Basic theory on normalization of pattern (in case of typical one-dimensional pattern). Bull. Electrotech. Lab. 26, 368–388 (1962). (in Japanese)
17. ISO/IEC: Information technology - coding of moving pictures and associated audio - part 3: Audio, Standard, ISO/IEC 11172–3 (1992)
18. ISO/IEC: Information technology - generic coding of moving pictures and associated audio - part 7: Advanced audio coding, Standard, ISO/IEC 13818–7 (1992)
19. ISO/IEC: Information technology - coding of audio-visual objects - part 3: Audio, Standard, ISO/IEC 14496–3 (2001)
20. Maher, R.: A method for extrapolation of missing digital audio data. J. Audio Eng. Soc. 42(5), 350–357 (1994)
21. Mahoney, M.: Adaptive weighing of context models for lossless data compression. Technical report, CS-2005-16, Florida Institute of Technology, Melbourne, FL, December 2005
22. Mainberger, M., et al.: Optimising spatial and tonal data for homogeneous diffusion inpainting. In: Bruckstein, A.M., ter Haar Romeny, B.M., Bronstein, A.M., Bronstein, M.M. (eds.) SSVM 2011. LNCS, vol. 6667, pp. 26–37. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-24785-9_3
23. Masnou, S., Morel, J.M.: Level lines based disocclusion. In: Proceedings 1998 IEEE International Conference on Image Processing, vol. 3, Chicago, IL, pp. 259–263, October 1998
24. McAulay, R., Quatieri, T.: Computationally efficient sine-wave synthesis and its application to sinusoidal transform coding. In: Proceedings International Conference on Acoustics, Speech, and Signal Processing (ICASSP-88), New York, NY, pp. 370–373, April 1988
25. Pennebaker, W.B., Mitchell, J.L.: JPEG: Still Image Data Compression Standard. Springer, New York (1992)
26. Peter, P., Hoffmann, S., Nedwed, F., Hoeltgen, L., Weickert, J.: Evaluating the true potential of diffusion-based inpainting in a compression context. Sign. Process. Image Commun. 46, 40–53 (2016)
27. Schmaltz, C., Peter, P., Mainberger, M., Ebel, F., Weickert, J., Bruhn, A.: Understanding, optimising, and extending data compression with anisotropic diffusion. Int. J. Comput. Vis. 108(3), 222–240 (2014)
28. Schönlieb, C.B.: Partial Differential Equation Methods for Image Inpainting, Cambridge Monographs on Applied and Computational Mathematics, vol. 29. Cambridge University Press, Cambridge (2015)
29. Schuller, G.D., Yu, B., Huang, D., Edler, B.: Perceptual audio coding using adaptive pre-and post-filters and lossless compression. IEEE Trans. Speech Audio Process. 10(6), 379–390 (2002)
30. Spanias, A., Painter, T., Atti, V.: Audio Signal Processing and Coding. Wiley, Hoboken (2006)
31. Taubman, D.S., Marcellin, M.W. (eds.): JPEG 2000: Image Compression Fundamentals, Standards and Practice. Kluwer, Boston (2002)
32. Vincent, E., Plumbley, M.D.: Low bit-rate object coding of musical audio using Bayesian harmonic models. IEEE Trans. Audio Speech Lang. Process. 15(4), 1273–1282 (2007)
33. Weickert, J.: Theoretical foundations of anisotropic diffusion in image processing. In: Kropatsch, W., Klette, R., Solina, F., Albrecht, R. (eds.) Theoretical Foundations of Computer Vision, Computing Supplement, vol. 11, pp. 221–236. Springer, Vienna (1996). https://doi.org/10.1007/978-3-7091-6586-7_13
34. Xiph.Org Foundation: Vorbis I specification (2015). https://xiph.org/vorbis/doc/Vorbis_I_spec.html

Acknowledgements

We thank Jan Østergaard (Aalborg University) for the valuable discussions that allowed us to improve our work. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 741215, ERC Advanced Grant INCOVID).

Author information

Authors and Affiliations

Mathematical Image Analysis Group, Faculty of Mathematics and Computer Science, Saarland University, Campus E1.7, 66041, Saarbrücken, Germany
Pascal Peter, Jan Contelly & Joachim Weickert

Corresponding author

Correspondence to Pascal Peter.

Editor information

Editors and Affiliations

University of Lübeck, Lübeck, Germany
Jan Lellmann
University of Erlangen-Nuremberg (FAU), Erlangen, Germany
Martin Burger
University of Lübeck, Lübeck, Germany
Jan Modersitzki

About this paper

Cite this paper

Peter, P., Contelly, J., Weickert, J. (2019). Compressing Audio Signals with Inpainting-Based Sparsification. In: Lellmann, J., Burger, M., Modersitzki, J. (eds) Scale Space and Variational Methods in Computer Vision. SSVM 2019. Lecture Notes in Computer Science, vol 11603. Springer, Cham. https://doi.org/10.1007/978-3-030-22368-7_8

DOI: https://doi.org/10.1007/978-3-030-22368-7_8
Published: 05 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22367-0
Online ISBN: 978-3-030-22368-7
eBook Packages: Computer Science, Computer Science (R0)
