Reversible audio data hiding algorithm using noncausal prediction of alterable orders
- Shijun Xiang^{1, 2} and
- Zihao Li^{1}
DOI: 10.1186/s13636-017-0101-9
© The Author(s). 2017
Received: 6 August 2016
Accepted: 8 February 2017
Published: 24 February 2017
Abstract
This paper presents a reversible data hiding scheme for digital audio using noncausal prediction of alterable orders. First, the samples of the host signal are divided into the cross set and the dot set. Each sample in one set is then estimated using the past P samples and the future Q samples as the prediction context. The order P + Q and the prediction coefficients are computed with the minimum error power method. With the proposed predictor, the prediction errors can be efficiently reduced for different types of audio files. Compared with several existing state-of-the-art schemes, the proposed prediction model combined with the expansion embedding technique introduces less embedding distortion at the same embedding capacity. Experiments on standard audio files verify the effectiveness of the proposed method.
Keywords
Reversible data hiding, Audio, Noncausal prediction, Minimum error power, Alterable orders

1 Introduction
Reversible data hiding embeds data in a host signal in such a way that the host signal can be completely recovered [1]. It is used when the host signal, such as a medical image or an audio file, must be preserved losslessly. Two criteria are significant for reversible data hiding techniques: the embedding capacity should be large and the distortion should be low. These two criteria conflict with each other; usually, a higher embedding capacity is accompanied by higher distortion.
Early reversible data hiding algorithms mainly relied on lossless compression. To embed data into a host signal, vacant space was created by compressing a part of, or even the whole, host signal. Fridrich et al. proposed reversible data hiding algorithms that compress bitplanes [2] and vector states [3] for better performance. In [4], Celik et al. proposed a lossless generalized-LSB data hiding method which compresses a set of selected features of an image and embeds the payload in the space freed by the compression. This type of method usually achieves a low capacity with severe distortion.
To improve data hiding performance, Tian [5] introduced a difference expansion (DE)-based method, in which every two pixels are grouped together to produce one high-pass coefficient and one low-pass coefficient. The high-pass coefficient is then expanded to carry 1 bit; that is, two pixels embed 1 bit. To solve the overflow and underflow problems, a location map marks the out-of-range pixels and is embedded together with the payload. The embedding capacity is therefore at best 0.5 bit/pixel. Tian’s method is a fundamental work in reversible data hiding and has been developed in many directions, such as Alattar’s technique, which embeds two data bits in every three pixels [6], the reduction of the location map size [7], and the generalization of DE into integer transforms [8, 9].
Another type of improvement, called prediction error expansion (PEE), has surpassed the DE-based methods. In these schemes, pixels are first predicted from their contexts, and the prediction error is used for data embedding through expansion. The superiority of PEE is that it better exploits the correlation among neighbors, improving prediction performance and reducing embedding distortion. In [10], Thodi and Rodriguez proposed a histogram shifting method for embedding data in prediction errors, which established the foundation of PEE. The same authors later proposed an improved method based on the difference expansion technique [11]. Many predictors have been used for PEE, such as the partial difference expanding (PDE) predictor [12], the median edge detector (MED) [13], the Gaussian weight predictor [14], and an accurate predictor [15].
On the basis of DE and PEE, the histogram shifting (HS) technique has been developed. The HS-based scheme was first proposed by Ni et al. [16]. Its key step is to shift the bins to the right and left of the peak frequency bin to make room for data embedding; the height of the peak bin thus determines the embedding capacity. Such schemes may include blocking or area-selection methods, as in [17]. Their embedding capacity is usually small and the embedding distortion unstable. For larger capacity and lower distortion, some works have combined PEE with HS, such as [18]: a sharper prediction-error histogram is obtained from PEE, while HS reduces the embedding distortion.
For better prediction performance, Yan and Wang proposed a prediction-error expansion method using linear prediction [19], in which the past eight samples were used for prediction with integer coefficients and a fixed order. In [20], Nishimura combined linear prediction with the error expansion technique, using the past eight samples to compute the prediction coefficients. To exploit the correlation of neighboring pixels/samples more fully, a non-integer prediction error expansion embedding method was proposed in [21], where the prediction value of the current sample is the mean of its two closest samples. Sachnev et al. [22] proposed a double-embedding scheme which separates an image into two sets so that each pixel can be predicted from its four immediate neighbors. Hu et al. [23] presented an image data hiding scheme using minimum rate prediction and an optimized histogram modification method.
The main contributions of this paper are as follows:
- 1)
Noncausal predictor. In conventional PEE predictors, either the prediction coefficients are kept unchanged [19, 21, 22] or only past samples (or pixels) are used as the prediction context [19, 20], so the redundancy cannot be exploited effectively. To address this problem, we propose a new noncausal predictor that combines the advantages of the linear predictor and the conventional noncausal predictor. The predictor is designed for the double-embedding scheme, and its coefficients are adaptively calculated with the minimum error power method.
- 2)
Alterable orders. Unlike conventional PEE predictors, whose prediction order is fixed [19–23] so that the prediction errors cannot be effectively reduced for different audio files, the noncausal linear predictor proposed in this paper has an alterable order. The optimal prediction order is chosen according to the complexity of the audio file with the minimum error power method.
Owing to these improvements, a sharper prediction-error histogram can be obtained, reducing the embedding distortion. Experiments on several standard clips show that the prediction orders differ across clips and that the best prediction performance is achieved for each candidate file. Compared with existing reversible audio data hiding methods, the proposed one has lower distortion at the same embedding rate.
The rest of the paper is organized as follows: the proposed scheme is described in Section 2, experimental results in comparison with several existing methods are reported in Section 3, and conclusions are drawn in the last section.
2 The proposed scheme
This section presents the proposed noncausal prediction model in detail, which can provide satisfactory prediction accuracy for different clips. The double-embedding strategy [22] is introduced for the proposed prediction model to form the proposed high-capacity reversible data hiding scheme.
2.1 Double-embedding strategy
The double-embedding strategy was proposed for reversible image data hiding in [22]: an image is divided into two sets like a chess board, so that each pixel in one set can be predicted from its four immediate neighbors in the other set. In the encoder, the first set is embedded first; in the decoder, the second set is recovered first.
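For a 1-D audio signal, the analogous division can be sketched as follows; this assumes, for illustration only, that the cross set holds the even-indexed samples and the dot set the odd-indexed ones (the paper does not fix this convention), and the function names are ours:

```python
def split_sets(samples):
    """Split a 1-D signal into the cross (even-index) and
    dot (odd-index) sets, as in the double-embedding strategy."""
    cross = samples[0::2]
    dot = samples[1::2]
    return cross, dot

def merge_sets(cross, dot):
    """Interleave the two sets back into one signal."""
    merged = []
    for i in range(len(dot)):
        merged.append(cross[i])
        merged.append(dot[i])
    if len(cross) > len(dot):  # odd-length signal
        merged.append(cross[-1])
    return merged
```

Because the split is a pure reindexing, merging the two sets recovers the signal exactly, which is what makes the two-pass embedding and the reversed-order recovery possible.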
2.2 Noncausal prediction model
Each sample x _{ i } in one set is estimated as a linear combination of its P past and Q future context samples,

\( {\widehat{x}}_i={\sum}_{p=1}^P{a}_p{x}_i^{P_p}+{\sum}_{q=1}^Q{a}_{P+q}{x}_i^{Q_q} \)

where P and Q are integers, and a _{ k } (k = 1, 2, …, P + Q) are the prediction coefficients. K = P + Q is defined as the order of the prediction model in this paper.
2.3 Estimate of prediction coefficients
where dP _{ p } (p = 1, 2, …, P) is the distance between the current sample and its p-th past sample, dQ _{ q } (q = 1, 2, …, Q) is the distance between the current sample and its future sample x _{ i + 2q − 1}, and L is the number of samples in the cross or the dot set, with \( L=\left\lfloor \frac{N}{2}\right\rfloor \), where N is the length of the audio file.
After the distances have been calculated, we sort them. For example, if dP _{1} < dQ _{1} < dP _{3} < dP _{2} < dQ _{2} and the optimal K is 3, we let P = 2 and Q = 1; that is, for each i, we use x _{ i − 1}, x _{ i − 3}, and x _{ i + 1} to calculate the prediction coefficients. For clearer notation, we denote x _{ i − 1} by \( {x}_i^{P_1} \), x _{ i − 3} by \( {x}_i^{P_2} \), x _{ i − 2} by \( {x}_i^{P_3} \), x _{ i + 1} by \( {x}_i^{Q_1} \), and x _{ i + 3} by \( {x}_i^{Q_2} \); the labels follow the sorted order of the distances. In other words, we use \( {x}_i^{P_1} \), \( {x}_i^{P_2} \), and \( {x}_i^{Q_1} \) for prediction.
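The selection step above can be sketched as follows; the distance values are assumed to be precomputed by the scheme's distance equations, and the helper name is ours:

```python
def choose_context(dP, dQ, K):
    """Pick the K context samples with the smallest distances,
    mirroring the sorting step of Section 2.3.
    dP: distances to past candidates (dP[p-1] for the p-th past sample).
    dQ: distances to future candidates (dQ[q-1] for sample x_{i+2q-1}).
    Returns (P, Q): how many past / future samples are kept."""
    # Label each distance with its origin, then sort ascending.
    labeled = [(d, 'P', p + 1) for p, d in enumerate(dP)]
    labeled += [(d, 'Q', q + 1) for q, d in enumerate(dQ)]
    labeled.sort(key=lambda t: t[0])
    chosen = labeled[:K]
    P = sum(1 for _, kind, _ in chosen if kind == 'P')
    Q = K - P
    return P, Q
```

With the paper's example ordering dP1 < dQ1 < dP3 < dP2 < dQ2 and K = 3, the three smallest distances are dP1, dQ1, and dP3, giving P = 2 and Q = 1 as stated.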
where T denotes matrix transposition.
Referring to (3), \( 2\left\lceil \frac{P}{2}\right\rceil + Q \) samples are left unpredicted in the computation.
After the prediction coefficients A ^{ K } are estimated, the minimum error power value with the order K can be computed by referring to (6).
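Assuming the minimum-error-power coefficients coincide with the ordinary least-squares solution over the training rows (a standard reading of the method [24]; this helper is a sketch, not the authors' code), the estimate can be written as:

```python
import numpy as np

def estimate_coefficients(X, x):
    """Least-squares sketch of the minimum-error-power computation.
    Each row of X holds the K context samples of one target sample
    in the vector x. Returns (A, rho): the coefficient vector A^K
    and the residual error power rho^K."""
    A, *_ = np.linalg.lstsq(X, x, rcond=None)
    residual = x - X @ A
    rho = float(np.mean(residual ** 2))
    return A, rho
```

When the signal is exactly a linear combination of its contexts, the residual power is (numerically) zero; for real audio, rho measures how well the chosen order fits the clip.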
2.4 The prediction order
Computing the prediction order K is a crucial step, since it plays an important role in reducing the prediction errors. Too small an order cannot effectively exploit the correlation among samples, while too large an order brings negative effects, since samples far from the current sample are less correlated with it. For different audio files, the order K may differ in order to achieve ideal prediction accuracy. In Section 2.3, we showed that for a given order K, the minimum error power ρ ^{ K } and the corresponding coefficient set A ^{ K } can be computed for the prediction. Different order values yield different minimum error power values; the smallest of these determines the order and the prediction coefficients used for reversible data hiding.
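The order search described above can be sketched as follows, with `contexts` a hypothetical mapping from each candidate order K to its context matrix and target vector (how those are built follows Section 2.3):

```python
import numpy as np

def select_order(contexts):
    """contexts: dict mapping order K -> (X, y), where each row of X
    holds the K context samples used to predict the entry of y.
    Returns (optimal K, coefficients A^K, minimum error power)."""
    best = None
    for K, (X, y) in contexts.items():
        # Least-squares fit for this order, then its error power.
        A, *_ = np.linalg.lstsq(X, y, rcond=None)
        rho = float(np.mean((y - X @ A) ** 2))
        if best is None or rho < best[2]:
            best = (K, A, rho)
    return best
```

The order with the smallest residual power wins, matching the selection rule stated in the text.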
For better description, in this work we denote K _{1} and K _{2} as the orders of the prediction coefficients in the cross set and dot set, respectively. Let \( {A}^{K_1} \) be the prediction coefficients for the cross set while \( {A}^{K_2} \) for the dot set.
2.5 Data embedding and extraction methods
After the prediction, expansion embedding combined with the histogram shifting technique proposed in [10] is applied to hide the information bits reversibly. A threshold value T is chosen according to the embedding capacity. Prediction errors in the range [−T, T] are expanded to carry the data bits, while those outside [−T, T] are shifted to make room for the expansion.
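A minimal sketch of expansion embedding with histogram shifting on a single prediction error, following the usual formulation of [10] (the authors' exact mapping may differ in details):

```python
def embed_error(e, bit, T):
    """Errors inside [-T, T] are expanded to carry one bit;
    errors outside are shifted outward so the ranges stay disjoint."""
    if -T <= e <= T:
        return 2 * e + bit          # expandable: carries the bit
    elif e > T:
        return e + T + 1            # shift right
    else:
        return e - T                # shift left

def extract_error(ew, T):
    """Invert embed_error: return (original error, bit or None)."""
    if -2 * T <= ew <= 2 * T + 1:
        return ew >> 1, ew & 1      # floor division recovers e
    elif ew > 2 * T + 1:
        return ew - T - 1, None
    else:
        return ew + T, None
```

Expanded errors land in [−2T, 2T + 1] and shifted errors land strictly outside it, so the decoder can tell the two cases apart and recover both the bit and the original error exactly.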
2.6 Auxiliary information
In the proposed scheme, the auxiliary information includes the threshold values (T _{1} for the cross set and T _{2} for the dot set), the prediction orders (K _{1} = P _{1} + Q _{1} and K _{2} = P _{2} + Q _{2}), and the prediction coefficients (\( {A}^{K_1} \) and \( {A}^{K_2} \)). The auxiliary information should be inserted into the cover signal for blind extraction.
- 1.
In our tests, all the samples can be used for reversible data hiding when the threshold value is larger than 800. We therefore use 20 bits to store the threshold values T _{1} (10 bits) and T _{2} (10 bits), since 10 binary bits can represent values up to 1023.
- 2.
We use 12 bits to store the values of K _{1} (6 bits) and K _{2} (6 bits), since in our tests the order value is always smaller than 64 for all the clips.
- 3.
In our tests, the prediction coefficients are always smaller than 10 in magnitude. As a trade-off between prediction accuracy and embedding efficiency, each coefficient is rounded to two decimal places; for example, a coefficient of 1.4433 is rounded to 1.44, and −0.3852 to −0.39. After scaling by one hundred, each coefficient can be represented with 11 bits (10 bits for the magnitude, 1 bit for the sign).
- 4.
In the embedding, the underflow and overflow problems are handled with a location map. For a sample with an underflow or overflow problem, we use 25 bits to mark its position, since most of the clips (sampled at 44.1 kHz) are no longer than 12 min. Because the proposed prediction model has higher accuracy, fewer samples suffer from underflow or overflow across all the example clips tested.
- 5.
Considering the auxiliary information above, we use 12 bits to store the length of the auxiliary information, which can indicate up to 4095 bits.
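The coefficient packing of item 3 above can be sketched as follows; `pack_coefficient` and `unpack_coefficient` are hypothetical helper names:

```python
def pack_coefficient(a):
    """Quantize a prediction coefficient to two decimal places and
    pack it into 11 bits: 1 sign bit followed by a 10-bit magnitude
    (valid because |a| < 10, so 100*|a| < 1024)."""
    scaled = round(abs(a) * 100)
    sign = 1 if a < 0 else 0
    return (sign << 10) | scaled

def unpack_coefficient(bits):
    """Recover the two-decimal coefficient from its 11-bit code."""
    sign = (bits >> 10) & 1
    magnitude = (bits & 0x3FF) / 100.0
    return -magnitude if sign else magnitude
```

The round trip reproduces exactly the rounded values used in the paper's example (1.4433 → 1.44, −0.3852 → −0.39), so the decoder rebuilds the same predictor the encoder used.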
In the encoder, the LSB values of the first M + 12 samples are saved as part of the payload and embedded reversibly into the cover signal, where M is the length of the auxiliary information. The LSB positions of the first 12 samples record the length, and the LSB positions of the next M samples keep the auxiliary information. In the decoder, the auxiliary information is first extracted from the LSBs of the first M + 12 samples, after which the hidden bits are extracted and the cover signal is recovered.
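The LSB bookkeeping can be sketched as follows (helper names are ours; saving the displaced original LSBs into the payload is handled elsewhere in the scheme):

```python
def write_lsbs(samples, bits):
    """Overwrite the LSBs of the first len(bits) samples with the
    auxiliary-information bits. The original LSBs must be saved
    into the payload beforehand, as the scheme requires."""
    out = list(samples)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b
    return out

def read_lsbs(samples, n):
    """Read back the first n LSBs in the decoder."""
    return [s & 1 for s in samples[:n]]
```

The decoder reads the first 12 LSBs to learn M, then reads the next M LSBs to rebuild the auxiliary information before any other processing.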
3 Experimental results
In the reversible data hiding community, embedding rate and distortion are two significant criteria. In the tests, we use the signal-to-noise ratio (SNR) and the objective difference grade (ODG), computed with the PEAQ software, to measure the embedding distortion of reversible data hiding schemes, and bits per sample (bps) to measure the embedding rate. The test data set includes 70 standard audio files (wave format, sampling rate 44.1 kHz) [25]. Four clips, numbered 39, 49, 64, and 66, are randomly selected as example clips for the report.
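The SNR measure can be computed as below; this is the standard definition, as the paper does not give its exact implementation (ODG requires the external PEAQ tool and is not sketched here):

```python
import numpy as np

def snr_db(original, watermarked):
    """Signal-to-noise ratio in dB between the host signal and its
    watermarked version: 10*log10(signal power / noise power)."""
    original = np.asarray(original, dtype=float)
    noise = original - np.asarray(watermarked, dtype=float)
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))
```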
The orders (K _{1}) for 4 different clips

| Files | Characteristics | K _{1} |
|---|---|---|
| Clip39 | Piano | 2 |
| Clip49 | Voice | 57 |
| Clip64 | Symphony | 38 |
| Clip66 | Horn | 57 |
Computational cost and decoding cost of the proposed scheme in the embedding

| Clips | Clip39 | Clip49 | Clip64 | Clip66 |
|---|---|---|---|---|
| Duration | 2:17 | 0:22 | 0:30 | 0:17 |
| Comp. cost by proposed | 49:05 | 5:20 | 15:48 | 4:12 |
| Comp. cost by [20] | 0:12 | 0:06 | 0:06 | 0:06 |
| Comp. cost by [21] | 0:05 | 0:01 | 0:01 | 0:01 |
| Comp. cost by [22] | 0:05 | 0:01 | 0:01 | 0:01 |
| Dec. cost by proposed | 0:06 | 0:01 | 0:01 | 0:01 |
| Dec. cost by [20] | 0:05 | 0:01 | 0:01 | 0:01 |
| Dec. cost by [21] | 0:05 | 0:01 | 0:01 | 0:01 |
| Dec. cost by [22] | 0:05 | 0:01 | 0:01 | 0:01 |
4 Conclusions
This paper presents a reversible audio data hiding scheme using noncausal prediction with alterable orders. For an audio clip, the optimal order and prediction coefficients are obtained with the minimum error power method, so the proposed prediction model can better exploit the correlation among samples. Experimental results show that the model provides satisfactory prediction precision for different types of clips, and that the proposed scheme (combining the double-embedding strategy with the proposed prediction model) has lower embedding distortion at the same embedding rate in comparison with several existing excellent works.
Declarations
Funding
This work was partially supported by the NSFC project (No. 61272414), co-funded by the State Key Laboratory of Information Security (No. 2016-MS-07).
Competing interests
The authors declare that they have no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
- YQ Shi, Z Ni, D Zou, C Liang, G Xuan, Lossless data hiding: fundamentals, algorithms and applications, in Proc. IEEE ISCAS, vol. 2, 2004, pp. 313–336
- J Fridrich, M Goljan, R Du, Invertible authentication, in Proc. SPIE Security Watermarking Multimedia Contents, San Jose, CA, 2001, pp. 197–208
- J Fridrich, M Goljan, R Du, Lossless data embedding: new paradigm in digital watermarking. EURASIP J. Appl. Signal Process. 2002(2), 185–196 (2002)
- MU Celik, G Sharma, AM Tekalp, E Saber, Lossless generalized-LSB data embedding. IEEE Trans. Image Process. 14(2), 253–266 (2005)
- J Tian, Reversible data embedding using a difference expansion. IEEE Trans. Circuits Syst. Video Technol. 13(8), 890–896 (2003)
- AM Alattar, Reversible watermark using difference expansion of triplets, in Proc. Int. Conf. Image Process., vol. 1, Barcelona, Spain, 2003, pp. 501–504
- HJ Kim, V Sachnev, YQ Shi, J Nam, HG Choo, A novel difference expansion transform for reversible data embedding. IEEE Trans. Inf. Forensics Secur. 4(3), 465 (2008)
- X Wang, X Li, B Yang, Z Guo, Efficient generalized integer transform for reversible watermarking. IEEE Signal Process. Lett. 17(6), 567–570 (2010)
- F Peng, X Li, B Yang, Adaptive reversible data hiding scheme based on integer transform. Signal Process. 92(1), 54–62 (2012)
- DM Thodi, JJ Rodriguez, Reversible watermarking by prediction-error expansion, in Proc. IEEE Southwest Symp. Image Anal. Interpretation, Lake Tahoe, CA, 2004, pp. 21–25
- DM Thodi, JJ Rodriguez, Expansion embedding techniques for reversible watermarking. IEEE Trans. Image Process. 16(3), 721–730 (2007)
- B Ou, X Li, Y Zhao, R Ni, Reversible data hiding scheme based on PDE predictor. J. Syst. Softw. 86(10), 54–62 (2012)
- Y Hu, HK Lee, J Li, DE-based reversible data hiding with improved overflow location map. IEEE Trans. Circuits Syst. Video Technol. 19(2), 250–260 (2009)
- C Panyindee, C Pintavirooj, Optimizations using the genetic algorithm for reversible watermarking, in Proc. ECTI-CON, 2013, pp. 1–5
- S Kang, HJ Hwang, HJ Kim, Reversible watermark using an accurate predictor and sorter based on payload balancing. ETRI J. 34(3), 410–420 (2012)
- Z Ni, YQ Shi, N Ansari, S Wei, Reversible data hiding. IEEE Trans. Circuits Syst. Video Technol. 16(3), 354–362 (2006)
- M Kamran, A Khan, SA Malik, A high capacity reversible watermarking approach for authenticating images: exploiting downsampling, histogram processing, and block selection. Inf. Sci. (2013). doi:10.1016/j.ins.2013.07.035
- WL Tai, CM Yeh, CC Chang, Reversible data hiding based on histogram modification of pixel differences. IEEE Trans. Circuits Syst. Video Technol. 19(6), 906–910 (2009)
- D Yan, R Wang, Reversible data hiding for audio based on prediction error expansion, in Proc. IIHMSP 2008, 2008, pp. 249–252
- A Nishimura, Reversible audio data hiding using linear prediction and error expansion, in Proc. IIHMSP 2011, 2011, pp. 318–321
- S Xiang, Non-integer expansion embedding for prediction-based reversible watermarking, in Proc. 14th Int. Conf, 2012, pp. 224–239
- V Sachnev, HJ Kim, J Nam, S Suresh, YQ Shi, Reversible data embedding using sorting and prediction. IEEE Trans. Circuits Syst. Video Technol. 19(7), 989–999 (2009)
- X Hu, W Zhang, X Li, N Yu, Minimum rate prediction and optimized histograms modification for reversible data hiding. IEEE Trans. Inf. Forensics Secur. 10(3), 653–664 (2015)
- AH Nuttall, Spectral analysis of a univariate process with bad data points, via maximum entropy and linear predictive techniques, Tech. Rep. TR-5303, Naval Underwater Systems Center, New London, CT, 1976
- EBU Committee, Sound quality assessment material: recordings for subjective tests [Online]. Available: https://tech.ebu.ch/publications/sqamcd