Reversible image watermarking, a type of digital data hiding, is capable of recovering the original image exactly while extracting the hidden message. A number of reversible algorithms have been proposed to achieve a high embedding capacity and a low distortion. While numerous algorithms achieve a favorable performance for a small embedding capacity, the main goal of this paper is a more favorable performance for a larger embedding capacity at a lower distortion. This paper therefore proposes a reversible data hiding algorithm that uses a novel piecewise 2D autoregression (P2AR) predictor based on a rhombus-embedding scheme. In addition, a minimum description length (MDL) approach is applied to remove the outlier pixels from a training set so that the effect of a multiple linear regression can be maximized. The experiment results demonstrate that the performance of the proposed method is superior to those of previous methods.
1. Introduction
Reversible or lossless data hiding techniques hide data in the host signal (for example, text, image, audio, or video) and allow for not only the extraction of the embedded message, but also the recovery of the original host signal [1-3]. The techniques can be applied to military or medical images that demand an exact representation of the original image. Reversible watermarking technology requires perfect reversibility and a high performance, for which a high embedding capacity and a low distortion are required.
Since the difference expansion (DE) method was introduced by Tian in 2003 [2], a number of algorithms for the production of far smaller difference values have been proposed [3,4]. These methods accomplish reversible watermark embedding by expanding the prediction error, which is the difference between the original and predicted pixel values. Prediction-error expansion (PEE) is a generalized form of difference expansion whereby the prediction error is expanded instead of the inter-pixel differences. The greater the accuracy of the applied prediction method, the lower the image distortion that occurs; consequently, the prediction method is one of the most important factors for the enhancement of the performance of reversible watermarking algorithms.
The first version of a reversible watermarking predictor was proposed by Thodi and Rodriguez [5]. Their "JPEG-LS" is a 3rd-order median edge detector (MED) predictor adopted from the lossless image compression standard [6]. MED is a context-based adaptive predictor that combines three simple predictors related to the three neighboring pixels of west, north, and northwest. Chen et al. [11] compared the performances of many predictors, such as the MED and the 4th-order gradient-adjusted predictor (GAP) in the CALIC [10]. The edge-directed prediction (EDP) version that is based on piecewise 2D autoregression (P2AR) [8] is a predictor with least-squares (LS) approaches for which the prediction coefficients are locally optimized inside a causal window, i.e., a training window.
Kau and Lin [9] proposed the edge-look-ahead (ELA) scheme, for which LS-based prediction is used with an efficient edge detector to maximize the edge-directed characteristics. The performances of all of these predictors have been compared fairly in several papers [9,10]; basically, the higher the order of the predictor, the higher the accuracy of the prediction, and among the considered predictors, the ELA yields the best performances for most of the images [14]. All of these predictors were proposed for image compression, whereby only the causal pixels can be used, as causality must be maintained to ensure perfect reversibility.
In this paper, the target pixel ("TP" or "current pixel" [13]) is the pixel represented by x(n), and the corresponding prediction error is calculated. The LS method computes the coefficients of the neighboring support pixels (SPs) of the TP, while a window pane confines the shape and number of the SPs; for example, if four pixels are in the window, the 4th-order pane is used. As stated previously, the support pixels should be causal to ensure reversibility, and if the original values of the pixels that are used for the prediction are retained, they are considered causal. Since the computation of the prediction error is based on the support pixels, the pixel values should be the same before both the embedding process and the decoding process. In image coding, the SPs consist of the pixels prior to the TP.
Advanced methods expand the LS-optimization problem in the training set (TS) by using multiple SP sets that are selected around the TS. For each TS pixel, any SP set that is similar to the given support set of the TP is chosen as a member of the SP sets.
To enhance the reversible watermarking performance, the prediction does not need to be restricted to the causal pixels that are available before the TP in the scanning order (raster scanning is the most popular order). Chen et al. [14] proposed a 4th-order full-context prediction scheme for which the four neighboring SPs (i.e., the north, south, east, and west pixels) of a predicted pixel are used. Using the rhombus pattern, Sachnev et al. [1] also proposed a 4th-order full-context prediction scheme that is a two-stage embedding method, whereby the corresponding performance superiority confirms the salience of not only the number of SPs but also the SP locations regarding prediction accuracy.
Sachnev et al. [1] make a full-context prediction possible through the exploitation of two-stage embedding, called "two-stage data hiding." They split the pixels into two sets so that one of the sets can first be used to hide data, followed by the sequential use of the other set; moreover, the two sets (i.e., the dot set and the cross set) are non-overlapping and independent of each other. When the dot set is processed first, the cross set remains intact. The cross set is processed next, and the dot set is considered intact even though it has been modified. Here, the dot set processing is performed during the first stage and the cross set processing occurs during the second stage. As a result, since one of the two sets retains the original values and the other set retains original-like pixel values, full-context modeling becomes possible. Due to this remarkable feature, pixels that come after the TP in the scanning order can also be treated as causal, and while this feature can play an important role in reversible watermarking, this is not the case for image compression.
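In code, the two-set split can be sketched as a checkerboard mask. The convention below (a pixel's set decided by the parity of its coordinate sum) is an assumption of the common cross/dot layout; it guarantees that every pixel's four rhombus neighbors fall in the opposite set.

```python
import numpy as np

def split_cross_dot(shape):
    """Checkerboard split for two-stage embedding (a sketch):
    'cross' pixels have an even row+column sum, 'dot' pixels an
    odd one, so the four N/S/E/W neighbours of any pixel belong
    to the other set."""
    rows, cols = np.indices(shape)
    cross = (rows + cols) % 2 == 0
    return cross, ~cross

cross, dot = split_cross_dot((4, 4))
# A cross pixel's four neighbours are all dots:
assert cross[1, 1] and dot[0, 1] and dot[2, 1] and dot[1, 0] and dot[1, 2]
```

Because the two masks never overlap, modifying one set leaves the other set's pixel values untouched, which is exactly what makes the second-stage prediction reversible.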
In this paper, we propose a 6th-order full-context predictor whereby P2AR is combined with the two-stage embedding method. Contrary to the typical P2AR [8] and the full-context predictor of Chen et al. [14], the predicted pixel is surrounded by six SPs (the west, north, east, south, northwest, and northeast pixels) that are in the nearest locations, and the TS is modified to work properly for the two-stage embedding system. The proposed predictor of this paper outperforms the method of Sachnev et al. [1], which was the state-of-the-art algorithm. The performance of this predictor is also more effective than those of the other predictors that are used for image compression.
Numerous researchers have proposed advanced methods to improve upon the method of Sachnev et al., but they only succeeded with respect to a low embedding capacity [7,15,21]. Recently, Dragoi and Coltuc [19] proposed an advanced method that outperforms the method of Sachnev et al. [1]. In this paper, the goal is the proposal of another advanced method for which three kinds of new features are introduced. The first feature is the 6th order of SPs; while Dragoi and Coltuc [19] used the 4th order of SPs, higher orders cannot be used in their method due to the violation of causality. The second feature is an exploitation of the sorting that occurs during the selection process regarding the target pixels; after sorting, only the effective TS elements are used for prediction. The third feature is the shape of the TS. Dragoi and Coltuc [19] used a regular rectangle shape, while the proposed method uses a rectangle that is chipped on the bottom to ensure causality; as a result, the proposed method outperforms the methods of both Sachnev et al. [1] and Dragoi and Coltuc [19].
The organization of this paper is as follows: section 2 introduces the previous works that the proposed method is based on; section 3 presents the proposed algorithm; in section 4, we present the results of our experiments to show that our algorithm is superior to the other methods; and the conclusion is presented in section 5.
2. Related Works
 2.1 Piecewise 2D Autoregression (P2AR) Predictor
In this paper, we use x(n) to denote the current TP for prediction, where n is the spatial coordinate in an image; n is one-dimensional here, but it can also be two-dimensional. Even though Fig. 1 shows a two-dimensional TS shape, the pixel positions are represented by a one-dimensional notation. Regarding the P2AR predictor, suppose the image is scanned in a raster-scanning order, and x(n) is predicted by its past causal neighboring pixels through the use of x(n − 1) to x(n − 12). According to the N-th-order Markovian property, the predicted value is calculated from the N neighboring pixels as follows:

x̂(n) = ∑_{k=1}^{N} a(k) x(n − k),   (1)

Fig. 1. Twelve causal neighboring pixels around the target pixel x(n)

where a(k) is the prediction coefficient.
The P2AR predictor adapts to the local features around the target pixel on a pixel-by-pixel basis [11]; that is, the relation between each TS pixel and its SPs is very useful for the prediction of the relation between the TP and its SPs, as shown in Fig. 2. The relation between the TP and its SPs is expressed by the a(k), the so-called prediction coefficients, which are optimized locally inside the TS.
Fig. 2. Training set and support pixels
A convenient TS choice is a rectangular window (i.e., in blue color) that contains M = 2T(T + 1) causal neighboring pixels, as shown in Fig. 2. In this figure, there are 12 SPs (i.e., red-color crosses) around the TP, x(n), and since T is 4, M is 40 in the TS. Each pixel in the TS is denoted as x_k(n), where n is the position of the TP and k = 1, ⋯, M. Let the TS be denoted by an M × 1 column vector, as follows:

X = [x_1(n), x_2(n), ⋯, x_M(n)]^T.   (2)
Each pixel in the TS has SPs that consist of the N closest pixels (in red color), as shown in Fig. 2. Then, the pixels of X and their SPs form an M × N matrix C, as follows:

C = [x_m(n − k)], m = 1, ⋯, M, k = 1, ⋯, N,   (3)

where x_m(n − k) is the k-th SP of the m-th TS pixel, x_m(n). A certain pixel subset is taken from among the M pixels of the TS for the linear regression.
The prediction coefficients are solved through the following intra-TS LS optimization: min‖X − Ca‖. It is well known that the P2AR optimization has the following closed-form solution:

a = (C^T C)^{−1} C^T X,   (4)

where a = [a(1), a(2), ⋯, a(N)]^T comprises the optimized prediction coefficients that are multiplied by the SP values. This optimization provides a solution for direction adaptation and the local approximation of the optimal prediction inside the TS.
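The closed-form optimization above can be sketched with an ordinary least-squares solver; the toy matrices below are illustrative, not taken from the paper.

```python
import numpy as np

def p2ar_coefficients(X, C):
    """Solve min ||X - C a|| for the P2AR coefficients, a sketch of
    the closed form a = (C^T C)^{-1} C^T X, computed via lstsq for
    numerical stability.

    X : (M,) training-set pixel values
    C : (M, N) support pixels; row m holds the N SPs of X[m]
    """
    a, *_ = np.linalg.lstsq(C, X, rcond=None)
    return a

# Toy training set where every pixel equals the mean of its two SPs.
C = np.array([[10., 12.], [20., 22.], [30., 28.]])
X = C.mean(axis=1)
a = p2ar_coefficients(X, C)                   # close to [0.5, 0.5]
prediction = float(a @ np.array([40., 44.]))  # predict a new TP: ~42.0
```

The predicted TP value is then the inner product of a with the TP's own support pixels, exactly as in Eq. (1).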
 2.2 Minimum description length (MDL)
The selection of the sample set S out of the TS is a key issue for the prevention of model overfitting [13]. In the LS-based adaptive predictor, for which multiple linear regressions are utilized, the similarity between the SPs of the TP and those of the training pixels is very important; if a number of training pixels that are not similar to the TP are included, the LS-based adaptive predictor does not provide an accurate prediction. Wu et al. [13] proposed the minimum description length (MDL) criterion to properly determine the SP quantity and locations for the predictor and to select similarly patterned TS pixels. The MDL concept is applied in the proposed prediction method.
 2.3 Two-stage embedding method using rhombus pattern
In Fig. 2, the predicted TP value is computed based on the pixels marked with red crosses. It is a well-known fact that the nearest four neighboring pixels in the north, south, east, and west of the TP are the most favorable candidates for prediction; however, the pixels in the east and south are not causal. The most favorable causal SPs therefore look like the pattern of Fig. 2 whenever a single-stage embedding algorithm is used.
Sachnev et al. [1] divided an image into two kinds of non-overlapping pixel groups with a rhombus-patterned window, whereby two groups, the cross and dot sets of Fig. 3, are formed; in this figure, the four dots around a center cross form a rhombus pattern. The predicted value of a cross pixel is obtained using the arithmetic mean of the four neighboring dot pixels around the TP, and this method is regarded as the simplest version of a full-context prediction; for example, a cross-set pixel is predicted by the four closest dot-set pixels, as shown in Fig. 3. Because of this excellent grouping, the method succeeds in the construction of a full-context prediction, and the corresponding improvement of the prediction performance over all of the other prediction methods, for which the SPs include only the prior region (i.e., approximately westward and northward) of the TP, is significant. Unfortunately, such prior-region predictors cannot produce a more effective performance even when high-order modeling such as that of the CALIC and P2AR methods is used [14]. The rhombus pattern of Sachnev et al. [1] also allows for the use of the south and east SPs for prediction without any violation of causality.
Fig. 3. Two-stage embedding method
 2.4 Sorting prediction errors by local variance among four neighboring pixels
In Sachnev et al.'s method [1], the cross and dot sets of the rhombus scheme are independent and non-overlapping with respect to each other, meaning that it is possible to calculate the correlation parameter of each pixel of the cross set by using the four neighboring dot-set pixels and vice versa. This is the local variance value, and it is calculated as follows:

σ²(n) = (1/4) ∑_{k=1}^{4} (Δv_k − Δv̄)²,   (5)

where the Δv_k (k = 1, ⋯, 4) are the absolute differences between adjacent pairs of the four neighboring pixels v_1, ⋯, v_4 taken around the rhombus, and Δv̄ is the mean of the Δv_k. The notation of the pixel position here is based on Fig. 3.
An important feature is that the local variance value tends to be proportional to the magnitude of the prediction error; therefore, for pixels whose local variance is too high, the embedding of a bit is skipped as long as pixels with small local variance values remain. Because of this property, the local variance value can be utilized as a criterion that indicates whether a pixel is worth embedding into or not.
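The sorting criterion can be sketched as below. The local variance itself is defined in [1]; the exact pairing of the neighbor differences used here is our assumption.

```python
import numpy as np

def local_variance(v1, v2, v3, v4):
    """Local variance of the four rhombus neighbours (a sketch):
    variance of the absolute differences between adjacent
    neighbour pairs taken around the rhombus. The pairing is an
    assumed convention."""
    dv = np.array([abs(v1 - v2), abs(v2 - v4),
                   abs(v4 - v3), abs(v3 - v1)], dtype=float)
    return float(np.mean((dv - dv.mean()) ** 2))

# Smooth context -> small value, edge context -> large value,
# so sorting by it front-loads the well-predicted pixels.
smooth = local_variance(100, 101, 100, 101)
edgy = local_variance(100, 200, 100, 200)
assert smooth < edgy
```

Sorting all candidate pixels by this value, smallest first, is what lets the encoder embed into the smoothest (best-predicted) pixels first.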
3. Proposed Algorithm
Compared with the existing reversible data hiding (RDH) methods, a more accurate predictor is used to achieve the improved performance of the proposed method. The proposed idea is mainly based on the two-stage embedding system and the P2AR predictor. We focus on the improvement of the prediction accuracy by ensuring the following:
• Use of two additional critical SPs beyond the four SPs of Sachnev et al.'s rhombus scheme [1].
• Application of the P2AR predictor after the selection of the TS pixels in accordance with the MDL criterion.
• Filtering out highly improper pixels according to the local variance value, whereby bits are not embedded in them.
 3.1 P2AR predictor using six nearest support pixels
Basically, the proposed predictor is based on the two-stage embedding system for which a rhombus pattern is used. Unlike Sachnev et al.'s method [1], it is the 6th-order predictor shown in Fig. 4. We include the six SPs that are to the west, north, east, south, northwest, and northeast of the TP: x(n−1), x(n−2), x(n−3), x(n−4), x(n−5), and x(n−6). Because Sachnev et al. [1] used a sorting method, the pixels of the same group as the TP should not be exploited as SPs; that is, for the prediction of the cross pixels, only the dot pixels need to be used. If the cross pixels are used to predict the TP in the cross set and sorting is applied, it is impossible to recover the original order from the sorted pixels during the decoding process, and the reason is simple: in their method [1], causality can be violated because the values of x(n − 5) and x(n − 6) are no longer original.
Fig. 4. Support pixels of proposed predictor (N = 6)
If it is assumed that x(n) belongs to the dot set, then x(n−1), x(n−2), x(n−3), and x(n−4) belong to the cross set and are independent of the TP x(n). Here, even though x(n−5) and x(n−6) belong to the same set as the TP, they can still be used for prediction since they are causal to the TP. As a result, six pixels can be used as the SP set.
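Assuming the index-to-direction mapping listed above (x(n−1) = west through x(n−6) = northeast, which is our assumed convention), collecting the six SPs of a TP can be sketched as follows; border handling is omitted.

```python
import numpy as np

def support_pixels(image, i, j):
    """Six support pixels of the proposed predictor for the TP at
    (i, j): west, north, east, south, north-west, north-east.
    The ordering x(n-1)..x(n-6) is an assumed convention."""
    return np.array([image[i, j - 1],       # x(n-1) west
                     image[i - 1, j],       # x(n-2) north
                     image[i, j + 1],       # x(n-3) east
                     image[i + 1, j],       # x(n-4) south
                     image[i - 1, j - 1],   # x(n-5) north-west
                     image[i - 1, j + 1]],  # x(n-6) north-east
                    dtype=float)

img = np.arange(25, dtype=float).reshape(5, 5)
sps = support_pixels(img, 2, 2)   # [11. 7. 13. 17. 6. 8.]
```

Only the first four SPs come from the opposite set; the last two are same-set but causal, which is the point of the 6th-order design.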
The P2AR optimization is processed on a pixel-by-pixel basis. According to our method, the prediction value x̂(n) for the TP x(n) can be expressed as Eq. (1). Regarding the P2AR, the prediction method locally optimizes the prediction coefficients a inside a causal window, which is the TS of Fig. 5.
Fig. 5. A training set for prediction (in this example, T = 4)
Due to the property of the two-stage embedding system, a number of pixels that do not belong to the TS exist. One example of the TS (here, the value of T is 4) is presented in Fig. 5, whereby the TS consists of M = 35 elements. Notably, the near-rectangular form of the traditional TS, as shown in Fig. 2, differs from the proposed form of the TS in Fig. 5, which looks like a rectangle chipped at the bottom; this chipped pattern is obtained because of the non-causality of some of the SPs, and the chipped parts belong to the opposite set (i.e., cross set or dot set). The chipped parts, however, should be excluded to ensure causality, whereby the new M value is recalculated as follows:

M = 2T(T + 1) − (T + 1) = (2T − 1)(T + 1),   (6)
where T is an even number. In Fig. 5, basically all of the pixels that belong to the dot set can be included in the TS; alternatively, regarding the cross set, the TS should not include some of the cross-set pixels such as P_1, P_2, P_3, and P_4 even though they are causal. In the case of P_5, it is excluded from the TS since its SPs include the TP x(n). Since the other cross-set elements within the chipped rectangle do not violate causality, they are not excluded from the TS.
 3.2 Selecting proper samples from training set using MDL
The proposed method takes advantage of another idea for the selection of more suitable pixels for the purpose of improving the P2AR optimization. The TSs of Fig. 5 and Fig. 6 are the same, but in Fig. 6, the pixels of the selected set within the TS are shown in a dark color. In Fig. 5, all of the pixels in the TS can be used for the LS optimization, whereas in Fig. 6, only a certain number of candidates are used for the optimization. Further, the other pixels are not used because their SPs may not be similar to those of the TP. Similarity is measured according to the MDL.
Fig. 6. Training set sample selected by MDL
In most image-block cases, not all of the pixels in the TS are effective for the estimation of the prediction coefficients, because a number of the pixels are irrelevant to the TP and function as outliers. If those pixels affect the P2AR optimization, the accuracy of the prediction can be decreased due to overfitting (i.e., context dilution). We therefore utilize the MDL as a criterion for the selection of specific pixels that have a sufficient correlation with the TP. Whether the correlation between the pixels of the target set and the TS is sufficient or not is determined by the MDL criterion.
We apply the MDL criterion τ for the selection of the proper element pixels from the TS, and this set is called the "selective set S." The distance measure is defined as follows:

D(x_i(n)) = ∑_{k=1}^{N} ‖x(n − k) − x_i(n − k)‖,   (7)

where x(n − k) is an SP of the TP x(n), x_i(n − k) is an SP of the TS pixel x_i(n), and ‖x‖ denotes the L_1 norm of x. The D value is used to search for a set of SPs that is similar to the target set. The selected pixels are denoted as "set S," as follows:

S = { x_i(n) | D(x_i(n)) < τ },   (8)

where x_i(n) ∈ X, and X is the M × 1 column vector of the TS pixels, as shown in Fig. 5.
We select the pixels that satisfy the above condition to obtain S, whereby D(x_i(n)), the summation of the absolute values of the differences between the TP's SPs and each TS pixel's SPs, needs to be smaller than τ. The threshold value τ is chosen empirically as 20 to achieve a favorable performance. The number of elements of the selective set is also important, because the prediction performance worsens if it is too small or too large. In this paper, the value of T is set as 8, meaning that M is 135 according to Eq. (6); therefore, the cardinality of the selective set, ‖S‖, is chosen empirically as 80 or more.
The vector of the prediction coefficients a that is obtained using the selective set S satisfies the LS optimization and has a closed-form solution, as shown in Eq. (4). After the prediction coefficients have been obtained on a pixel-by-pixel basis, the prediction value x̂(n) can be provided for the TP, and the prediction errors are utilized for the embedding scheme.
The following example shows how the MDL works on a real-image sample: In Fig. 6, all of the colored pixels are TS pixels, and the center pixel in a bold font (in the 5th column), 167, is the TP; moreover, the SPs of the 167 TP sample are shown in Table 1. In this example, for the TS pixel in the top-left corner, 170, its D(170) = ‖172 − 169‖ + ‖158 − 171‖ + ⋯ + ‖167 − 163‖ = 3 + 13 + 2 + 9 + 8 + 4 = 39 is computed according to Eq. (7) (see the first row of Table 2). The training-set pixels, x_i(n), and their SPs are shown in Table 2, while the D(x_i(n)) of each pixel is computed and shown in the last column. When the D(x_i(n)) value of a pixel is smaller than 20, it is colored red and selected for the P2AR calculation, as shown in both Fig. 6 and Table 2, whereby the blue-colored pixels are not selected since their D values are larger than 20.
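The selection rule can be sketched as follows. The rejected toy row reuses the component differences of the worked example above (3 + 13 + 2 + 9 + 8 + 4 = 39 > τ); the selected row is illustrative.

```python
import numpy as np

def mdl_select(ts_sps, tp_sps, tau=20.0):
    """Select the set S: TS pixels whose SPs are within an L1
    distance tau of the TP's SPs.

    ts_sps : (M, N) SPs of each TS pixel
    tp_sps : (N,)   SPs of the TP
    Returns a boolean selection mask over the M TS pixels."""
    D = np.abs(np.asarray(ts_sps) - np.asarray(tp_sps)).sum(axis=1)
    return D < tau

tp = np.zeros(6)
ts = np.array([[3., 13., 2., 9., 8., 4.],   # D = 39 -> rejected
               [1., 2., 3., 4., 5., 2.]])   # D = 17 -> selected
mask = mdl_select(ts, tp)                    # [False, True]
```

Only the rows surviving the mask feed the LS optimization of Eq. (4), which is what keeps the outliers from diluting the regression.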
Table 1. Example of target pixel and its support pixels
Table 2. Exemplary TS pixels and their SPs
 3.3 Encoder and decoder
While this section describes the main steps of the encoding and decoding processes, a description of the full process allows for a more explicit step-by-step expression of the proposed idea. Basically, the embedding process follows a histogram-shifting method whereby two bins, T_n and T_p with T_n < T_p, are selected by their indices from the obtained prediction-error histogram. In the histogram, the bins indexed as T_n and T_p are used for data hiding, for which the following rules apply to a prediction error e and a message bit b:

e′ = e + 1 if e > T_p;   e′ = e + b if e = T_p;   e′ = e if T_n < e < T_p;   e′ = e − b if e = T_n;   e′ = e − 1 if e < T_n.
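A minimal sketch of a two-bin histogram-shifting rule of this kind, assuming that only the bins T_n and T_p carry message bits while all outer bins are shifted by one; the paper does not print its exact rule, so this pair of routines is an illustrative reconstruction.

```python
def hs_embed(e, bit, tn, tp):
    """Embed one bit into prediction error e (sketch; tn < tp)."""
    if e > tp:
        return e + 1       # shift right to make room
    if e == tp:
        return e + bit     # data-carrying bin
    if e == tn:
        return e - bit     # data-carrying bin
    if e < tn:
        return e - 1       # shift left to make room
    return e               # inner bins stay untouched

def hs_extract(ew, tn, tp):
    """Invert hs_embed: return (original error, bit or None)."""
    if ew > tp + 1:
        return ew - 1, None
    if ew == tp + 1:
        return tp, 1
    if ew == tp:
        return tp, 0
    if ew == tn:
        return tn, 0
    if ew == tn - 1:
        return tn, 1
    if ew < tn - 1:
        return ew + 1, None
    return ew, None

# Round trip: every error is recovered and embedded bits match.
for e in range(-5, 6):
    for b in (0, 1):
        e2, b2 = hs_extract(hs_embed(e, b, -1, 0), -1, 0)
        assert e2 == e and (b2 is None or b2 == b)
```

The round-trip assertion is the reversibility requirement: the decoder must recover both the exact prediction error and the embedded bit.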
 3.3.1 Encoder
As shown in Fig. 7, the original image is divided into the Cross set and the Dot set for the two-stage embedding. The Cross-set-embedding procedure is given as follows:

Fig. 7. Encoder and decoder of the proposed method
Step 1. Every pixel of the Cross set is predicted by the proposed predictor. The prediction-error value e(n) and the local variance σ²(n) are computed, the latter according to Eq. (5).
Step 2. Virtual embedding step to find σ²(c), which is the last variance value that satisfies the capacity requirement, along with T_n and T_p.
Step 2-1. Sort the pixels according to the local variance σ²(n), and let Ê denote the sorted version of E, the vector of the prediction errors. Properly find T_n and T_p so that a sufficient number of prediction errors belonging to [T_n, T_p] satisfies the capacity of the Cross-set payload P_c.
Step 2-2. Virtual embedding is performed to attain σ²(c). The local variance values σ²(n) are sorted from the smallest to the largest. Count the number of prediction errors that produce local variance values smaller than σ²(c). Among the prediction-error values, only P_c errors are selected to hide the P_c message bits. Obviously, the P_c and σ²(c) values should be delivered to the decoder.
Step 3. Real embedding step to embed the payload.
Step 3-1. Find the P_c pixels of which the σ²(n) values are less than σ²(c), checking in raster-scanning order from the beginning of the image; only the prediction errors of those pixels are exploited for the embedding of the payload using the histogram-shift method. If overflow or underflow occurs in some of the pixels, location-map bits are generated. The size of the (usually compressed) location map should be delivered to the decoder.
Step 32.
Side information about
T_{n}
and
T_{p}
, the payload size
P_{c}
, the local variance threshold value
and the locationmap size can be expressed in under 60 bits. The information must be embedded in the beginning of the Crossset pixels using the LSBreplacement method. The original LSB values of the 60 pixels need to be collected from the start as well; therefore, these 120 bits are embedded as side information so that this method remains completely reversible.
Step 4. The procedure for Dot-set embedding is the same as that from Step 1 to Step 3, for which the modified Cross-set pixels are used. For the Dot set, the side information regarding T_n and T_p, the payload size P_d, the local variance threshold value σ²(d), and the location-map size should be decided accordingly.
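The "virtual embedding" of Step 2 amounts to choosing the variance threshold so that exactly P_c of the smoothest pixels qualify; a sketch, with the inclusive-threshold convention as our assumption.

```python
import numpy as np

def variance_threshold(variances, payload_bits):
    """Pick the local-variance threshold as the payload_bits-th
    smallest variance, so exactly that many (smoothest) pixels
    are used for embedding (a sketch; assumes no ties at the
    threshold and an inclusive comparison)."""
    s = np.sort(np.asarray(variances, dtype=float))
    return float(s[payload_bits - 1])

var = [5.0, 0.5, 9.0, 1.5, 3.0]
t = variance_threshold(var, 3)            # embed into the 3 smoothest
assert sum(v <= t for v in var) == 3      # three pixels qualify
```

Sending only this scalar threshold (plus the payload size) lets the decoder re-derive the same pixel subset in raster order without any per-pixel bookkeeping.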
 3.3.2 Decoder
Here, the watermarked image is divided into the Cross set and the Dot set. The order of the two-stage decoding procedure is the inverse of the order of the two-stage encoding procedure; that is, the Dot-set decoding proceeds first, followed by that of the Cross set.
The Dot-set decoding procedure is given as follows:
Step 1. Every pixel of the Dot set is predicted by the proposed predictor. Calculate the prediction-error values and the local variance σ²(n).
Step 2. From the beginning of the image, the side information is extracted from the first 60 pixels.
Step 3. Find the proper pixels with σ²(n) values that are less than σ²(d), checking in raster-scanning order from the beginning of the image. For the P_d pixels that are found, recover the original prediction errors using the decoder of the histogram-shift method and extract the payload of P_d bits. If a pixel belongs to an overflow/underflow case, follow the location-map algorithm according to the method of Sachnev et al. [1].
Step 4. Compute the original pixel values by using the recovered prediction-error values.
Step 5. Replace the first 60 LSB values with the original LSB values that are stored as side information.
Step 6. The Cross-set decoding follows the same procedure as that from Step 1 to Step 5, and the recovered Dot-set pixels are used.
4. Experiment results
We used MATLAB and the following standard test images (512 × 512-sized, 8-bit grayscale images), which are shown in Fig. 8, to implement the proposed reversible data hiding method: (a) Barbara, (b) Baboon, (c) Lena, (d) Sailboat, (e) Couple, (f) Pepper, (g) Boat, (h) House, and (i) Goldhill. The reversibility between the original image and the recovered one was verified throughout the experiments. We embedded a random bit stream as the watermark message, while real binary data was embedded as the side information.
Fig. 8. Nine standard test images: (a) Barbara, (b) Baboon, (c) Lena, (d) Sailboat, (e) Couple, (f) Pepper, (g) Boat, (h) House, and (i) Goldhill
We adopted a TS size T of 8 and an MDL threshold τ of 20, which were obtained through extensive experiments. The peak signal-to-noise ratio (PSNR) is utilized to compare the performances of the reversible data-hiding methods. For comparison, the embedding capacity range is limited from 0 bits per pixel (bpp) to 1 bpp, and the proposed method is compared with the works of Sachnev et al. [1], Luo et al. [17], Feng et al. [16], and Dragoi and Coltuc [19].
The results of the comparisons with the other schemes are shown in Fig. 9, Fig. 10, and Fig. 11, wherein the method of Dragoi and Coltuc [19] and the proposed method of this paper outperform the method of Sachnev et al. [1]. It is obvious that the proposed method outperforms the other methods in the case of a large embedding capacity, such as one greater than 0.5 bpp. As mentioned in Section 1, the state-of-the-art method of Sachnev et al. [1] had previously been unbeatable, especially regarding a large embedding capacity. Recently, however, the method of Dragoi and Coltuc [19] outperformed that of Sachnev et al. [1] with respect to a large embedding capacity, positioning Dragoi and Coltuc's method [19] at the center of a new reversible watermarking challenge.
Fig. 9. PSNR and prediction-error histogram (in 10,000 bits) for comparison of Barbara, Baboon, and Lena
Fig. 10. PSNR and prediction-error histogram (in 10,000 bits) for comparison of Sailboat, Couple, and Pepper
Fig. 11. PSNR and prediction-error histogram (in 10,000 bits) for comparison of Boat, House, and Goldhill
For small embedding capacities, numerous methods [7,15,21] have achieved performances that are more effective than that of Sachnev et al. [1]; thus, the main goal of this paper is the achievement of an improved performance regarding a large capacity. For the smooth images that are rich in low-frequency components, the P2AR method of Dragoi and Coltuc [19] is more effective, whereas the proposed method achieves a more effective performance regarding complex images that are rich in high-frequency components. In the cases of Baboon, Barbara, Pepper, and Goldhill, the proposed method is more effective than any of the other methods. In the case of Couple, however, the P2AR method of Dragoi and Coltuc [19] is slightly more effective regarding a small embedding capacity. For the other images, such as Boat, House, Lena, and Sailboat, the two methods show comparable performances.
Sachnev et al. [1] utilized a rhombus scheme to which the term full context applies, whereby it is possible to predict the TP by using the four neighboring pixels [14]; alternatively, a further-enhanced prediction method involving the addition of two more SPs to the scheme of [1] is exploited for the proposed idea, meaning that the proposed method utilizes six SPs for the prediction of one TP. The TS is also exploited for the calculation of the P2AR weight coefficients. The proposed method ultimately provides an outstanding prediction performance among the existing methods.
In addition, the attainment of proper weight coefficients is more effective in the high-frequency region (i.e., around the edges); therefore, the largest performance increase occurs for Barbara, which contains a large number of well-ordered edges. The Barbara gain is even superior to that of Baboon, the texture of which is merely complicated rather than following a specific pattern.
Another reason for a high PSNR at a high capacity in a high-frequency image is the location-map size that is used for the prevention of over/underflow errors. The location-map size of the proposed method decreases dramatically compared to that of Sachnev et al.'s method [1].
Compared with Sachnev et al. [1], the proposed method requires a small location-map size. For Barbara, the location-map size of Sachnev et al. [1] at the embedding capacity of 230,000 bits is 3,611 bits, while only 35 bits are required for the proposed method; similarly, for Baboon, the required location-map sizes are 4,789 and 1,836 bits, respectively (see Table 4). It is obvious that the proposed method requires a smaller location-map size for many images. The exception, however, is Sailboat, whereby the required number of bits for Sachnev et al. [1] is 301 bits and that for the proposed method is 750 bits. The performance of the proposed method in terms of PSNR vs. embedding capacity is also improved since its predictor is more effective.
Table 3. Comparison of kurtosis values
Table 4. Comparison of location-map sizes
The method of Dragoi and Coltuc [19] also requires a small location map; for Barbara and Baboon, for example, 28 and 2,047 bits, respectively, are required at the same embedding capacity. Accordingly, their method and the proposed method perform far more effectively than the method of Sachnev et al. [1] for a large embedding capacity.
Table 4 and Table 5 show how many location-map bits the proposed method saves; here, the location map is created in the same way as in Sachnev et al.'s method [1]. Regarding the over/underflow problem, pixels with extremely high or extremely low intensities are where the problem most often occurs, so an improvement of the prediction favorably restrains over/underflow errors.
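The connection between prediction accuracy and location-map size can be illustrated with a minimal sketch; the flagging rule below and the `margin` parameter are simplifying assumptions for illustration, not the exact rule of [1]:

```python
import numpy as np

def build_location_map(img, margin):
    """Flag pixels whose expanded value could leave [0, 255].  Only the
    flagged positions must be recorded, so a more accurate predictor
    (smaller expanded errors, hence a smaller margin) yields a smaller
    location map.  The rule and `margin` are illustrative assumptions."""
    img = np.asarray(img, dtype=int)
    return (img < margin) | (img > 255 - margin)

# A mostly mid-gray image produces an almost empty map: only the one
# near-black pixel is flagged.
img = np.full((4, 4), 128)
img[0, 0] = 2
lm = build_location_map(img, margin=8)
```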
Comparison of location-map sizes
The prediction-error histograms of all of the test images are shown in Fig. 9, Fig. 10, and Fig. 11. These figures show that the histograms of the proposed method and of [19] are sharper than that of [1]. The prediction method for which P2AR is used is therefore far more accurate than that of [1], since it adapts to the context pixels, while [1] works only with fixed pixel values.
Kurtosis [22] is a measure of whether data are peaked or flat relative to a normal distribution. The kurtosis value of the normal distribution is 3, while that of the Laplace distribution is 6, meaning that the Laplace distribution has a sharper peak and heavier tails than the Gaussian distribution. The larger the kurtosis of the prediction-error distribution, the more accurate the prediction method.
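As a quick empirical check of these reference values, the fourth standardized moment can be computed directly (a minimal sketch; the sample sizes are arbitrary choices):

```python
import numpy as np

def kurtosis(x):
    """Fourth standardized moment: 3 for a normal distribution, 6 for a
    Laplace distribution (sharper peak, heavier tails)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 4).mean() / (d ** 2).mean() ** 2

rng = np.random.default_rng(0)
k_normal = kurtosis(rng.normal(size=200_000))    # close to 3
k_laplace = kurtosis(rng.laplace(size=200_000))  # close to 6
```

Note that this is the Pearson (non-excess) definition of kurtosis, which matches the value of 3 quoted for the normal distribution.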
Table 3 shows that the average kurtosis values of the proposed method and [19] are considerably larger than that of [1], and that the value of the proposed method is slightly larger than that of [19], reinforcing the relatively higher prediction accuracy of the proposed method compared with the other methods.
It is clear, however, that the actual performance of the proposed method cannot be explained using a single parameter such as the kurtosis value. The performances of reversible data hiding methods cannot be explained completely by the location-map sizes in Table 4 and Table 5, the prediction-error histograms in Fig. 9, Fig. 10, and Fig. 11, or the kurtosis values in Table 3, since many unknown factors affect algorithmic performance. What is clear is the quite favorable behavior of the proposed method, which is due to the successful predictor proposed in this paper.
5. Conclusion
In this paper, we propose an enhanced predictor that is based on a combination of P2AR and a rhombus-shaped two-stage embedding scheme; furthermore, the use of the embedding filter according to the variance around the TP produces the same effect as the exploitation of sorting. Our predictor utilizes six critical full-context SPs through the pixels in the TS, enabling an identification of the shape of the region around the TP and of the proper coefficients; therefore, a set of pixels located in a highly varying region of an image is predicted more effectively with the proposed scheme than with Sachnev et al.'s method [1]. Due to this full-context SP utilization, the number of prediction errors decreases and the size of the location map decreases greatly; the proposed method therefore significantly improves the embedding capacity, especially for highly varying images. The experiment results demonstrate that the results of the proposed method are more favorable than those of the other methods.
For the proposed method, we utilize the fact that a combination of causal pixels, cross pixels, and a partial group of dot pixels, or dot pixels and a partial group of cross pixels, can be used for prediction at the decoding phase. For the original rhombus-prediction method, only the cross pixels or the dot pixels were used for prediction because the researchers were unaware that the cross-utilization of pixels is possible; in this paper, the scope of the causal pixels is widened. In addition, the P2AR method becomes adaptive in this paper, where the adaptability is explained as follows: among the M candidates of the TS pixels, we can use only ‖S‖ candidates adaptively, without the existence of side information between the encoder and the decoder, by utilizing a threshold parameter τ.
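This adaptive selection can be sketched as follows; the L1 context distance and all variable names are illustrative assumptions, with only the thresholding by τ taken from the text:

```python
import numpy as np

def select_candidates(ts_contexts, target_context, tau):
    """Keep only the TS candidates whose SP context lies within tau of the
    target's context.  Encoder and decoder see the same causal pixels, so
    both derive the identical subset without exchanging side information."""
    dists = np.abs(ts_contexts - target_context).sum(axis=1)
    return ts_contexts[dists < tau]

# Toy sizes: M = 3 candidate contexts with N = 2 SPs each.
ts = np.array([[100, 102], [180, 60], [101, 104]])
target = np.array([100, 103])
S = select_candidates(ts, target, tau=10)  # keeps the two similar rows
```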
The complexity of the proposed method, however, may be an issue. For each pixel, approximately one of the pseudo-inverse matrix computations of Eq. (4) is carried out, whereby the pseudo-inverse matrix multiplication requires ‖S‖² ∙ N operations; therefore, the total number of operations is approximately ‖S‖² ∙ N ∙ n, where n is the total number of pixels. Typically, in this paper, n ≥ 256², while N = 6 and ‖S‖ ≥ 80. For a small image with n = 256², ‖S‖² ∙ N ≥ 80² ∙ 6 = 38,400, which is approximately equal to n; however, when n is large, ‖S‖² ∙ N is relatively negligible. In conclusion, the complexity of the proposed method is between O(n) and O(n²); moreover, the computation speed of the proposed method is unsurprisingly slow, since the corresponding computational burden is relatively large.
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (NRF2015R1A2A2A01004587).
BIO
Byeong Yong Lee received a B.S. degree in mathematics from Konkuk University, Korea, in 2006, an M.S. degree in mathematics from Korea University, Korea, in 2010, and a Ph.D. degree from the Graduate School of Information Security, Korea University. His research interests are reversible watermarking, divisible loads, and steganography.
Hee Joon Hwang received a B.S. degree from the Electric and Electronic Engineering department and an M.S. degree from the Graduate School of Information Management and Security at Korea University, Seoul, Korea, in 2008. He joined the Graduate School of Information Security, Korea University, Seoul, Korea, in 2007, where he is currently pursuing a Ph.D. His research interests include multimedia security, reversible and robust watermarking, and steganography.
Hyoung Joong Kim He is currently with the Graduate
References
V. Sachnev, H. J. Kim, J. Nam, S. Suresh, and Y. Q. Shi, "Reversible watermarking algorithm using sorting and prediction," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 7, pp. 989-999, 2009. DOI: 10.1109/TCSVT.2009.2020257
J. Tian, "Reversible data embedding using a difference expansion," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 8, pp. 890-896, 2003. DOI: 10.1109/TCSVT.2003.815962
A. M. Alattar, "Reversible watermark using difference expansion of triplets," in Proc. IEEE Int. Conf. Image Process., Catalonia, Spain, vol. 1, pp. 501-504, 2003.
A. M. Alattar, "Reversible watermark using the difference expansion of a generalized integer transform," IEEE Trans. Image Process., vol. 13, pp. 1147-1156, 2004. DOI: 10.1109/TIP.2004.828418
D. M. Thodi and J. J. Rodriguez, "Expansion embedding techniques for reversible watermarking," IEEE Trans. Image Process., vol. 16, no. 3, pp. 721-730, 2007. DOI: 10.1109/TIP.2006.891046
M. J. Weinberger, G. Seroussi, and G. Sapiro, "The LOCO-I lossless image compression algorithm: Principles and standardization into JPEG-LS," IEEE Trans. Image Process., vol. 9, no. 8, pp. 1309-1324, 2000. DOI: 10.1109/83.855427
S. U. Kang, H. J. Hwang, and H. J. Kim, "Reversible watermark using an accurate predictor and sorter based on payload balancing," ETRI J., vol. 34, no. 3, pp. 410-420, 2012. DOI: 10.4218/etrij.12.0111.0075
X. Li and M. T. Orchard, "Edge-directed prediction for lossless compression of natural images," IEEE Trans. Image Process., vol. 10, no. 6, pp. 813-817, 2001. DOI: 10.1109/83.923277
L.-J. Kau and Y.-P. Lin, "Adaptive lossless image coding using least squares optimization with edge-look-ahead," IEEE Trans. Circuits Syst. II, vol. 52, no. 11, pp. 751-755, 2005. DOI: 10.1109/TCSII.2005.852194
X. Wu and N. Memon, "Context-based, adaptive, lossless image coding," IEEE Trans. Commun., vol. 45, no. 4, pp. 437-444, 1997. DOI: 10.1109/26.585919
X. Wu and K. Barthel, "Piecewise 2D autoregression for predictive image coding," in Proc. Int. Conf. Image Process., vol. 3, pp. 901-905, 1998.
M. H. Hansen and B. Yu, "Model selection and the principle of minimum description length," J. Amer. Statist. Assoc., vol. 96, pp. 746-774, 2001. DOI: 10.1198/016214501753168398
X. Wu, G. Zhai, X. Yang, and W. Zhang, "Adaptive sequential prediction of multidimensional signals with applications to lossless image coding," IEEE Trans. Image Process., vol. 20, no. 1, pp. 36-42, 2011. DOI: 10.1109/TIP.2010.2061860
M. Chen, Z. Chen, X. Zeng, and Z. Xiong, "Model order selection in reversible image watermarking," IEEE J. Sel. Top. Signal Process., vol. 4, no. 3, pp. 592-604, 2010. DOI: 10.1109/JSTSP.2010.2049222
H. J. Hwang, H. J. Kim, V. Sachnev, and S. H. Joo, "Reversible watermarking method using optimal histogram pair shifting based on prediction and sorting," KSII Trans. Internet Inform. Syst., vol. 4, no. 4, pp. 655-670, 2010.
L. Luo, Z. Chen, M. Chen, X. Zeng, and Z. Xiong, "Reversible image watermarking using interpolation technique," IEEE Trans. Inf. Forensics Secur., vol. 5, pp. 187-193, 2010. DOI: 10.1109/TIFS.2009.2035975
B. Ou, X. Li, Y. Zhao, R. Ni, and Y. Q. Shi, "Pairwise prediction-error expansion for efficient reversible data hiding," IEEE Trans. Image Process., vol. 22, no. 12, pp. 36-42, 2013.
I. C. Dragoi and D. Coltuc, "Local-prediction-based difference expansion reversible watermarking," IEEE Trans. Image Process., vol. 23, no. 4, pp. 1779-1790, 2014. DOI: 10.1109/TIP.2014.2307482
J. Wen, J. Lei, and Y. Wan, "Adaptive reversible data hiding through autoregression," in Proc. Int. Conf. Inform. Science Tech., pp. 831-838, 2012.
R. A. Groeneveld and G. Meeden, "Measuring skewness and kurtosis," The Statistician, pp. 391-399, 1984.