Conditional Re-encoding Method for Cryptanalysis-Resistant White-Box AES

ETRI Journal.
2015.
Oct,
37(5):
1012-1022

- Received : January 27, 2014
- Accepted : August 07, 2015
- Published : October 01, 2015

Download

PDF

e-PUB

PubReader

PPT

Export by style

Article

Metrics

Cited by

TagCloud

Conventional cryptographic algorithms are not sufficient to protect secret keys and data in white-box environments, where an attacker has full visibility and control over an executing software code. For this reason, cryptographic algorithms have been redesigned to be resistant to white-box attacks. The first white-box AES (WB-AES) implementation was thought to provide reliable security in that all brute force attacks are infeasible even in white-box environments; however, this proved not to be the case. In particular, Billet and others presented a cryptanalysis of WB-AES with 230 time complexity, and Michiels and others generalized it for all substitution-linear transformation ciphers. Recently, a collision-based cryptanalysis was also reported. In this paper, we revisit Chow and others’ first WB-AES implementation and present a conditional re-encoding method for cryptanalysis protection. The experimental results show that there is approximately a 57% increase in the memory requirement and a 20% increase in execution speed.
^{38}
possible key combinations with respect to a 128-bit key size. Cracking the key using a supercomputer would require an average of 1.02 × 10
^{18}
years of testing at a rate of 10.51 × 10
^{15}
floating point operations per second. For this reason, if the secret key is kept secure, then it is also assumed to be secure.
Most cryptographic algorithms are designed to protect a secret key against what is known as a “black-box attack.” In a black-box attack model, an attacker has knowledge of the cryptographic algorithm and is able to examine various inputs and outputs, but is unable to monitor its execution or any intermediate results generated during the computation. Based on these conditions, a black-box attacker can mount various types of attacks including known-plaintext, ciphertext-only, chosen-ciphertext, and adaptive chosen-plaintext attacks.
While these attacks seem to be powerful enough in a black-box attack model, the physical properties of cryptosystems mean software is vulnerable to advanced attacks. Often, if an attacker has sufficient access to a target cipher to mount adaptive attacks, then they may obtain some useful information about the execution of an algorithm and its secret key. For example, side-channel attacks, such as differential power analysis (DPA)
[1]
or correlation power analysis (CPA)
[2]
, enable an attacker to crack smart cards protected with cryptographic algorithms in a matter of minutes rather than the theoretical billion years of testing. This type of attack exploits the fact that the power consumption of a cryptographic device at any given point in time strongly depends on the data it processes and the operation it performs. Provided that enough power traces are given, an attacker can perform a statistical analysis and discover a secret key very quickly
[3]
. Another type of side-channel attack is realized through maliciously injecting faults into a cryptographic device and observing the corresponding erroneous outputs; this is a so-called fault injection attack. In some extreme cases such as the RSA algorithm using the Chinese Remainder Theorem (CRT-RSA), even a single fault injection enables an attacker to reveal a secret key
[3]
–
[4]
.
Side-channel attacks, however, are characterized as a “grey-box attack” because an attacker has access to a small part of the execution. From this standpoint, the countermeasures against a grey-box attack do not fully reflect reality. A much more powerful type of attack is a “white-box attack.” In this attack model, an attacker has total visibility into an execution platform and the software implementation of an algorithm. This means that an attacker can analyze and tamper with the binary code and corresponding memory page of the application during execution. To this end, an attacker may use an assistant attack tool such as a disassembler or debugger. Unfortunately, most cryptographic algorithms, including those for digital signatures, authentication, smart cards, and the like, are vulnerable to white-box attacks. There is therefore a need for white-box cryptography in any software protection strategy. The protection of AES against a white-box attack (WB-AES) was started in
[5]
with the addition of cryptanalysis for this technique
[6]
, and continuous research on WB-AES has followed
[7]
–
[10]
.
Overall, each WB-AES algorithm has a corresponding cryptanalysis of its weakness. There is therefore a need for a secure WB-AES algorithm that is resistant to cryptanalysis.
In this paper, we revisit the vulnerabilities of Chow’s WB-AES
[5]
and improve it in such a way so as to protect against a white-box attack, with low additional costs. Our solution changes a portion of the encoding method in addition to the construction of a specific type of lookup table. This new encoding method is the key idea behind the protection of previous white-box attacks. Compared to Chow’s WB-AES, the additional costs are 1.57-times the size of the lookup tables, and 1.2-times the runtime with an Intel Core i7 at 3.4 GHz.
The rest of this paper is organized as follows. Section II briefly reviews the WB-AES algorithm proposed by Chow and others. In Section III, we describe the previous cryptanalysis. We then propose our WB-AES algorithm, which is resistant to cryptanalysis, and analyze its security and performance in Section IV. Finally, we provide some concluding remarks in Section V.
Here, we know the following
[5]
,
[12]
:
These observations give us the following description by which AES is converted into a table-based implementation.
S
(
x
) is an S-box with an 8-bit input value,
x
, and T-boxes for round 10 have to absorb two round keys,
k
^{10}
, where
x
_{0}
,
x
_{1}
,
x
_{2}
, and
x
_{3}
be four bytes to be multiplied by MC. The multiplication can be decomposed as follows:
y
_{0}
,
y
_{1}
,
y
_{2}
, and
y
_{3}
denote the four terms on the right-hand side. The so-called
Ty_{i}
tables that map 8-bits to 32-bits are defined as follows:
x
_{0}
,
x
_{1}
,
x
_{2}
, and
x
_{3}
by MC can be computed by four table lookups and three XORs:
Ty
_{0}
(
x
) ⊕
Ty
_{1}
(
x
) ⊕
Ty
_{2}
(
x
) ⊕
Ty
_{3}
(
x
). In rounds 1 to 9, 144
Ty_{i}
tables are needed to accept the 8-bit outputs of T-boxes.
^{1)}
on its input and output, and affine mixing bijections applied to all key-dependent lookup tables. The input and output encodings achieve
confusion
, as defined by Shannon
[13]
, while mixing bijections achieve
diffusion
. In the Type II tables, 8 × 8 mixing bijections are used to diffuse the input to T-boxes, and 32 × 32 mixing bijections, MB, are inserted after MixColumns. A trapezoid that is labeled “32 × 8” in Type II shown in
Fig. 1
indicates a result of applying the mixing bijections to the
Ty_{i}
tables. The Type III tables combine MB
^{−1}
with the inverse of the input 8 × 8 mixing bijections for T-boxes of the next round. To reduce the size of lookup tables, MB
^{−1}
is also divided into four submatrices,
Type II, III, and IV tables (from left).
x
,
y
) =
x
⊕
y
.
There are two variations of Type IV in WB-AES — one to XOR (and combine) 4-bit chunks for a 32-bit result of Type II and Type III tables, and the other for a 128-bit result of Type IA and Type IB tables. Each 4-bit input has to be decoded and the 4-bit output has to be encoded non-linearly. The XOR operation of 4-bit chunks using Type IV tables is given below for illustration purposes
[14]
:
The resulting value of E2(
x
_{0}
⊕
x
_{1}
⊕
x
_{2}
⊕
x
_{3}
) can be computed in the following way:
x
_{0}
⊕
x
_{1}
⊕
x
_{2}
⊕
x
_{3}
). Type II, III, and IV tables are depicted in
Fig. 1
.
Type IA and IB tables.
Sequence of table lookups.
For later use, we distinguish Type IV following Type II and Type III tables as Type IV_II and Type IV_III tables, respectively. For compatibility with an original AES, the input to Type IA should be first encoded by the corresponding linear and non-linear transformations. On the other hand, the output of round 10 should be decoded by the corresponding non-linear and linear transformations.
All encoding tables have to be deleted after generating the lookup tables. All we need is a set of lookup tables for encryption/decryption.
r
∈ [0, 9]. Using this
y_{i}
of (
x
_{0}
,
x
_{1}
,
x
_{2}
,
x
_{3}
) as follows:
(1) $$\begin{array}{ll}{y}_{i}({x}_{0},{x}_{1},{x}_{2},{x}_{3})=\hfill & {Q}_{i}^{r}({\alpha}_{i,0}{T}_{0}^{r}\left({P}_{0}^{r}({x}_{0})\right)\oplus {\alpha}_{i,1}{T}_{1}^{r}\left({P}_{1}^{r}({x}_{1})\right)\hfill \\ \hfill & \oplus {\alpha}_{i,2}{T}_{2}^{r}\left({P}_{2}^{r}({x}_{2})\right)\oplus {\alpha}_{i,3}{T}_{3}^{r}\left({P}_{3}^{r}({x}_{3})\right)),\hfill \end{array}$$
One of four ${R}_{j}^{r}$ mappings, j = 0, … , 3 [6] .
where
α_{i,j}
are the coefficients of MixColumns. If
x
_{1}
,
x
_{2}
, and
x
_{3}
are fixed to certain constants, say
c
_{1}
,
c
_{2}
, and
c
_{3}
, respectively, then we have
(2) $${y}_{0}(x,{c}_{1},{c}_{2},{c}_{3})={Q}_{0,j}^{r}\left(\alpha {T}_{0,j}^{r}\left({P}_{0,j}^{r}(x)\right)\oplus {\beta}_{\text{c}}\right).$$
Since
x
only takes 256 values and constant
β
_{c}
is fixed, these mappings are known by the input/output, as well as by their inverses. Changing one of the constants,
c
_{1}
,
c
_{2}
, or
c
_{3}
, for example,
c
_{1}
,
c
_{2}
, and
(3) $$\begin{array}{l}{y}_{0}(x,{c}_{1},{c}_{2},{c}_{3})\circ {y}_{0}{\left(x,{c}_{1},{c}_{2},{c}_{3}^{\prime}\right)}^{-1}\\ \text{\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}}=\text{\hspace{0.17em}}{Q}_{0,j}^{r}\text{}\left(\alpha {T}_{0,j}^{r}\left(\text{}{P}_{0,j}^{r}\left(\text{}{P}_{0,j}^{r}{}^{-1}\text{}\left({T}_{0,j}^{r}{}^{-1}\text{}\left({\alpha}^{-1}\text{}({Q}_{0,j}^{r}{}^{-1}\text{\hspace{0.05em}}\oplus \text{\hspace{0.05em}}{\beta}_{\text{c\u2032}})\right)\right)\right)\right)\text{\hspace{0.17em}}\oplus \text{\hspace{0.05em}}{\beta}_{\text{c}}\right)\\ \text{\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}}=\text{\hspace{0.17em}}{Q}_{0,j}^{r}\left(({Q}_{0,j}^{r}{}^{-1}\oplus {\beta}_{\text{c\u2032}})\oplus {\beta}_{\text{c}}\right),\\ \text{\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}}=\text{\hspace{0.17em}}{Q}_{0,j}^{r}({Q}_{0,j}^{r}{}^{-1}\oplus \beta ),\end{array}$$
where
β
=
β
_{c}
⊕
β
_{c}
′ takes all values in GF(2
^{8}
). Theorem 1 (below) then enables an attacker to recover the non-linear part
r
∈ [1, 9] with time complexity 2
^{24}
. Because there is no MixColumns step, the last round is excluded from the cryptanalysis. The open dot “ ◦ ” means a composition of functions.
Theorem 1.
Given a set of functions
S
= {
Q
◦ ⊕
_{β}
◦
Q
^{−1}
}
_{β}
, where
β
∈ GF(2
^{8}
),
Q
is a permutation of GF(2
^{8}
), and ⊕
_{β}
is the translation by
β
in GF(2
^{8}
), one can construct a particular solution,
A
, so that
Q
◦
A
.
The next step is to recover the affine maps
A
_{0}
⊕
q
_{0}
, where
A
_{0}
is linear and
q
_{0}
is a constant. To this end, (1) is used to determine a unique mapping,
L
, and a constant,
c
, such that for all
x
_{0}
⊕ GF(2
^{8}
) we have
y_{i}
(
x
_{0}
, 0, 0, 0 ) =
L
(
y_{j}
(
x
_{0}
, 0, 0, 0)) ⊕
c
.
The time complexity to solve this is much lower than 2
^{16}
with a highly over-defined linear system of 2
^{8}
× 8 equations, involving the 64 entries of
L
as well as the eight entries of
c
as unknowns over GF(2), using the knowledge of the functions
y_{i}
and
y_{j}
based on their values.
It is then possible to recover
A
_{0}
, the linear part of
Q
_{0}
, with time complexity 2
^{24}
by determining the characteristic polynomial of
L
. At the same time, the constant part
q
_{0}
of
Q
_{0}
is also computed using the knowledge of
α_{i,j}
by setting the four variables in (1) to zero. It was shown that the linear parts of
Q
_{1}
,
Q
_{2}
, and
Q
_{3}
can be directly computed from the knowledge of the linear part of
Q
_{0}
with time complexity 2
^{16}
. In a similar way, all
^{30}
. In
[16]
, Tolhuizen improved it to 2
^{22}
using a preprocessing step.
To be more specific, a generic substitution-affine transformation (SAT) cipher is defined in the second step, where a round consists of T-boxes
ϵ^{r}
. Because MixColumns, the affine function of AES, are known to an attacker,
ϵ^{r}
can be computed. Biryukov and others’ algorithm
[17]
is used to solve the equivalence problem
T_{i}
=
γ_{i}
◦
S_{i}
◦
δ_{i}
, where
γ
and
δ
are the affine functions that describe the affine relation between
T_{i}
and
S_{i}
. Since
T_{i}
is a part of the SAT cipher, it contains the key
k^{r}
. Note that this cryptanalysis also begins with Theorem 1 to remove the non-linear part.
S
_{0}
,
S
_{1}
,
S
_{2}
, and
S
_{3}
and associated key bytes using the equation
(4) $$\begin{array}{l}{Q}_{0}(02\otimes {S}_{0}(\alpha )\oplus 03\otimes {S}_{1}\left(0\right)\oplus \text{c})\\ \begin{array}{cc}& ={Q}_{0}(02\otimes {S}_{0}\left(0\right)\oplus 03\otimes {S}_{1}(\beta )\oplus \text{c}).\end{array}\end{array}$$
As they mentioned, there are 256 pairs (
α
,
β
) with a trivial solution (0, 0). The attacker constructs a linear system to recover
S
_{0}
and
S
_{1}
. After
S
_{0}
is recovered,
S
_{1}
can be recovered through an exhaustive search. The remaining functions,
S
_{2}
and
S
_{3}
, are then recovered similarly by solving the linear systems. Once the
S_{i}
functions have been recovered, one can easily recover
Q_{i}
in the output of the first round. The total time complexity of the attack is 2
^{22}
.
Lepoint and others’ attack [10] .
For the cryptanalysis above, we can see through Theorem 1 that the first two methods of cryptanalysis used and the collision equation of the last method are both dependent on
Q
. More precisely, their assertion assumes that for
x
,
y
∈ GF(2
^{8}
), if
x
=
y
, then
Q
(
x
) =
Q
(
y
); and if
x
≠
y
, then
Q
(
x
) ≠
Q
(
y
).
What is important here is that intermediate values are combined by XORs, and the non-linear encoding is then followed in the 4-bit unit. In the following section, based on this fact, we propose a conditional re-encoding method and show that it can protect against three methods of cryptanalysis.
A. Round Output Encoding
Let
δ
denote the non-linear encoding function for the round output. Given
x
,
y
∈ {0, 1}
^{4}
and
z
=
x
⊕
y
,
δ
depends on
x
,
y
, and
z
instead of only
z
. This is as follows:
This conditional re-encoding depending on the value of
ϕ
creates a many-to-many relationship between
z
and
z
′. To be more precise, given
x
1,
x
2,
y
1,
y
2 ∈ {0, 1}
^{4}
such that
x
1 ⊕
y
1 =
z
1 and
x
2 ⊕
y
2 =
z
2, let us assume that
ϕ
(
x
1,
y
1) = 1 and
ϕ
(
x
2,
y
2) = 0. We then know that
δ
(
x
1,
y
1,
z
1) ≠
δ
(
x
2,
y
2,
z
2). For this reason,
z
′ requires the value of
ϕ
(
x
,
y
) to be correctly decoded. Because the value of
ϕ
(
x
,
y
) determines whether a re-encoding takes place,
δ
is a function of
ϕ
(
x
,
y
) and
z
.
B. Round Input Decoding
To properly decode
z
′ back to
z
, we need to keep track of the occurrence of the re-encoding in a satellite value. We denote the satellite variable by
γ
, and its value is determined by the value of
ϕ
(
x
,
y
). If
γ
= 1, then it means that
z
′ = E3(E2(
z
)); otherwise,
z
′ = E2(
z
). The input decoding
δ
^{−1}
of the next round is then a function of
z
′ and
γ
as follows:
C. Type IV_III Generation and Lookup
A sub-byte of a round output is computed using three XOR operations combining four intermediate values, and we denote each intermediate result by pair2, pair1, and pair0, as in Section II. Without having to apply the conditional re-encoding to all pairs, we apply it only when pair0 is computed for simplicity. Thus, a Type IV_III table should reflect the conditional re-encoding based on the characteristics of the two 4-bit inputs of pair0 when it is generated (see Appendix).
To properly look up a Type IV_III table, we must consider that all encodings and XOR operations are performed in the 4-bit unit, and
γ
needs to be two bits to track the re-encodings of the upper and lower 4-bit chunks separately; in each round, the total size of
γ
is then 16 × 2 bits. The satellite variable
γ
provides no additional secret information to the attacker because the two inputs are encoded values.
D. Type II Generation and Lookup
To be properly cooperated with the modified Type IV_III tables, the Type II tables must also be modified to correctly decode the input. The conditional re-encoding is independently applied to the upper and lower 4-bit output of a Type IV_III table. For this reason, a Type II table contains four decoding cases, as shown in
Table 1
. To look up a correctly decoded input for a Type II table,
γ
has to be referred. Based on
Table 1
,
γ
can be used as an index to select the decoded value.

Q
. Specifically, Theorem 1 basically uses
Q
^{−1}
◦
Q
(
z
) =
z
and the collision equation uses the fact that
z
=
z
′ if
Q
(
z
) =
Q
(
z
′). Owing to the many-to-many relationship by the conditional re-encoding, Theorem 1 and the collision equation are not acceptable for our proposed scheme. The details are as follows.
A. Against Theorem 1
When we say
Q
^{−1}
◦
Q
(
z
) =
z
, this means that the output of the non-linear encoding is determined by only its 8-bit value. To be more precise, a particular 8-bit
z
at the same round, row, column, and pair has to be encoded to a particular output
z
′. In addition,
z
′ has to be decoded to
z
. Based on the property of the conditional re-encoding, this is not always guaranteed.
Let us recall (3). An attacker assumed that the table for the non-linear encoding at the same round, row, column, and pair is always the same. However, we know that the non-linear encoding table for the round output is
x
and
y
are 4-bit inputs to be XORed. Given a particular round, row, column, and pair0, we need to conduct the XOR operations between 8-bit inputs and encode the result for the round output. The following examples illustrate how this disturbs Theorem 1.
Table 2
explains the notation to be used.

Example 1.
X
= 0x11,
Y
= 0x34, E2_high (0x2) = 0xA, E2_low (0x5) = 0x1, and E3_high (0xA) = 0x7, E3_low (0x1) = 0xC. The decision function
ϕ
outputs 1 if the LSB of the first input and the second LSB of the second input are different. The following computation is then depicted in
Fig. 6
, and we know that
Z
1 has to be re-encoded.
Type IV_III lookups for XOR operations.
Example 2.
X
= 0xB7,
Y
= 0x92, and the other conditions are the same; thus, we have
Z
2 has to be re-encoded.
Example 3.
X
= 0x6C,
Y
= 0x41, E2_low (0xD) = 0xC, and the other conditions are the same; thus, we have
Z
1 and
Z
2 are the same.
In contrast, the last two examples showed that the two encoded outputs can be the same even though
Z
2 and
Z
3 are different. The decoding for the next round’s input also has a many-to-many relationship because the input decoding is determined by
γ
. For this reason, a successful cryptanalysis using Theorem 1 is possible if there is
always no re-encoding
or
always a re-encoding
. According to (3) and Theorem 1, an attacker needs
S
= {
Q
◦ ⊕
_{β}
◦
Q
^{−1}
}(
x
), where
x
∈ GF(2
^{8}
). Because we know that the probability that γ = ‘00’ is 1/4, the probability of “
always no re-encoding
” happening is (1/4)
^{256}
. Similarly, the probability of “
always a re-encoding
” is also (1/4)
^{256}
. Thus, there is a negligible probability of constructing a correct
S
for an attacker who has one of the assumptions, “
always no re-encoding
” or “
always a re-encoding
.” Without a correct set of functions
S
, an attacker is unable to construct a particular solution
n
bits of
x_{k}
and
y_{k}
, where
k
= 0 or 1, to test the condition for the re-encoding, then there is a 1/2
^{2n}
probability of a re-encoding not happening. Therefore, if we compare four bits of
x_{k}
and
y_{k}
, a sub-byte of a round output is re-encoded with a 255/256 probability. In this case, the attacker can be convinced that the output was probably encoded by the same table, E3 ◦ E2. The problem then returns to its starting point. For this reason, we recommend comparing a single bit between
x_{k}
and
y_{k}
. In addition, one might be curious if the many-to-many relationship of the non-linear encoding lowers the security level of WB-AES. However, the encoding becomes complex because the number of encoding cases is significantly increased.
B. Against Collision Attacks
Lepoint and others’ cryptanalysis needs to find a collision by
Q
. We now show that the many-to-many relationship by the conditional re-encoding prevents this cryptanalysis from finding such a collision. As pointed out previously, the attacker first tries to recover functions
S
_{0}
and
S
_{1}
using (4) as follows:
S
_{0}
(
α
) ⊕ 03 ⊗
S
_{1}
(0) = 02 ⊗
S
_{0}
(0) ⊕ 03 ⊗
S
_{1}
(
β
), because the non-linear encoding for the round output guarantees a one-to-one mapping regardless of how the mixing bijection obfuscates each byte.
However, as demonstrated in the previous analysis, even if both sides of the equation are the same, then their encoded outputs can be different owing to the conditional re-encoding. On the contrary, there can be collisions even when
α
,
β
) pairs satisfying
S
_{0}
. In turn,
S
_{1}
,
S
_{2}
, and
S
_{3}
cannot be recovered without recovering
S
_{0}
. Since the attacker is unable to find the correct collision sets, the strategy for the cryptanalysis is not complete for the proposed scheme.
A. Memory Requirement
The total size of the lookup tables is 1,212,416 bytes. A reasonable comparison is Chow and others’ WB-AES, which requires 770,048 bytes for the lookup tables. Only the Type II table increases its size, with a four-fold increase. This is due to the fact that there are four cases of input decoding in a Type II table, as stated previously. As a consequence, our proposed WB-AES requires 1.57-times the size of the lookup tables in total, compared to Chow and others’ WB-AES, shown in
Table 3
.

B. Execution Time
The number of lookups is left unchanged from Chow and others’ WB-AES, but some additional operations are needed to compute and read
γ
. To compare the runtime of the standard AES implementation, for both Chow and others’ WB-AES and our own, we performed experiments using a desktop PC with an Intel Core i7 CPU at 3.4 GHz. We repeated each algorithm 100 times and measured the number of elapsed CPU ticks during the execution. The average CPU tick is shown in
Table 4
. Because the number of clock ticks per second of our PC is 1,000, one CPU tick is equal to 1 ms.

Compared to AES, Chow and others’ WB-AES is approximately ten-times slower, whereas our WB-AES is twelve-times slower. The interesting point here is that we expected a much lower execution time because of a number of lookups compared to AES. This result indicates that other operations, such as an XOR, in AES probably take more time than the lookups.
C. Table Generation
The first step of white-box cryptography is to generate the lookup tables. However, generating the lookup tables in advance is not included in the runtime overhead in our case as previously mentioned. For a comparison, we provide the elapsed time for the table generation of the two WB-AES algorithms. The main reasons for the additional time, as compared to Chow and others’ WB-AES, are as follows:
As a result, our algorithm requires approximately 8,100 ticks, whereas that of Chow and others requires 5,200 ticks on average.
Algorithm 1 shows how to generate the original Type IV_III table in
[5]
. It first decides the proper decoding tables for
x
and
y
based on a pair. Then, it decodes all possible 4-bit inputs
x
and
y
, and computes the XOR value between them. To encode each resulting value, there is only one encoding table, E2. The encoding is simply done by looking up the values of E2 at the index
xy_xored
; the encoded values are stored in Type IV_III. This is simplified in
Fig. 7
.
Simplified flow chart of Algorithm 1. InvE1 and InvE2 are decoding tables, while E2 is an encoding table.
In Section IV, we recommended comparing a single bit of
x
and
y
to decide whether to conduct the re-encoding. For better understanding,
Fig. 8
simplifies the flow of the generation of Type IV_III if the re-encoding is decided by a single bit comparison between
x
and
y
. Algorithm 2 explains the generation of the proposed Type IV_III in detail on the condition that the re-encoding depends on the comparison of the LSB of
x
and the second LSB of
y
. For all possible pairs of 4-bit
x
and
y
, the proper decoding tables are decided first, and the XOR result is encoded by E2. Based on the result of the comparison, the re-encoding using an additional encoding table E3 is performed.
Simplified example of proposed encoding method; x (n ) is the n th bit of x in binary.
Algorithm 2 shows how to generate the proposed Type IV_III table; here, E3 is an additional encoding table used for the conditional re-encoding.
This work was supported by the K-SCARF project, the ICT R&D program of ETRI (Research on Key Leakage Analysis and Response Technologies).
1) The input decoding and the output encoding are performed in 4-bit unit with a concatenated form to avoid large tables.
skwang@etri.re.kr
Seungkwang Lee received his BS degree in computer science and electronic engineering from Handong University, Pohang, Rep. of Korea, in 2009 and his MS degree in computer science from Pohang University of Science and Technology, Rep. of Korea, in 2011. He is currently working as a researcher at ETRI. His research interests include cryptography, side-channel analysis, and fault injection attacks.
Corresponding Author dhchoi@ etri.re.kr
Dooho Choi received his BS degree in mathematics from Sungkyunkwan University, Suwon, Rep. of Korea, in 1994 and his MS and PhD degrees in mathematics from the Korea Advanced Institute of Science and Technology, Daejeon, Rep. of Korea, in 1996 and 2002, respectively. He has been a principal researcher at ETRI since January 2002, and he is also an assistant professor at the University of Science & Technology, Daejeon, Rep. of Korea. His current research interests include cryptographic engineering, including side-channel analysis and its countermeasure design; security technologies of RFID and IoT; and lightweight and secure cryptographic module HW/SW design. He was the editor of the ITU-T Rec. X.1171.
choiyj@etri.re.kr
Yong-Je Choi received his BS and MS degrees from Chonnam National University, Gwangju, Rep. of Korea, in 1996 and 1999, respectively. He is currently a senior member of technical staff at ETRI. His research interests include VLSI design, crypto processor design, side-channel analysis, and information security.

I. Introduction

Data encryption is, without a doubt, one of the best ways to protect information stored in a digital device. In most cases, cryptographic algorithms are open to all for review and the secrecy aspect is maintained through the use of secret keys. If an attacker needs to perform an impractically large number of mathematical tests to reveal a secret key, then it indicates that the cryptographic algorithms are computationally secure. In the case of the Advanced Encryption Standard (AES), there are 3.4 × 10
II. White-Box AES Implementation

In this section, we review the WB-AES algorithm proposed by Chow and others
[4]
with respect to a 128-bit key size, which can be extended to a 192- or 256-bit key size. In WB-AES, AddRoundKey, SubBytes, and part of the MixColumns are combined into a series of table lookups. To this end, the secret key is first integrated into an S-box. The MixColumns step is then combined to a key-customized S-box. For the details, we start with the description of a conventional AES-128 encryption
[11]
and its variation. AES-128 encryption can generally be described as follows:
- state ←plaintext
- AddRoundKey(state,k0)
- forr= 1 to 9
- SubBytes(state)
- ShiftRows(state)
- MixColumns(state)
- AddRoundKey(state,kr)
- SubBytes(state)
- ShiftRows(state)
- AddRoundKey(state,k10)
- ciphertext← state

- 1) It is possible to put AddRoundKey(state,k0) into the for-loop while removing AddRound(state,k9).
- 2) SubBytes followed by ShiftRows produces the same result as ShiftRows followed by SubBytes.
- 3) AddRoundKey(state,kr−1) followed by ShiftRows(state) produces the same result as ShiftRows(state) followed by AddRoundKey(state,k^r−1), wherek^r−1is the result of applying ShiftRows tokr−1.

- state ←plaintext
- forr= 1 to 9
- ShiftRows(state)
- AddRoundKey(state,k^r−1)
- SubBytes(state)
- MixColumns(state)
- ShiftRows(state)
- AddRoundKey(state,k^9)
- SubBytes(state)
- AddRoundKey(state,k10)
- ciphertext← state

- 1. AddRoundKey and SubBytes

For the key-customized instances of AES-128, AddRoundKey and SubBytes are first combined into the so-called T-boxes — a series of 160 lookup tables that map bytes to bytes. Theses T-boxes are defined as follows:
T i,j r (x)=S(x⊕ k ^ i,j r−1 ) for i = 0, ... , 3, j = 0, ... , 3, r = 1, ... , 9, T i,j 10 (x)=S(x⊕ k ^ i,j 9 )⊕ k i,j 10 for i = 0, ... , 3, j = 0, ... , 3.

Note that
k ^ 9

and
k ^ i,j r = k i,(j+i)mod 4 r .

- 2. T-Box and MixColumns

After each byte in the state is mapped through a T-box, multiplication of a 32-bit vector by a MixColumns matrix (MC) is performed. To reduce the total size of lookup tables, these multiplications are divided as four separate multiplications of an 8-bit vector by subdividing MC. Let
[ 02 03 01 01 01 02 03 01 01 01 02 03 03 01 01 02 ]×[ x 0 x 1 x 2 x 3 ] = x 0 ×[ 02 01 01 03 ] ⊕ x 1 ×[ 03 02 01 01 ] ⊕ x 2 ×[ 01 03 02 01 ] ⊕ x 3 ×[ 01 01 03 02 ] .

Let
T y 0 ( x ) =x× [ 02 01 01 03 ] T , T y 1 ( x ) =x× [ 03 02 01 01 ] T , T y 2 ( x ) =x× [ 01 03 02 01 ] T , T y 3 ( x ) =x× [ 01 01 03 02 ] T .

Then, multiplication of the four bytes
- 3. Encodings

Since an attacker can access the lookup tables from round 1, they can easily reveal the secret key. There is therefore a need for input and output encodings to obfuscate all lookup tables. There are two types of encodings — non-linear encodings
MB i −1 .

Another trapezoid that is labeled “32 × 8” in Type III depicted in
Fig. 1
means a combination of
MB i −1

and the inverse of the 8 × 8 mixing bijections
[5]
.
PPT Slide

Lager Image

- 4. XOR

The lookup values from Type II and Type III tables should be XOR-ed so that multiplications become complete. The Type IV tables define XOR operations that take in four bits from each of two previous computations and map them to their XOR; XOR(
- ■ E1(x0), ... , E1(x3): Encoded 4-bit inputs to Type IV table.
- ■ InvE1: Input decoding table.
- ■ E2: Output encoding table.
- ■ InvE2: Output decoding table, used in the next stage.

pair2 = E2(InvE1( E1( x 2 ) )⊕InvE1( E1( x 3 ) )), pair1 = E2(InvE1( E1( x 1 ) )⊕InvE2( pair2 )), pair0 = E2(InvE1( E1( x 0 ) )⊕InvE2( pair1 )).

Here, pair0 has a value of E2(
- 5. External Encodings

External input and output encodings are composed of two sets of sixteen 8-bit to 128-bit lookup tables. Each external input encoding table (Type IA) contains one 128 × 8 vertical section of a 128 × 128 matrix (that is, the composition of the inverse of the initial encoding prior to round 1 and the concatenation of the input mixing bijections for the inverse of
T i,j 1

) surrounded by 4-bit input and output encodings. Each external output encoding table (Type IB), on the other hand, includes the composition of the mixing bijection combined with
T i,j 10

, and a 128 × 8 vertical section of a 128 × 128 matrix for the final encoding, surrounded by 4-bit input and output encodings. These result in the lookup tables depicted in
Fig. 2
. Consequently, all of the tables described thus far are looked up in the order shown in
Fig. 3
.
PPT Slide

Lager Image

PPT Slide

Lager Image

III. Cryptanalysis of WB-AES

In this section, we briefly introduce three methods of cryptanalysis of WB-AES and point out a common starting point between them. As explained in the previous section, Chow and others’ WB-AES obfuscates the lookup tables using input/output encodings and mixing bijections. However, linear mixing bijections along with output encodings and the corresponding input decodings along with linear mixing bijections cancel out at the boundary between two consecutive rounds. Billet and others’ cryptanalysis
[6]
starts with this fact. We first review Billet and others’ cryptanalysis and then Michiels and others’ generalized cryptanalysis of white-box ciphers
[15]
. In addition, we review a collision-based attack introduced by Lepoint and others
[10]
. Our purpose is to find a common starting point of cryptanalysis, and we thus propose a countermeasure in which the starting point is not applied.
- 1. Billet and Others’ Cryptanalysis

In
[6]
, the cryptanalysis begins with the conceptualized
R j r

boxes depicted in
Fig. 4
. Each
R j r

box consists of four 8-bit to 8-bit input permutations
P i,j r

(and the respective output permutations
Q i,j r

), made up of the composition of two concatenated 4-bit to 4-bit input (and the respective output) encodings, and one 8-bit to 8-bit mixing bijection. Each
Q i,j r

is the inverse of
P i,j r+1 ,

where
R j r

representation, the cryptanalysis defines a function
PPT Slide

Lager Image

c 3 ′

, produces the following function as a lookup table:
Q ˜ i,j r

of
Q i,j r

such that
Q ˜ i,j r −1 ∘ Q i,j r

is an affine mapping,
A i,j r −1 ,

for any round
Q ˜

, such that there exists an affine mapping,
Q ˜

=
Q 0 r

=
Q i r

of a round can be computed. At the same time,
P i r+1

is computed because we know
Q i r

is the inverse of
P i r+1

, as stated previously. After the mappings are recovered, the round key bytes embedded in the AES-128 white-box implementation can be retrieved. Even though they are not necessarily in the right order, they can be rearranged owing to the constraint in the key derivation algorithm of AES-128. In conclusion, the total complexity of this cryptanalysis is bounded by 2
- 2. Michiels and Others’ Cryptanalysis

Michiels and others generalized Billet and others’ strategy against substitution-linear transformation (SLT) ciphers
[15]
. Their generalization is composed of the following three steps:
- 1) Removing the non-linear part of the mixing bijection encodings using Theorem 1
- 2) Removing the linear part of the encodings by solving a linear equivalence problem[17]–[18]
- 3) Extracting the secret key information by solving a matrix equivalence problem

T i r

, followed by an invertible affine function
T i r

and
- 3. Lepoint and Others’ Cryptanalysis

A collision-based attack
[10]
was recently introduced in SAC2013. As shown in
Fig. 5
, this cryptanalysis finds collisions to recover functions
PPT Slide

Lager Image

IV. Proposed Encoding Method

The key idea behind the conditional re-encoding we propose is to create many-to-many relationships in the non-linear encoding for the round output. To this end, the non-linear encoding table for the round output is switched depending on the characteristic of the inputs to Type IV_III (XOR tables). We also analyze its security against cryptanalysis in the next section.
- 1. Basic Idea and Its Extension

- 1) Encodezusing table E2:z′ ← E2(z).
- 2) Letϕbe a function examining the characteristics ofxandybased on a predefined rule. For example, for two randomly selectedi,j, where 1 ≤i,j≤ 4, andi≠j, we have

ϕ(x,y)={ 1 ifx(i)≠y(j), 0 otherwise,

- wherex(n) indicates thenth bit ofx.
- 3) Ifϕ(x,y) = 1, then re-encodez′ using an additional encoding table E3:z′← E3(z′).
- 4) Returnδ(x,y,z) =z′. Here, the choice ofϕis not limited to this example, but can be extended to various scenarios including a comparison of multiple bits ofxandy. We will discuss the security aspect of this later.

z= δ −1 ( z ′ ,γ)={ InvE2( InvE3( z ′ ) ) if γ=1, InvE2( z ′ ) otherwise.

The last step is to apply the conditional re-encoding and decoding during the generation of the lookup table. The table generation for Type IV_III and Type II tables needs to be modified as follows.
Four cases of input decoding in a Type II table, whereγis corresponding value ofϕfor each case.

InvE2( | InvE2( | 00 |

InvE2( | InvE3(InvE2( | 01 |

InvE3(InvE2( | InvE2( | 10 |

InvE3(InvE2( | InvE3(InvE2( | 11 |

- 2. Security Analysis

As stated previously, three methods of cryptanalysis depend on Theorem 1 or the collision equation. In turn, Theorem 1 and the collision equation strongly rely on
{ E3 °E2if ϕ(x,y)=1, E2otherwise,

where
Notations used.

Notation | Usage |
---|---|

E2_high | Non-linear encoding table for the upper 4 bits |

E2_low | For the lower 4 bits |

E3_high | Re-encoding for the upper 4 bits |

E3_low | For the lower 4 bits |

_{0} | The lower 4-bit of an 8-bit |

_{1} | The upper 4-bit of |

Non-linear encoding function | |

Decision function for re-encoding | |

Satellite variable storing the occurrences of re-encodings for an 8-bit value |

Z1=X⊕Y=0x25 and γ= 01.

This gives us the final encoded output of 0xAC because the lower 4-bit of
PPT Slide

Lager Image

Z2=X⊕Y=0x25 and γ=10.

In this case, the final encoded output is 0x71 because only the upper 4-bit of
Z3=X⊕Y=0x2D and γ= 00.

The final encoded output is then 0xAC because there is no re-encoding. In the first two examples, we showed that the two encoded outputs can be different due to the re-encoding even though
Q ˜

in Theorem 1 and therefore the cryptanalysis cannot work.
Without loss of generality, if we compare
Q 0 (02⊗ S 0 (α)⊕03⊗ S 1 ( 0 )⊕c) = Q 0 (02⊗ S 0 ( 0 )⊕03⊗ S 1 (β)⊕c).

In this cryptanalysis, this means that 02 ⊗
02⊗ S 0 (α)⊕03⊗ S 1 ( 0 )≠02⊗ S 0 ( 0 )⊕03⊗ S 1 (β).

We expect that each intermediate bit becomes 0 or 1 with a 1/2 probability owing to the confusion and diffusion property of S-box, MixColumns, and mixing bijection. In this assumption, half of the (
02⊗ S 0 (α)⊕03⊗ S 1 ( 0 )=02⊗ S 0 ( 0 )⊕03⊗ S 1 (β)

may not lead to collisions if the re-encoding occurs by a 1-bit comparison. The attacker’s disadvantage here is that it is not allowed to fully recover
- 3. Size and Performance

We provide additional costs of our proposed algorithm in terms of the memory requirement and execution speed. The only operation left unchanged from the AES implementation of Daemen and Rijmen
[11]
is ShiftRows while the changed lookup tables from Chow and others’ WB-AES are Type II and IV_III tables. The elapsed time for the table generation is not included in the runtime overhead.
Comparison of table size: dash symbol (−) indicates that proposed algorithm is same size as Chow and others’ algorithm, in unit of bytes.

Type | Chow’s | Proposed |
---|---|---|

Type IA | 65,536 | - |

Type IA_IV | 61,440 | - |

Type II | 147,456 | 589,824 |

Type IV_II | 110,592 | - |

Type III | 147,456 | - |

Type IV_III | 110,592 | - |

Type IB | 65,536 | - |

Type IV_IB | 61,440 | - |

Total | 770,048 | 1,212,416 |

Comparison of runtime. Number of clock ticks per second is 1,000.

Algorithm | CPU ticks |
---|---|

AES-128 | 1 |

Chow and others | 10 |

Our WB-AES | 12 |

- ■ generating additional encoding tables, E3 and InvE3
- ■ conditional re-encoding of the output of Type IV_III tables
- ■ generating the new Type II tables, which are four-times bigger than the original type

V. Conclusion

To provide a secure WB-AES implementation, we proposed a conditional re-encoding method applied to the round outputs, and changed the corresponding input decoding of the next round. In addition, we described how to properly look up the tables and showed that the encoding variance protects against cryptanalysis. Additional costs compared to Chow and others’ WB-AES are about 1.57-times the total size of the lookup tables and 1.2-times the execution speed.
Appendix

PPT Slide

Lager Image

PPT Slide

Lager Image

BIO

Kocher P.
,
Jaffe J.
,
Jun B.
“Differential Power Analysis,”
Int. Cryptology Conf. Adv. Cryptology
Santa Barbara, CA, USA
Aug. 15–19, 1999
388 -
397

Brier E.
,
Clavier C.
,
Olivier F.
“Correlation Power Analysis with a Leakage Model,”
Cryptographic Hardware Embedded Syst.
Cambridge, MA, USA
Aug. 11–13, 2004
16 -
29

Lee S.
,
Choi D.
,
Choi Y.
2014
“Improved Shamir’s CRT-RSA Algorithm: Revisit with the Modulus Chaining Method,”
ETRI J.
36
(3)
469 -
478
** DOI : 10.4218/etrij.14.0113.0317**

Boneh D.
,
DeMillo R.A.
,
Lipton R.J.
“On the Importance of Checking Cryptographic Protocols for Faults,”
Int. Conf. Theory Appl. Cryptographic Techn.
Konstanz, Germany
May 11–15, 1997
37 -
51

Chow S.
“White-Box Cryptography and an AES Implementation,”
Workshop Sel. Areas Cryptography
Madrid, Spain
Aug. 15–16, 2002
250 -
270

Billet O.
,
Gilbert H.
,
Ech-Chatbi C.
“Cryptanalysis of a White Box AES implementation,”
Int. Conf. Sel. Areas Cryptography
Waterloo, Canada
Aug. 9–10, 2004
227 -
240

Bringer J.
,
Chabanne H.
,
Dottax E.
2006
“White Box Cryptography: Another Attempt,”
IACR Cryptology ePrint Archive
2006
468 -

Mulder Y.D.
,
Roelse P.
,
Preneel B.
“Cryptanalysis of the Xiao-Lai White-Box AES Implementation,”
Int. Conf. Sel. Areas Cryptography
Windsor, Canada
Aug. 15–16, 2012
34 -
49

Mulder Y.D.
,
Wyseur B.
,
Preneel B.
“Cryptanalysis of a Perturbated White-Box AES Implementation,”
Int. Conf. Cryptology India
Hyderabad, India
Dec. 12–15, 2010
292 -
310

Lepoint T.
“Two Attacks on a White-Box AES Implementation,”
Int. Workshop Sel. Areas Cryptography
Burnaby, Canada
Aug. 14–16, 2013
265 -
285

Daemen J.
,
Rijmen V.
1998
AES Proposal: Rijndael
http://csrc.nist.gov/archive/aes/rijndael/Rijndael-ammended.pdf

Muir J.A.
2013
“A Tutorial on White-Box AES,”
IACR Cryptology ePrint Archive
104 -

Shannon C. E.
1949
“Communication Theory of Secrecy Systems,”
Bell Syst. Techn. J.
28
(4)
656 -
715
** DOI : 10.1002/j.1538-7305.1949.tb00928.x**

Saremi Jeff
White-Box AES Project for Educational Purposes
https://github.com/wbaes

Michiels W.
,
Gorissen P.
,
Hollmann H.D.
“Cryptanalysis of a Generic Class of White-Box Implementations,”
Int. Conf. Sel. Areas Cryptography
Sackville, Canada
Aug. 14–15, 2008
414 -
428

Tolhuizen L.
“Improved Cryptanalysis of an AES Implementation,”
WIC Symp. Inf. Theory Benelux
Boekelo, Netherlands
May 24–25, 2012

Biryukov A.
“A Toolbox for Cryptanalysis: Linear and Affine Equivalence Algorithms,”
Int. Conf. Theory Appl. Cryptographic Techn.
Warsaw, Poland
May 4–8, 2003
33 -
50

Fuller J.
,
Millan W.
“Linear Redundancy in S-Boxes,”
Int. Workshop Fast Softw. Encryption
Lund, Sweden
Feb. 24–26, 2003
74 -
86

Citing 'Conditional Re-encoding Method for Cryptanalysis-Resistant White-Box AES
'

@article{ HJTODO_2015_v37n5_1012}
,title={Conditional Re-encoding Method for Cryptanalysis-Resistant White-Box AES}
,volume={5}
, url={http://dx.doi.org/10.4218/etrij.15.0114.0025}, DOI={10.4218/etrij.15.0114.0025}
, number= {5}
, journal={ETRI Journal}
, publisher={Electronics and Telecommunications Research Institute}
, author={Lee, Seungkwang
and
Choi, Dooho
and
Choi, Yong-Je}
, year={2015}
, month={Oct}