Conditional Re-encoding Method for Cryptanalysis-Resistant White-Box AES
Conditional Re-encoding Method for Cryptanalysis-Resistant White-Box AES
ETRI Journal. 2015. Oct, 37(5): 1012-1022
Copyright © 2015, Electronics and Telecommunications Research Institute (ETRI)
  • Received : January 27, 2014
  • Accepted : August 07, 2015
  • Published : October 01, 2015
Export by style
Cited by
About the Authors
Seungkwang Lee
Dooho Choi
Yong-Je Choi

Conventional cryptographic algorithms are not sufficient to protect secret keys and data in white-box environments, where an attacker has full visibility and control over an executing software code. For this reason, cryptographic algorithms have been redesigned to be resistant to white-box attacks. The first white-box AES (WB-AES) implementation was thought to provide reliable security in that all brute force attacks are infeasible even in white-box environments; however, this proved not to be the case. In particular, Billet and others presented a cryptanalysis of WB-AES with 230 time complexity, and Michiels and others generalized it for all substitution-linear transformation ciphers. Recently, a collision-based cryptanalysis was also reported. In this paper, we revisit Chow and others’ first WB-AES implementation and present a conditional re-encoding method for cryptanalysis protection. The experimental results show that there is approximately a 57% increase in the memory requirement and a 20% increase in execution speed.
I. Introduction
Data encryption is, without a doubt, one of the best ways to protect information stored in a digital device. In most cases, cryptographic algorithms are open to all for review and the secrecy aspect is maintained through the use of secret keys. If an attacker needs to perform an impractically large number of mathematical tests to reveal a secret key, then it indicates that the cryptographic algorithms are computationally secure. In the case of the Advanced Encryption Standard (AES), there are 3.4 × 10 38 possible key combinations with respect to a 128-bit key size. Cracking the key using a supercomputer would require an average of 1.02 × 10 18 years of testing at a rate of 10.51 × 10 15 floating point operations per second. For this reason, if the secret key is kept secure, then it is also assumed to be secure.
Most cryptographic algorithms are designed to protect a secret key against what is known as a “black-box attack.” In a black-box attack model, an attacker has knowledge of the cryptographic algorithm and is able to examine various inputs and outputs, but is unable to monitor its execution or any intermediate results generated during the computation. Based on these conditions, a black-box attacker can mount various types of attacks including known-plaintext, ciphertext-only, chosen-ciphertext, and adaptive chosen-plaintext attacks.
While these attacks seem to be powerful enough in a black-box attack model, the physical properties of cryptosystems mean software is vulnerable to advanced attacks. Often, if an attacker has sufficient access to a target cipher to mount adaptive attacks, then they may obtain some useful information about the execution of an algorithm and its secret key. For example, side-channel attacks, such as differential power analysis (DPA) [1] or correlation power analysis (CPA) [2] , enable an attacker to crack smart cards protected with cryptographic algorithms in a matter of minutes rather than the theoretical billion years of testing. This type of attack exploits the fact that the power consumption of a cryptographic device at any given point in time strongly depends on the data it processes and the operation it performs. Provided that enough power traces are given, an attacker can perform a statistical analysis and discover a secret key very quickly [3] . Another type of side-channel attack is realized through maliciously injecting faults into a cryptographic device and observing the corresponding erroneous outputs; this is a so-called fault injection attack. In some extreme cases such as the RSA algorithm using the Chinese Remainder Theorem (CRT-RSA), even a single fault injection enables an attacker to reveal a secret key [3] [4] .
Side-channel attacks, however, are characterized as a “grey-box attack” because an attacker has access to a small part of the execution. From this standpoint, the countermeasures against a grey-box attack do not fully reflect reality. A much more powerful type of attack is a “white-box attack.” In this attack model, an attacker has total visibility into an execution platform and the software implementation of an algorithm. This means that an attacker can analyze and tamper with the binary code and corresponding memory page of the application during execution. To this end, an attacker may use an assistant attack tool such as a disassembler or debugger. Unfortunately, most cryptographic algorithms, including those for digital signatures, authentication, smart cards, and the like, are vulnerable to white-box attacks. There is therefore a need for white-box cryptography in any software protection strategy. The protection of AES against a white-box attack (WB-AES) was started in [5] with the addition of cryptanalysis for this technique [6] , and continuous research on WB-AES has followed [7] [10] .
Overall, each WB-AES algorithm has a corresponding cryptanalysis of its weakness. There is therefore a need for a secure WB-AES algorithm that is resistant to cryptanalysis.
In this paper, we revisit the vulnerabilities of Chow’s WB-AES [5] and improve it in such a way so as to protect against a white-box attack, with low additional costs. Our solution changes a portion of the encoding method in addition to the construction of a specific type of lookup table. This new encoding method is the key idea behind the protection of previous white-box attacks. Compared to Chow’s WB-AES, the additional costs are 1.57-times the size of the lookup tables, and 1.2-times the runtime with an Intel Core i7 at 3.4 GHz.
The rest of this paper is organized as follows. Section II briefly reviews the WB-AES algorithm proposed by Chow and others. In Section III, we describe the previous cryptanalysis. We then propose our WB-AES algorithm, which is resistant to cryptanalysis, and analyze its security and performance in Section IV. Finally, we provide some concluding remarks in Section V.
II. White-Box AES Implementation
In this section, we review the WB-AES algorithm proposed by Chow and others [4] with respect to a 128-bit key size, which can be extended to a 192- or 256-bit key size. In WB-AES, AddRoundKey, SubBytes, and part of the MixColumns are combined into a series of table lookups. To this end, the secret key is first integrated into an S-box. The MixColumns step is then combined to a key-customized S-box. For the details, we start with the description of a conventional AES-128 encryption [11] and its variation. AES-128 encryption can generally be described as follows:
  •    state ←plaintext
  •    AddRoundKey(state,k0)
  •    forr= 1 to 9
  •              SubBytes(state)
  •              ShiftRows(state)
  •              MixColumns(state)
  •              AddRoundKey(state,kr)
  • SubBytes(state)
  • ShiftRows(state)
  • AddRoundKey(state,k10)
  • ciphertext← state
Here, we know the following [5] , [12] :
  • 1) It is possible to put AddRoundKey(state,k0) into the for-loop while removing AddRound(state,k9).
  • 2) SubBytes followed by ShiftRows produces the same result as ShiftRows followed by SubBytes.
  • 3) AddRoundKey(state,kr−1) followed by ShiftRows(state) produces the same result as ShiftRows(state) followed by AddRoundKey(state,k^r−1), wherek^r−1is the result of applying ShiftRows tokr−1.
These observations give us the following description by which AES is converted into a table-based implementation.
  • state ←plaintext
  • forr= 1 to 9
  •           ShiftRows(state)
  •           AddRoundKey(state,k^r−1)
  •           SubBytes(state)
  •           MixColumns(state)
  • ShiftRows(state)
  • AddRoundKey(state,k^9)
  • SubBytes(state)
  • AddRoundKey(state,k10)
  • ciphertext← state
- 1. AddRoundKey and SubBytes
For the key-customized instances of AES-128, AddRoundKey and SubBytes are first combined into the so-called T-boxes — a series of 160 lookup tables that map bytes to bytes. Theses T-boxes are defined as follows:
T i,j r (x)=S(x⊕ k ^ i,j r−1 ) for i = 0, ... , 3, j = 0, ... , 3, r = 1, ... , 9, T i,j 10 (x)=S(x⊕ k ^ i,j 9 )⊕ k i,j 10  for i = 0, ... , 3, j = 0, ... , 3.
Note that S ( x ) is an S-box with an 8-bit input value, x , and T-boxes for round 10 have to absorb two round keys,
k ^ 9
and k 10 , where
k ^ i,j r = k i,(j+i)mod 4 r .
- 2. T-Box and MixColumns
After each byte in the state is mapped through a T-box, multiplication of a 32-bit vector by a MixColumns matrix (MC) is performed. To reduce the total size of lookup tables, these multiplications are divided as four separate multiplications of an 8-bit vector by subdividing MC. Let x 0 , x 1 , x 2 , and x 3 be four bytes to be multiplied by MC. The multiplication can be decomposed as follows:
[ 02 03 01 01 01 02 03 01 01 01 02 03 03 01 01 02 ]×[ x 0 x 1 x 2 x 3 ]                      = x 0 ×[ 02 01 01 03 ] ⊕ x 1 ×[ 03 02 01 01 ] ⊕ x 2 ×[ 01 03 02 01 ] ⊕ x 3 ×[ 01 01 03 02 ] .
Let y 0 , y 1 , y 2 , and y 3 denote the four terms on the right-hand side. The so-called Tyi tables that map 8-bits to 32-bits are defined as follows:
T y 0 ( x ) =x× [ 02 01 01 03 ] T , T y 1 ( x ) =x× [ 03 02 01 01 ] T , T y 2 ( x ) =x× [ 01 03 02 01 ] T , T y 3 ( x ) =x× [ 01 01 03 02 ] T .
Then, multiplication of the four bytes x 0 , x 1 , x 2 , and x 3 by MC can be computed by four table lookups and three XORs: Ty 0 ( x ) ⊕ Ty 1 ( x ) ⊕ Ty 2 ( x ) ⊕ Ty 3 ( x ). In rounds 1 to 9, 144 Tyi tables are needed to accept the 8-bit outputs of T-boxes.
- 3. Encodings
Since an attacker can access the lookup tables from round 1, they can easily reveal the secret key. There is therefore a need for input and output encodings to obfuscate all lookup tables. There are two types of encodings — non-linear encodings 1) on its input and output, and affine mixing bijections applied to all key-dependent lookup tables. The input and output encodings achieve confusion , as defined by Shannon [13] , while mixing bijections achieve diffusion . In the Type II tables, 8 × 8 mixing bijections are used to diffuse the input to T-boxes, and 32 × 32 mixing bijections, MB, are inserted after MixColumns. A trapezoid that is labeled “32 × 8” in Type II shown in Fig. 1 indicates a result of applying the mixing bijections to the Tyi tables. The Type III tables combine MB −1 with the inverse of the input 8 × 8 mixing bijections for T-boxes of the next round. To reduce the size of lookup tables, MB −1 is also divided into four submatrices,
MB i −1 .
Another trapezoid that is labeled “32 × 8” in Type III depicted in Fig. 1 means a combination of
MB i −1
and the inverse of the 8 × 8 mixing bijections [5] .
PPT Slide
Lager Image
Type II, III, and IV tables (from left).
- 4. XOR
The lookup values from Type II and Type III tables should be XOR-ed so that multiplications become complete. The Type IV tables define XOR operations that take in four bits from each of two previous computations and map them to their XOR; XOR( x , y ) = x y .
There are two variations of Type IV in WB-AES — one to XOR (and combine) 4-bit chunks for a 32-bit result of Type II and Type III tables, and the other for a 128-bit result of Type IA and Type IB tables. Each 4-bit input has to be decoded and the 4-bit output has to be encoded non-linearly. The XOR operation of 4-bit chunks using Type IV tables is given below for illustration purposes [14] :
  • ■ E1(x0), ... , E1(x3): Encoded 4-bit inputs to Type IV table.
  • ■ InvE1: Input decoding table.
  • ■ E2: Output encoding table.
  • ■ InvE2: Output decoding table, used in the next stage.
The resulting value of E2( x 0 x 1 x 2 x 3 ) can be computed in the following way:
pair2 = E2(InvE1( E1( x 2 ) )⊕InvE1( E1( x 3 ) )), pair1 = E2(InvE1( E1( x 1 ) )⊕InvE2( pair2 )), pair0 = E2(InvE1( E1( x 0 ) )⊕InvE2( pair1 )).
Here, pair0 has a value of E2( x 0 x 1 x 2 x 3 ). Type II, III, and IV tables are depicted in Fig. 1 .
- 5. External Encodings
External input and output encodings are composed of two sets of sixteen 8-bit to 128-bit lookup tables. Each external input encoding table (Type IA) contains one 128 × 8 vertical section of a 128 × 128 matrix (that is, the composition of the inverse of the initial encoding prior to round 1 and the concatenation of the input mixing bijections for the inverse of
T i,j 1
) surrounded by 4-bit input and output encodings. Each external output encoding table (Type IB), on the other hand, includes the composition of the mixing bijection combined with
T i,j 10
, and a 128 × 8 vertical section of a 128 × 128 matrix for the final encoding, surrounded by 4-bit input and output encodings. These result in the lookup tables depicted in Fig. 2 . Consequently, all of the tables described thus far are looked up in the order shown in Fig. 3 .
PPT Slide
Lager Image
Type IA and IB tables.
PPT Slide
Lager Image
Sequence of table lookups.
For later use, we distinguish Type IV following Type II and Type III tables as Type IV_II and Type IV_III tables, respectively. For compatibility with an original AES, the input to Type IA should be first encoded by the corresponding linear and non-linear transformations. On the other hand, the output of round 10 should be decoded by the corresponding non-linear and linear transformations.
All encoding tables have to be deleted after generating the lookup tables. All we need is a set of lookup tables for encryption/decryption.
III. Cryptanalysis of WB-AES
In this section, we briefly introduce three methods of cryptanalysis of WB-AES and point out a common starting point between them. As explained in the previous section, Chow and others’ WB-AES obfuscates the lookup tables using input/output encodings and mixing bijections. However, linear mixing bijections along with output encodings and the corresponding input decodings along with linear mixing bijections cancel out at the boundary between two consecutive rounds. Billet and others’ cryptanalysis [6] starts with this fact. We first review Billet and others’ cryptanalysis and then Michiels and others’ generalized cryptanalysis of white-box ciphers [15] . In addition, we review a collision-based attack introduced by Lepoint and others [10] . Our purpose is to find a common starting point of cryptanalysis, and we thus propose a countermeasure in which the starting point is not applied.
- 1. Billet and Others’ Cryptanalysis
In [6] , the cryptanalysis begins with the conceptualized
R j r
boxes depicted in Fig. 4 . Each
R j r
box consists of four 8-bit to 8-bit input permutations
P i,j r
(and the respective output permutations
Q i,j r
), made up of the composition of two concatenated 4-bit to 4-bit input (and the respective output) encodings, and one 8-bit to 8-bit mixing bijection. Each
Q i,j r
is the inverse of
P i,j r+1 ,
where r ∈ [0, 9]. Using this
R j r
representation, the cryptanalysis defines a function yi of ( x 0 , x 1 , x 2 , x 3 ) as follows:
y i ( x 0 , x 1 , x 2 , x 3 )= Q i r ( α i,0 T 0 r ( P 0 r ( x 0 ) ) α i,1 T 1 r ( P 1 r ( x 1 ) ) α i,2 T 2 r ( P 2 r ( x 2 ) ) α i,3 T 3 r ( P 3 r ( x 3 ) ) ),
PPT Slide
Lager Image
One of four R j r mappings, j = 0, … , 3 [6].
where αi,j are the coefficients of MixColumns. If x 1 , x 2 , and x 3 are fixed to certain constants, say c 1 , c 2 , and c 3 , respectively, then we have
y 0 (x, c 1 , c 2 , c 3 )= Q 0,j r ( α T 0,j r ( P 0,j r (x) ) β c ).
Since x only takes 256 values and constant β c is fixed, these mappings are known by the input/output, as well as by their inverses. Changing one of the constants, c 1 , c 2 , or c 3 , for example, c 1 , c 2 , and
c 3 ′
, produces the following function as a lookup table:
y 0 (x, c 1 , c 2 , c 3 ) y 0 ( x, c 1 , c 2 , c 3 ) 1              = Q 0,j r ( α T 0,j r ( P 0,j r ( ​​ P 0,j r 1 ( T 0,j r 1 ( α 1 ( Q 0,j r 1 ​ ​ β c′ ) ) ) ) ) β c )              = Q 0,j r ( ( Q 0,j r 1 β c′ ) β c ),              = Q 0,j r ( Q 0,j r 1 β),
where β = β c β c ′ takes all values in GF(2 8 ). Theorem 1 (below) then enables an attacker to recover the non-linear part
Q ˜ i,j r
Q i,j r  
such that
Q ˜ i,j r −1 ∘ Q i,j r
is an affine mapping,
A i,j r −1 ,
for any round r ∈ [1, 9] with time complexity 2 24 . Because there is no MixColumns step, the last round is excluded from the cryptanalysis. The open dot “ ◦ ” means a composition of functions.
Theorem 1. Given a set of functions S = { Q ◦ ⊕ β Q −1 } β , where β ∈ GF(2 8 ), Q is a permutation of GF(2 8 ), and ⊕ β is the translation by β in GF(2 8 ), one can construct a particular solution,
Q ˜
, such that there exists an affine mapping, A , so that
Q ˜
= Q A .
The next step is to recover the affine maps
Q 0 r
= A 0 q 0 , where A 0 is linear and q 0 is a constant. To this end, (1) is used to determine a unique mapping, L , and a constant, c , such that for all x 0 ⊕ GF(2 8 ) we have yi ( x 0 , 0, 0, 0 ) = L ( yj ( x 0 , 0, 0, 0)) ⊕ c .
The time complexity to solve this is much lower than 2 16 with a highly over-defined linear system of 2 8 × 8 equations, involving the 64 entries of L as well as the eight entries of c as unknowns over GF(2), using the knowledge of the functions yi and yj based on their values.
It is then possible to recover A 0 , the linear part of Q 0 , with time complexity 2 24 by determining the characteristic polynomial of L . At the same time, the constant part q 0 of Q 0 is also computed using the knowledge of αi,j by setting the four variables in (1) to zero. It was shown that the linear parts of Q 1 , Q 2 , and Q 3 can be directly computed from the knowledge of the linear part of Q 0 with time complexity 2 16 . In a similar way, all
Q i r
of a round can be computed. At the same time,
P i r+1
is computed because we know
Q i r
is the inverse of
P i r+1
, as stated previously. After the mappings are recovered, the round key bytes embedded in the AES-128 white-box implementation can be retrieved. Even though they are not necessarily in the right order, they can be rearranged owing to the constraint in the key derivation algorithm of AES-128. In conclusion, the total complexity of this cryptanalysis is bounded by 2 30 . In [16] , Tolhuizen improved it to 2 22 using a preprocessing step.
- 2. Michiels and Others’ Cryptanalysis
Michiels and others generalized Billet and others’ strategy against substitution-linear transformation (SLT) ciphers [15] . Their generalization is composed of the following three steps:
  • 1) Removing the non-linear part of the mixing bijection encodings using Theorem 1
  • 2) Removing the linear part of the encodings by solving a linear equivalence problem[17]–[18]
  • 3) Extracting the secret key information by solving a matrix equivalence problem
To be more specific, a generic substitution-affine transformation (SAT) cipher is defined in the second step, where a round consists of T-boxes
T i r
, followed by an invertible affine function ϵr . Because MixColumns, the affine function of AES, are known to an attacker,
T i r
and ϵr can be computed. Biryukov and others’ algorithm [17] is used to solve the equivalence problem Ti = γi Si δi , where γ and δ are the affine functions that describe the affine relation between Ti and Si . Since Ti is a part of the SAT cipher, it contains the key kr . Note that this cryptanalysis also begins with Theorem 1 to remove the non-linear part.
- 3. Lepoint and Others’ Cryptanalysis
A collision-based attack [10] was recently introduced in SAC2013. As shown in Fig. 5 , this cryptanalysis finds collisions to recover functions S 0 , S 1 , S 2 , and S 3 and associated key bytes using the equation
Q 0 (02 S 0 (α)03 S 1 ( 0 )c) = Q 0 (02 S 0 ( 0 )03 S 1 (β)c).
As they mentioned, there are 256 pairs ( α , β ) with a trivial solution (0, 0). The attacker constructs a linear system to recover S 0 and S 1 . After S 0 is recovered, S 1 can be recovered through an exhaustive search. The remaining functions, S 2 and S 3 , are then recovered similarly by solving the linear systems. Once the Si functions have been recovered, one can easily recover Qi in the output of the first round. The total time complexity of the attack is 2 22 .
PPT Slide
Lager Image
Lepoint and others’ attack [10].
For the cryptanalysis above, we can see through Theorem 1 that the first two methods of cryptanalysis used and the collision equation of the last method are both dependent on Q . More precisely, their assertion assumes that for x , y ∈ GF(2 8 ), if x = y , then Q ( x ) = Q ( y ); and if x y , then Q ( x ) ≠ Q ( y ).
What is important here is that intermediate values are combined by XORs, and the non-linear encoding is then followed in the 4-bit unit. In the following section, based on this fact, we propose a conditional re-encoding method and show that it can protect against three methods of cryptanalysis.
IV. Proposed Encoding Method
The key idea behind the conditional re-encoding we propose is to create many-to-many relationships in the non-linear encoding for the round output. To this end, the non-linear encoding table for the round output is switched depending on the characteristic of the inputs to Type IV_III (XOR tables). We also analyze its security against cryptanalysis in the next section.
- 1. Basic Idea and Its Extension
A. Round Output Encoding
Let δ denote the non-linear encoding function for the round output. Given x , y ∈ {0, 1} 4 and z = x y , δ depends on x , y , and z instead of only z . This is as follows:
  • 1) Encodezusing table E2:z′ ← E2(z).
  • 2) Letϕbe a function examining the characteristics ofxandybased on a predefined rule. For example, for two randomly selectedi,j, where 1 ≤i,j≤ 4, andi≠j, we have
ϕ(x,y)={ 1      ifx(i)≠y(j), 0      otherwise,
  • wherex(n) indicates thenth bit ofx.
  • 3) Ifϕ(x,y) = 1, then re-encodez′ using an additional encoding table E3:z′← E3(z′).
  • 4) Returnδ(x,y,z) =z′. Here, the choice ofϕis not limited to this example, but can be extended to various scenarios including a comparison of multiple bits ofxandy. We will discuss the security aspect of this later.
This conditional re-encoding depending on the value of ϕ creates a many-to-many relationship between z and z ′. To be more precise, given x 1, x 2, y 1, y 2 ∈ {0, 1} 4 such that x 1 ⊕ y 1 = z 1 and x 2 ⊕ y 2 = z 2, let us assume that ϕ ( x 1, y 1) = 1 and ϕ ( x 2, y 2) = 0. We then know that δ ( x 1, y 1, z 1) ≠ δ ( x 2, y 2, z 2). For this reason, z ′ requires the value of ϕ ( x , y ) to be correctly decoded. Because the value of ϕ ( x , y ) determines whether a re-encoding takes place, δ is a function of ϕ ( x , y ) and z .
B. Round Input Decoding
To properly decode z ′ back to z , we need to keep track of the occurrence of the re-encoding in a satellite value. We denote the satellite variable by γ , and its value is determined by the value of ϕ ( x , y ). If γ = 1, then it means that z ′ = E3(E2( z )); otherwise, z ′ = E2( z ). The input decoding δ −1 of the next round is then a function of z ′ and γ as follows:
z= δ −1 ( z ′ ,γ)={ InvE2( InvE3( z ′ ) )        if γ=1, InvE2( z ′ )                            otherwise.
The last step is to apply the conditional re-encoding and decoding during the generation of the lookup table. The table generation for Type IV_III and Type II tables needs to be modified as follows.
C. Type IV_III Generation and Lookup
A sub-byte of a round output is computed using three XOR operations combining four intermediate values, and we denote each intermediate result by pair2, pair1, and pair0, as in Section II. Without having to apply the conditional re-encoding to all pairs, we apply it only when pair0 is computed for simplicity. Thus, a Type IV_III table should reflect the conditional re-encoding based on the characteristics of the two 4-bit inputs of pair0 when it is generated (see Appendix).
To properly look up a Type IV_III table, we must consider that all encodings and XOR operations are performed in the 4-bit unit, and γ needs to be two bits to track the re-encodings of the upper and lower 4-bit chunks separately; in each round, the total size of γ is then 16 × 2 bits. The satellite variable γ provides no additional secret information to the attacker because the two inputs are encoded values.
D. Type II Generation and Lookup
To be properly cooperated with the modified Type IV_III tables, the Type II tables must also be modified to correctly decode the input. The conditional re-encoding is independently applied to the upper and lower 4-bit output of a Type IV_III table. For this reason, a Type II table contains four decoding cases, as shown in Table 1 . To look up a correctly decoded input for a Type II table, γ has to be referred. Based on Table 1 , γ can be used as an index to select the decoded value.
Four cases of input decoding in a Type II table, whereγis corresponding value ofϕfor each case.
Y1 Y0 γ
InvE2(X1) InvE2(X0) 00
InvE2(X1) InvE3(InvE2(X0)) 01
InvE3(InvE2(X1)) InvE2(X0) 10
InvE3(InvE2(X1)) InvE3(InvE2(X0)) 11
- 2. Security Analysis
As stated previously, three methods of cryptanalysis depend on Theorem 1 or the collision equation. In turn, Theorem 1 and the collision equation strongly rely on Q . Specifically, Theorem 1 basically uses Q −1 Q ( z ) = z and the collision equation uses the fact that z = z ′ if Q ( z ) = Q ( z ′). Owing to the many-to-many relationship by the conditional re-encoding, Theorem 1 and the collision equation are not acceptable for our proposed scheme. The details are as follows.
A. Against Theorem 1
When we say Q −1 Q ( z ) = z , this means that the output of the non-linear encoding is determined by only its 8-bit value. To be more precise, a particular 8-bit z at the same round, row, column, and pair has to be encoded to a particular output z ′. In addition, z ′ has to be decoded to z . Based on the property of the conditional re-encoding, this is not always guaranteed.
Let us recall (3). An attacker assumed that the table for the non-linear encoding at the same round, row, column, and pair is always the same. However, we know that the non-linear encoding table for the round output is
{ E3 °E2if  ϕ(x,y)=1, E2otherwise,
where x and y are 4-bit inputs to be XORed. Given a particular round, row, column, and pair0, we need to conduct the XOR operations between 8-bit inputs and encode the result for the round output. The following examples illustrate how this disturbs Theorem 1. Table 2 explains the notation to be used.
Notations used.
Notation Usage
E2_high Non-linear encoding table for the upper 4 bits
E2_low For the lower 4 bits
E3_high Re-encoding for the upper 4 bits
E3_low For the lower 4 bits
x0 The lower 4-bit of an 8-bit X
x1 The upper 4-bit of X
δ Non-linear encoding function
ϕ Decision function for re-encoding
γ Satellite variable storing the occurrences of re-encodings for an 8-bit value
Example 1. X = 0x11, Y = 0x34, E2_high (0x2) = 0xA, E2_low (0x5) = 0x1, and E3_high (0xA) = 0x7, E3_low (0x1) = 0xC. The decision function ϕ outputs 1 if the LSB of the first input and the second LSB of the second input are different. The following computation is then depicted in Fig. 6 , and we know that
Z1=X⊕Y=0x25 and γ= 01.
This gives us the final encoded output of 0xAC because the lower 4-bit of Z 1 has to be re-encoded.
PPT Slide
Lager Image
Type IV_III lookups for XOR operations.
Example 2. X = 0xB7, Y = 0x92, and the other conditions are the same; thus, we have
Z2=X⊕Y=0x25 and γ=10.
In this case, the final encoded output is 0x71 because only the upper 4-bit of Z 2 has to be re-encoded.
Example 3. X = 0x6C, Y = 0x41, E2_low (0xD) = 0xC, and the other conditions are the same; thus, we have
Z3=X⊕Y=0x2D and γ= 00.
The final encoded output is then 0xAC because there is no re-encoding. In the first two examples, we showed that the two encoded outputs can be different due to the re-encoding even though Z 1 and Z 2 are the same.
In contrast, the last two examples showed that the two encoded outputs can be the same even though Z 2 and Z 3 are different. The decoding for the next round’s input also has a many-to-many relationship because the input decoding is determined by γ . For this reason, a successful cryptanalysis using Theorem 1 is possible if there is always no re-encoding or always a re-encoding . According to (3) and Theorem 1, an attacker needs S = { Q ◦ ⊕ β Q −1 }( x ), where x ∈ GF(2 8 ). Because we know that the probability that γ = ‘00’ is 1/4, the probability of “ always no re-encoding ” happening is (1/4) 256 . Similarly, the probability of “ always a re-encoding ” is also (1/4) 256 . Thus, there is a negligible probability of constructing a correct S for an attacker who has one of the assumptions, “ always no re-encoding ” or “ always a re-encoding .” Without a correct set of functions S , an attacker is unable to construct a particular solution
Q ˜
in Theorem 1 and therefore the cryptanalysis cannot work.
Without loss of generality, if we compare n bits of xk and yk , where k = 0 or 1, to test the condition for the re-encoding, then there is a 1/2 2n probability of a re-encoding not happening. Therefore, if we compare four bits of xk and yk , a sub-byte of a round output is re-encoded with a 255/256 probability. In this case, the attacker can be convinced that the output was probably encoded by the same table, E3 ◦ E2. The problem then returns to its starting point. For this reason, we recommend comparing a single bit between xk and yk . In addition, one might be curious if the many-to-many relationship of the non-linear encoding lowers the security level of WB-AES. However, the encoding becomes complex because the number of encoding cases is significantly increased.
B. Against Collision Attacks
Lepoint and others’ cryptanalysis needs to find a collision by Q . We now show that the many-to-many relationship by the conditional re-encoding prevents this cryptanalysis from finding such a collision. As pointed out previously, the attacker first tries to recover functions S 0 and S 1 using (4) as follows:
Q 0 (02⊗ S 0 (α)⊕03⊗ S 1 ( 0 )⊕c) = Q 0 (02⊗ S 0 ( 0 )⊕03⊗ S 1 (β)⊕c).
In this cryptanalysis, this means that 02 ⊗ S 0 ( α ) ⊕ 03 ⊗ S 1 (0) = 02 ⊗ S 0 (0) ⊕ 03 ⊗ S 1 ( β ), because the non-linear encoding for the round output guarantees a one-to-one mapping regardless of how the mixing bijection obfuscates each byte.
However, as demonstrated in the previous analysis, even if both sides of the equation are the same, then their encoded outputs can be different owing to the conditional re-encoding. On the contrary, there can be collisions even when
02⊗ S 0 (α)⊕03⊗ S 1 ( 0 )≠02⊗ S 0 ( 0 )⊕03⊗ S 1 (β).
We expect that each intermediate bit becomes 0 or 1 with a 1/2 probability owing to the confusion and diffusion property of S-box, MixColumns, and mixing bijection. In this assumption, half of the ( α , β ) pairs satisfying
02⊗ S 0 (α)⊕03⊗ S 1 ( 0 )=02⊗ S 0 ( 0 )⊕03⊗ S 1 (β)
may not lead to collisions if the re-encoding occurs by a 1-bit comparison. The attacker’s disadvantage here is that it is not allowed to fully recover S 0 . In turn, S 1 , S 2 , and S 3 cannot be recovered without recovering S 0 . Since the attacker is unable to find the correct collision sets, the strategy for the cryptanalysis is not complete for the proposed scheme.
- 3. Size and Performance
We provide additional costs of our proposed algorithm in terms of the memory requirement and execution speed. The only operation left unchanged from the AES implementation of Daemen and Rijmen [11] is ShiftRows while the changed lookup tables from Chow and others’ WB-AES are Type II and IV_III tables. The elapsed time for the table generation is not included in the runtime overhead.
A. Memory Requirement
The total size of the lookup tables is 1,212,416 bytes. A reasonable comparison is Chow and others’ WB-AES, which requires 770,048 bytes for the lookup tables. Only the Type II table increases its size, with a four-fold increase. This is due to the fact that there are four cases of input decoding in a Type II table, as stated previously. As a consequence, our proposed WB-AES requires 1.57-times the size of the lookup tables in total, compared to Chow and others’ WB-AES, shown in Table 3 .
Comparison of table size: dash symbol (−) indicates that proposed algorithm is same size as Chow and others’ algorithm, in unit of bytes.
Type Chow’s Proposed
Type IA 65,536 -
Type IA_IV 61,440 -
Type II 147,456 589,824
Type IV_II 110,592 -
Type III 147,456 -
Type IV_III 110,592 -
Type IB 65,536 -
Type IV_IB 61,440 -
Total 770,048 1,212,416
B. Execution Time
The number of lookups is left unchanged from Chow and others’ WB-AES, but some additional operations are needed to compute and read γ . To compare the runtime of the standard AES implementation, for both Chow and others’ WB-AES and our own, we performed experiments using a desktop PC with an Intel Core i7 CPU at 3.4 GHz. We repeated each algorithm 100 times and measured the number of elapsed CPU ticks during the execution. The average CPU tick is shown in Table 4 . Because the number of clock ticks per second of our PC is 1,000, one CPU tick is equal to 1 ms.
Comparison of runtime. Number of clock ticks per second is 1,000.
Algorithm CPU ticks
AES-128 1
Chow and others 10
Our WB-AES 12
Compared to AES, Chow and others’ WB-AES is approximately ten-times slower, whereas our WB-AES is twelve-times slower. The interesting point here is that we expected a much lower execution time because of a number of lookups compared to AES. This result indicates that other operations, such as an XOR, in AES probably take more time than the lookups.
C. Table Generation
The first step of white-box cryptography is to generate the lookup tables. However, generating the lookup tables in advance is not included in the runtime overhead in our case as previously mentioned. For a comparison, we provide the elapsed time for the table generation of the two WB-AES algorithms. The main reasons for the additional time, as compared to Chow and others’ WB-AES, are as follows:
  • ■ generating additional encoding tables, E3 and InvE3
  • ■ conditional re-encoding of the output of Type IV_III tables
  • ■ generating the new Type II tables, which are four-times bigger than the original type
As a result, our algorithm requires approximately 8,100 ticks, whereas that of Chow and others requires 5,200 ticks on average.
V. Conclusion
To provide a secure WB-AES implementation, we proposed a conditional re-encoding method applied to the round outputs, and changed the corresponding input decoding of the next round. In addition, we described how to properly look up the tables and showed that the encoding variance protects against cryptanalysis. Additional costs compared to Chow and others’ WB-AES are about 1.57-times the total size of the lookup tables and 1.2-times the execution speed.
Algorithm 1. Generating the original Type IV_III. for (rnd = 0; rnd < 16; ++ rnd) {   for (row = 0; row < 4; ++ row) {     for (col = 0; col < 4; ++ col) {       for (pair = 0; pair < 3; ++pair) {       \* offSet distinguishes           the upper and the lower 4 bits of a byte *\         for (offSet = 0; offSet < 2; ++offSet) {           x_dec_table =                     & (InvE1) [rnd] [row] [col] [pair × 2 + offSet] [0];           y_dec_table =                     & (InvE2) [rnd] [row] [col] [offSet] [0];                  if (pair = = 2) {             y_dec_table =                 & InvE1 [rnd] [row] [col] [((pair + 1) × 2) + offSet] [0];           }           for (x = 0; x < 16; ++x) {             for (y = 0; y < 16; ++y) {               x_dec = x_dec_table [x] ;               y_dec = y_dec_table [y] ;               xy_index = x | | y;               xy_xored = x_dec ^ y_dec;                            Type IV [rnd] [row] [col] [pair] [offSet] [xy_index]                                  = E2 [rnd] [row] [col] [offSet] [xy_xored];         }    ... }
Algorithm 1 shows how to generate the original Type IV_III table in [5] . It first decides the proper decoding tables for x and y based on a pair. Then, it decodes all possible 4-bit inputs x and y , and computes the XOR value between them. To encode each resulting value, there is only one encoding table, E2. The encoding is simply done by looking up the values of E2 at the index xy_xored ; the encoded values are stored in Type IV_III. This is simplified in Fig. 7 .
PPT Slide
Lager Image
Simplified flow chart of Algorithm 1. InvE1 and InvE2 are decoding tables, while E2 is an encoding table.
In Section IV, we recommended comparing a single bit of x and y to decide whether to conduct the re-encoding. For better understanding, Fig. 8 simplifies the flow of the generation of Type IV_III if the re-encoding is decided by a single bit comparison between x and y . Algorithm 2 explains the generation of the proposed Type IV_III in detail on the condition that the re-encoding depends on the comparison of the LSB of x and the second LSB of y . For all possible pairs of 4-bit x and y , the proper decoding tables are decided first, and the XOR result is encoded by E2. Based on the result of the comparison, the re-encoding using an additional encoding table E3 is performed.
PPT Slide
Lager Image
Simplified example of proposed encoding method; x(n) is the nth bit of x in binary.
Algorithm 2. Generating the proposed Type IV_III. for (rnd = 0; rnd < 16; ++ rnd) {   for (row = 0; row < 4; ++ row) {     for (col = 0; col < 4; ++ col) {       for (pair = 0; pair < 3; ++pair) {       \* offSet distinguishes           the upper and the lower 4 bits of a byte *\         for (offSet = 0; offSet < 2; ++offSet) {           x_dec_table =                     & (InvE1) [rnd] [row] [col] [pair × 2 + offSet] [0];           y_dec_table =                     & (InvE2) [rnd] [row] [col] [offSet] [0];                  if (pair = = 2) {             y_dec_table =                 & InvE1 [rnd] [row] [col] [((pair + 1) × 2) + offSet] [0];           }           for (x = 0; x < 16; ++x) {             for (y = 0; y < 16; ++y) {               x_dec = x_dec_table [x] ;               y_dec = y_dec_table [y] ;               xy_index = x | | y;               xy_xored = x_dec ^ y_dec;               encoded_output = E2 [rnd] [row] [col] [offSet] [xy_xored];                      \* compare x(1) and y(2) at pair 0 *\               if ((pair = = 0) && ((x & 0x01) ^ (y & 0x02) = = 1))                    encoded_output =                           E3 [rnd] [row] [col] [offSet] [encoded_output];                        Type IV [rnd] [row] [col] [pair] [offSet] [xy_index]                           = encoded_output;         }    ... }
Algorithm 2 shows how to generate the proposed Type IV_III table; here, E3 is an additional encoding table used for the conditional re-encoding.
This work was supported by the K-SCARF project, the ICT R&D program of ETRI (Research on Key Leakage Analysis and Response Technologies).
The input decoding and the output encoding are performed in 4-bit unit with a concatenated form to avoid large tables.
Seungkwang Lee received his BS degree in computer science and electronic engineering from Handong University, Pohang, Rep. of Korea, in 2009 and his MS degree in computer science from Pohang University of Science and Technology, Rep. of Korea, in 2011. He is currently working as a researcher at ETRI. His research interests include cryptography, side-channel analysis, and fault injection attacks.
Corresponding Author dhchoi@
Dooho Choi received his BS degree in mathematics from Sungkyunkwan University, Suwon, Rep. of Korea, in 1994 and his MS and PhD degrees in mathematics from the Korea Advanced Institute of Science and Technology, Daejeon, Rep. of Korea, in 1996 and 2002, respectively. He has been a principal researcher at ETRI since January 2002, and he is also an assistant professor at the University of Science & Technology, Daejeon, Rep. of Korea. His current research interests include cryptographic engineering, including side-channel analysis and its countermeasure design; security technologies of RFID and IoT; and lightweight and secure cryptographic module HW/SW design. He was the editor of the ITU-T Rec. X.1171.
Yong-Je Choi received his BS and MS degrees from Chonnam National University, Gwangju, Rep. of Korea, in 1996 and 1999, respectively. He is currently a senior member of technical staff at ETRI. His research interests include VLSI design, crypto processor design, side-channel analysis, and information security.
Kocher P. , Jaffe J. , Jun B. “Differential Power Analysis,” Int. Cryptology Conf. Adv. Cryptology Santa Barbara, CA, USA Aug. 15–19, 1999 388 - 397
Brier E. , Clavier C. , Olivier F. “Correlation Power Analysis with a Leakage Model,” Cryptographic Hardware Embedded Syst. Cambridge, MA, USA Aug. 11–13, 2004 16 - 29
Lee S. , Choi D. , Choi Y. 2014 “Improved Shamir’s CRT-RSA Algorithm: Revisit with the Modulus Chaining Method,” ETRI J. 36 (3) 469 - 478    DOI : 10.4218/etrij.14.0113.0317
Boneh D. , DeMillo R.A. , Lipton R.J. “On the Importance of Checking Cryptographic Protocols for Faults,” Int. Conf. Theory Appl. Cryptographic Techn. Konstanz, Germany May 11–15, 1997 37 - 51
Chow S. “White-Box Cryptography and an AES Implementation,” Workshop Sel. Areas Cryptography Madrid, Spain Aug. 15–16, 2002 250 - 270
Billet O. , Gilbert H. , Ech-Chatbi C. “Cryptanalysis of a White Box AES implementation,” Int. Conf. Sel. Areas Cryptography Waterloo, Canada Aug. 9–10, 2004 227 - 240
Bringer J. , Chabanne H. , Dottax E. 2006 “White Box Cryptography: Another Attempt,” IACR Cryptology ePrint Archive 2006 468 -
Mulder Y.D. , Roelse P. , Preneel B. “Cryptanalysis of the Xiao-Lai White-Box AES Implementation,” Int. Conf. Sel. Areas Cryptography Windsor, Canada Aug. 15–16, 2012 34 - 49
Mulder Y.D. , Wyseur B. , Preneel B. “Cryptanalysis of a Perturbated White-Box AES Implementation,” Int. Conf. Cryptology India Hyderabad, India Dec. 12–15, 2010 292 - 310
Lepoint T. “Two Attacks on a White-Box AES Implementation,” Int. Workshop Sel. Areas Cryptography Burnaby, Canada Aug. 14–16, 2013 265 - 285
Daemen J. , Rijmen V. 1998 AES Proposal: Rijndael
Muir J.A. 2013 “A Tutorial on White-Box AES,” IACR Cryptology ePrint Archive 104 -
Shannon C. E. 1949 “Communication Theory of Secrecy Systems,” Bell Syst. Techn. J. 28 (4) 656 - 715    DOI : 10.1002/j.1538-7305.1949.tb00928.x
Saremi Jeff White-Box AES Project for Educational Purposes
Michiels W. , Gorissen P. , Hollmann H.D. “Cryptanalysis of a Generic Class of White-Box Implementations,” Int. Conf. Sel. Areas Cryptography Sackville, Canada Aug. 14–15, 2008 414 - 428
Tolhuizen L. “Improved Cryptanalysis of an AES Implementation,” WIC Symp. Inf. Theory Benelux Boekelo, Netherlands May 24–25, 2012
Biryukov A. “A Toolbox for Cryptanalysis: Linear and Affine Equivalence Algorithms,” Int. Conf. Theory Appl. Cryptographic Techn. Warsaw, Poland May 4–8, 2003 33 - 50
Fuller J. , Millan W. “Linear Redundancy in S-Boxes,” Int. Workshop Fast Softw. Encryption Lund, Sweden Feb. 24–26, 2003 74 - 86