Low-Power Channel-Adaptive Reconfigurable 4×4 QRM-MLD MIMO Detector
ETRI Journal. 2016. Mar, 38(1): 100-111
Copyright © 2016, Electronics and Telecommunications Research Institute (ETRI)
  • Received : February 24, 2015
  • Accepted : September 30, 2015
  • Published : March 01, 2016
Iput Heri Kurniawan
Ji-Hwan Yoon
Jong-Kook Kim
Jongsun Park

Abstract
This paper presents a low-complexity channel-adaptive reconfigurable 4 × 4 QR-decomposition and M-algorithm-based maximum likelihood detection (QRM-MLD) multiple-input and multiple-output (MIMO) detector. Two novel design approaches for low-power QRM-MLD hardware are proposed. First, an approximate survivor metric (ASM) generation technique is presented that considerably reduces computational complexity with minor BER degradation. Second, a QRM-MLD MIMO detector with a reconfigurable M-value (where M is the number of survival branches in a stage) is proposed to adapt dynamically to time-varying channels. The proposed reconfigurable QRM-MLD MIMO detector is implemented using a Samsung 65 nm CMOS process. Experimental results show that the ASM-based QRM-MLD MIMO detector achieves a maximum throughput of 288 Mbps with a normalized power efficiency of 10.18 Mbps/mW for 4 × 4 MIMO with 64-QAM. Under time-varying channel conditions, the proposed reconfigurable MIMO detector also achieves average power savings of up to 35% while maintaining the required BER performance.
I. Introduction
Due to higher spectral efficiency with improved link reliability [1] , spatial multiplexing multiple-input and multiple-output (MIMO) technology has been widely adopted in wireless standards such as IEEE 802.11n, IEEE 802.16e, IEEE 802.16m, and Long-Term Evolution.
In the MIMO detector, maximum likelihood (ML) decoding [2] has been considered one of the best solutions for spatial multiplexing, and depth-first approaches such as sphere decoding show near-optimal performance [3]. However, since sphere decoding iteratively checks all possible selections of a detected vector, its worst-case complexity increases exponentially with the MIMO channel-matrix size, which results in a large hardware overhead and irregular throughput [4]. For low-complexity ML detection, breadth-first approaches with a fixed M (or K)-value, such as the M-algorithm, QR-decomposition and M-algorithm-based maximum likelihood detection (QRM-MLD), and the K-Best algorithm [5], [6], have been chosen as a reasonable alternative that relieves the hardware burden and guarantees a fixed throughput. However, both the computational complexity and the resulting power consumption of a breadth-first approach are still large, since the complexity of child expansion and sorting increases exponentially as the constellation order or the dimension of the MIMO channel matrix increases [7]–[9].
This paper presents a low-power channel-adaptive hardware architecture for a QRM-MLD MIMO detector. First, a simplified branch metric computation approach in QRM-MLD is presented. Consequently, the computational complexity is significantly reduced without seriously sacrificing the bit error rate (BER) performance.
As a second approach, a channel-adaptive QRM-MLD MIMO detector with a reconfigurable M-value is also presented. The hardware implementation results also show that considerable power savings can be achieved by efficiently trading off the detection performance.
II. QRM-MLD MIMO Detection
- 1. MIMO System Model
Let us consider a spatial multiplexing MIMO system with $N_t$ transmitting and $N_r$ receiving antennas. The equivalent baseband model of the complex MIMO system is described as follows:
(1) $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n},$
where $\mathbf{y}$ is an $N_r \times 1$ complex-valued received symbol vector, $\mathbf{x}$ is an $N_t \times 1$ complex-valued transmit vector, $\mathbf{H}$ denotes an $N_r \times N_t$ complex-valued channel response matrix, and $\mathbf{n}$ represents an $N_r \times 1$ vector of independent and identically distributed (i.i.d.) zero-mean complex Gaussian noise. To find an optimal solution for spatial multiplexing, the basic operation of ML MIMO detection is to minimize the norm of the receiver noise as follows:
(2) $\mathbf{s} = \arg\min_{\mathbf{x} \in \Omega^{N_t}} \lVert \mathbf{y} - \mathbf{H}\mathbf{x} \rVert^2,$
where $\Omega$ is the set of all complex elements in the constellation and $\Omega^{N_t}$ denotes all the possible $N_t$-dimensional transmitted symbol vectors.
- 2. Conventional QRM-MLD Approach
To transform the optimization problem of (2) into a tree-search problem, a QR decomposition (QRD) of the channel response matrix $\mathbf{H}$ is introduced as follows:
(3) $\mathbf{H} = \mathbf{Q}\mathbf{R},$
where $\mathbf{Q}$ is an $N_r \times N_t$ unitary matrix and $\mathbf{R}$ is an $N_t \times N_t$ upper triangular complex matrix. By multiplying both sides of (1) by $\mathbf{Q}^H$ (the Hermitian conjugate of $\mathbf{Q}$), the system model can be rewritten as
(4) $\mathbf{z} = \mathbf{Q}^H\mathbf{y} = \mathbf{R}\mathbf{x} + \mathbf{w},$
where $\mathbf{w} = \mathbf{Q}^H\mathbf{n}$. With the upper triangular property of matrix $\mathbf{R}$, the norm of the receiver noise in (2) can be reformulated as the accumulated branch metric in (5) below, and the process of finding an optimal solution, $\mathbf{s}$, in (2) becomes a tree-search problem, where the maximum level of a node in the tree structure is $N_t$.
(5) $\lVert \mathbf{z} - \mathbf{R}\mathbf{x} \rVert^2 = \sum_{i=1}^{N_t} \left| z_{N_t+1-i} - \sum_{j=1}^{i} r_{N_t+1-i,\,N_t+1-j} \cdot x_{N_t+1-j} \right|^2,$
where $i$ and $j$ represent the index of the current stage and that of the previous stages in the tree structure, respectively; $r_{i,j}$ denotes the $(i, j)$th element of matrix $\mathbf{R}$; and $x_j$ is the $j$th element of vector $\mathbf{x}$. In the $i$th stage, the partial Euclidean distance (PED) calculation is formulated in a recursive manner as
(6) $\mathrm{PED}_i(\mathbf{x}^{(i)}) = \mathrm{PED}_{i-1}(\mathbf{x}^{(i-1)}) + |e_i(\mathbf{x}^{(i)})|^2,$
(7) $e_i(\mathbf{x}^{(i)}) = z_{N_t+1-i} - \sum_{j=N_t+1-i}^{N_t} r_{N_t+1-i,\,j} \cdot x_j.$
Here, $\mathrm{PED}_i(\mathbf{x}^{(i)})$ is a PED in the $i$th stage with $\mathrm{PED}_0(\mathbf{x}^{(0)}) = 0$; $\mathbf{x}^{(i)} = [x_{N_t-i+1}, x_{N_t-i+2}, \ldots, x_{N_t}]$ denotes a partial symbol vector in the $i$th stage; and $|e_i(\mathbf{x}^{(i)})|^2$ is the distance increment between two successive nodes in the tree structure. The tree structure–based representation of (6) and (7) is illustrated in Fig. 1, where the modulation order is $n(\Omega) = 16$ and $N_t = 4$.
Fig. 1. Example of mapping the QRM-MLD algorithm to a tree structure (16-QAM modulation (n(Ω) = 16), M = 4, Nt = 4).
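The breadth-first search built on the PED recursion of (6) and (7) can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: the function name `qrm_mld`, the 2 × 2 QPSK test case, and the use of a full sort to pick survivors are assumptions made for brevity.

```python
def qrm_mld(R, z, constellation, M):
    """Breadth-first search: keep the M smallest-PED paths at every stage."""
    n_t = len(R)
    survivors = [(0.0, [])]            # (PED, symbols for the rows visited so far)
    for i in range(1, n_t + 1):        # stage i works on row N_t + 1 - i
        row = n_t - i                  # same row, 0-based
        children = []
        for ped, path in survivors:
            for s in constellation:    # expand every child of every survivor
                cand = [s] + path      # cand[k] is the symbol for column row + k
                e = z[row] - sum(R[row][row + k] * cand[k] for k in range(i))
                children.append((ped + abs(e) ** 2, cand))   # recursion (6)
        children.sort(key=lambda c: c[0])
        survivors = children[:M]       # the M-algorithm pruning step
    return survivors[0]                # smallest accumulated metric


# Hypothetical noiseless 2 x 2 QPSK example with an upper-triangular R.
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
R = [[2.0, 0.5 - 0.3j], [0.0, 1.5]]
x_true = [1 - 1j, -1 + 1j]
z = [sum(R[r][c] * x_true[c] for c in range(2)) for r in range(2)]
ped, x_hat = qrm_mld(R, z, QPSK, M=2)
```

Every stage expands all children of all survivors and sorts them, which is the child-expansion and sorting cost that the following sections aim to reduce.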
III. Proposed QRM-MLD MIMO Detection Architecture Based on Branch Metric Computation
In this section, an approximate survivor metric (ASM) generation approach is presented, which can be effectively used to reduce the computational complexity of the QRM-MLD MIMO detector. The hardware implementation results of the proposed ASM-based MIMO detector are also presented to show the power consumption reduction and silicon area.
- 1. ASM Generation
The basic idea of the proposed ASM generation is that the absolute values of the real and imaginary parts can be separately computed and sorted to find survival branches. In the following, (8) and (9) denote the magnitudes of the real and imaginary parts of the approximate branch metric based on the simplified norm algorithm [5], respectively:
(8) $\mathrm{Re}\{e_i(\mathbf{x}^{(i)})\} = \left| \mathrm{Re}\{z_{N_t+1-i}\} - \mathrm{Re}\left\{ \sum_{j=N_t+1-i}^{N_t} r_{N_t+1-i,\,j} \cdot x_j \right\} \right|,$
(9) $\mathrm{Im}\{e_i(\mathbf{x}^{(i)})\} = \left| \mathrm{Im}\{z_{N_t+1-i}\} - \mathrm{Im}\left\{ \sum_{j=N_t+1-i}^{N_t} r_{N_t+1-i,\,j} \cdot x_j \right\} \right|,$
(10) $|e_i(\mathbf{x}^{(i)})| = \mathrm{Re}\{e_i(\mathbf{x}^{(i)})\} + \mathrm{Im}\{e_i(\mathbf{x}^{(i)})\},$
(11) $\mathrm{PED}_i(\mathbf{x}^{(i)}) = \mathrm{PED}_{i-1}(\mathbf{x}^{(i-1)}) + |e_i(\mathbf{x}^{(i)})|,$
where $\mathrm{PED}_0(\mathbf{x}^{(0)}) = 0$. In the conventional simplified norm algorithm [5], the separate real and imaginary parts are added to calculate the approximate distance increment of the $i$th stage by (10), and the PED of the $i$th stage is computed using (11). To reduce the computational complexity of the conventional QRM-MLD approaches [4], [6], advanced approaches [7]–[9] iteratively find the $M$ minimum approximate branch metrics (ABMs). First, the $\sqrt{M}$ real and $\sqrt{M}$ imaginary parts in (8) and (9) are calculated and sorted in ascending order. The possible candidates for the $n$th minimum ABM in (11) are then found by combining minimum real and imaginary parts. Finding the $n$th minimum ABM among the candidates takes many comparisons, so finding all $M$ minimum ABMs in the tree structure requires a very large number of comparison operations. To further reduce the number of comparisons, the proposed approach selects only the $\sqrt{M}$ minimum real and $\sqrt{M}$ minimum imaginary parts from the sorted results of (8) and (9), and then obtains the $M$ minimum ABMs by simply combining them. In the first stage of the proposed ASM generation process, only the $\sqrt{M}$ smallest real and imaginary parts are searched using (12) and (13), respectively.
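(12) $\mathrm{Re}\{e_1(\mathbf{x}^{(1)})\}_{\langle l \rangle} = l\text{th}\min_{x_{N_t} \in C} \left| \mathrm{Re}\{z_{N_t} - r_{N_t,N_t} \cdot x_{N_t}\} \right|,$
(13) $\mathrm{Im}\{e_1(\mathbf{x}^{(1)})\}_{\langle l \rangle} = l\text{th}\min_{x_{N_t} \in C} \left| \mathrm{Im}\{z_{N_t} - r_{N_t,N_t} \cdot x_{N_t}\} \right|,$
(14) $|\mathrm{SM}^{(1)}\langle \alpha, \beta \rangle| = \mathrm{Re}\{e_1(\mathbf{x}^{(1)})\}_{\langle \alpha \rangle} + \mathrm{Im}\{e_1(\mathbf{x}^{(1)})\}_{\langle \beta \rangle},$
(15) $\mathrm{Re}\{\mathrm{SM}^{(i)}\} = \min_{x_i \in C} \left| \mathrm{Re}\{e_i(\mathbf{x}^{(i)})\} \right|,$
(16) $\mathrm{Im}\{\mathrm{SM}^{(i)}\} = \min_{x_i \in C} \left| \mathrm{Im}\{e_i(\mathbf{x}^{(i)})\} \right|,$
(17) $\mathrm{ASM}_i\langle \alpha, \beta \rangle(\mathbf{x}^{(i)}) = |\mathrm{SM}^{(1)}\langle \alpha, \beta \rangle| + \sum_{i=2}^{N_t} \left| \mathrm{Re}\{\mathrm{SM}^{(i)}\} + \mathrm{Im}\{\mathrm{SM}^{(i)}\} \right|,$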
where $l, \alpha, \beta = 1, \ldots, \sqrt{M}$ and $i \ge 2$. In the equations, $l\text{th}\min_{x_{N_t} \in C} |\mathrm{Re}\{z_{N_t} - r_{N_t,N_t} \cdot x_{N_t}\}|$ and $l\text{th}\min_{x_{N_t} \in C} |\mathrm{Im}\{z_{N_t} - r_{N_t,N_t} \cdot x_{N_t}\}|$ denote the $l$th smallest magnitude among the real and imaginary parts in the first stage, respectively; $C$ denotes the set $\{-\sqrt{n(\Omega)}+1, -\sqrt{n(\Omega)}+3, \ldots, -1, \ldots, \sqrt{n(\Omega)}-1\}$, where $\sqrt{n(\Omega)}$ is the square root of the constellation order $n(\Omega)$; $\mathrm{Re}\{e_1(\mathbf{x}^{(1)})\}_{\langle l \rangle}$ and $\mathrm{Im}\{e_1(\mathbf{x}^{(1)})\}_{\langle l \rangle}$ denote the $l$th smallest partial (real and imaginary) values of (8) and (9) in the first stage; and $|\mathrm{SM}^{(1)}\langle \alpha, \beta \rangle|$ denotes the proposed survivor metric (SM) of a survivor path in the first stage, which is generated by combining the $\alpha$th smallest real part and the $\beta$th smallest imaginary part in (8) and (9). Furthermore, $\mathrm{Re}\{\mathrm{SM}^{(i)}\}$ and $\mathrm{Im}\{\mathrm{SM}^{(i)}\}$ denote the real and imaginary parts of the SM in the $i$th stage, respectively, where $i \ge 2$; $\mathrm{ASM}_i\langle \alpha, \beta \rangle(\mathbf{x}^{(i)})$ is the proposed ASM of a survivor path in the $i$th stage ($i \ge 2$) that descends from the first-stage SM, $|\mathrm{SM}^{(1)}\langle \alpha, \beta \rangle|$.
Assuming $N_t = 4$, the QRM-MLD process based on the proposed ASM generation is illustrated in Fig. 2. First, the separate real and imaginary parts in (8) and (9) are computed at the beginning of stage 1. After computing the real part in (8) for all four possible cases ($\sqrt{n(\Omega)} = 4$), the two survival branches ($\sqrt{M} = 2$) with the smallest partial (real) SMs ($\mathrm{Re}\{e_1(\mathbf{x}^{(1)})\}_{\langle 2 \rangle} = 2$, $\mathrm{Re}\{e_1(\mathbf{x}^{(1)})\}_{\langle 1 \rangle} = 1$) are selected, as expressed in (12). In the second (imaginary) part of stage 1, (13) is performed in the same manner as (12). The real and imaginary parts are then combined by (14) to generate the four smallest SMs of the first stage: $|\mathrm{SM}^{(1)}\langle 2, 1 \rangle| = 3$, $|\mathrm{SM}^{(1)}\langle 2, 2 \rangle| = 5$, $|\mathrm{SM}^{(1)}\langle 1, 1 \rangle| = 2$, and $|\mathrm{SM}^{(1)}\langle 1, 2 \rangle| = 4$. Here, only $M$ addition operations are needed to generate $M$ SMs by combining $\sqrt{M}$ real and $\sqrt{M}$ imaginary parts. Note that the $M$ SMs are generated without any sorting operations on the ABMs. In the second stage, for each SM of the first stage, only the minimum real ($\mathrm{Re}\{\mathrm{SM}^{(2)}\}$) and minimum imaginary ($\mathrm{Im}\{\mathrm{SM}^{(2)}\}$) ABMs are selected among the $\sqrt{n(\Omega)}$ children, as presented in (15) and (16) for the real and imaginary parts of the ABMs, respectively. After the second stage, four ($M = 4$) ASMs ($\mathrm{ASM}_2\langle 2, 1 \rangle(\mathbf{x}^{(2)}) = 6$, $\mathrm{ASM}_2\langle 2, 2 \rangle(\mathbf{x}^{(2)}) = 6$, $\mathrm{ASM}_2\langle 2, 2 \rangle(\mathbf{x}^{(2)}) = 8$, and $\mathrm{ASM}_2\langle 1, 1 \rangle(\mathbf{x}^{(2)}) = 5$) are generated, and a similar process is repeated for each remaining stage (stages three and four). As a result, the minimum ASM, $\mathrm{ASM}_4\langle 1, 1 \rangle(\mathbf{x}^{(4)})$ (= 2 + 3 + 3 + 2 = 10), is selected as an optimal solution, $\mathbf{s}$.
Fig. 2. Example of proposed branch metric computations for case M = 4, n(Ω) = 16, and Nt = 4.
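The first-stage survivor generation of (12) to (14) can be sketched as follows. This is an illustrative reading of the equations, not the hardware data path: the function name, the return format, and the sample values of `z` and `r` are assumptions, and `r` is taken to be real because it stands for the diagonal element of R.

```python
def first_stage_survivors(z, r, levels, sqrt_m):
    """Combine the sqrt_m smallest real and imaginary magnitudes into M survivors."""
    # (12): magnitudes of the real part for every candidate level, smallest first.
    re = sorted((abs(z.real - r * c), c) for c in levels)[:sqrt_m]
    # (13): the same for the imaginary part.
    im = sorted((abs(z.imag - r * c), c) for c in levels)[:sqrt_m]
    # (14): M = sqrt_m * sqrt_m additions, with no sorting of the combined metrics.
    return {
        (a + 1, b + 1): (re[a][0] + im[b][0], complex(re[a][1], im[b][1]))
        for a in range(sqrt_m)
        for b in range(sqrt_m)
    }


# 16-QAM levels (the set C for n(Omega) = 16) and a hypothetical received value.
LEVELS_16QAM = [-3, -1, 1, 3]
sm = first_stage_survivors(z=1.5 - 2.5j, r=1.0, levels=LEVELS_16QAM, sqrt_m=2)
# sm[(alpha, beta)] holds (survivor metric, first-stage symbol).
```

Only the two one-dimensional lists are sorted; the survivor metrics themselves come from additions alone, which mirrors the observation that M survivor metrics need M additions and no sorting.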
In terms of computational complexity, the proposed ASM generation–based 4 × 4 QRM-MLD is compared with conventional 4 × 4 QRM-MLD approaches [6], [7] (see Fig. 3). Compared to [6], the L1-norm-based approaches (the proposed one and [7]) require significantly fewer multiplications and additions. Figure 4 shows the BER comparisons between the proposed approach and the conventional breadth-first-search-based ML MIMO detectors [6], [7], [9], [10], with 64-QAM modulation using a fixed-point simulation. For all the plots, the input bit-width (real and imaginary parts of $\mathbf{z}$ and matrix $\mathbf{R}$) is set to 15 bits, with 6 integer bits and 9 fractional bits. The results in Fig. 4 show that the proposed architecture with $M = 16$ achieves a BER comparable to that of the conventional approach with $K = 10$ [7]. For this iso-BER comparison, the proposed approach ($M = 16$) requires 25% fewer multiplications than the conventional one ($K = 10$) [7], at the cost of more comparisons and additions. The BER performances for QPSK and 16-QAM are also presented in Fig. 4.
Fig. 3. Computational complexity comparisons for 4 × 4 MIMO detectors.
Fig. 4. BER comparisons among proposed approach and conventional breadth-first-search-based ML MIMO detectors [6], [7], [9], [10].
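The multiplication savings of the L1-norm-based metrics come from replacing the squared distance increment of (6) with the sum of magnitudes in (10). The short sketch below contrasts the two; the function names and sample error values are illustrative assumptions.

```python
def squared_increment(e):
    """Distance increment of (6): two real multiplications and one addition."""
    return e.real * e.real + e.imag * e.imag


def l1_increment(e):
    """Approximate increment of (10): two absolute values and one addition."""
    return abs(e.real) + abs(e.imag)


# The two metrics can rank candidates differently, which is the source of the
# (minor) BER degradation: here the squared metric prefers e_a, the L1 one e_b.
e_a, e_b = 2.5 + 2.5j, 0.0 + 4.0j
squared_prefers_a = squared_increment(e_a) < squared_increment(e_b)
l1_prefers_b = l1_increment(e_b) < l1_increment(e_a)
```

For example, an error of 3 − 4j gives a squared increment of 25 and an L1 increment of 7.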
- 2. QRM-MLD Architecture Based on ASM Generation
Figure 5 illustrates the proposed QRM-MLD MIMO detector architecture ($M = 25$) based on ASM generation for 4 × 4 MIMO multiplexing with a 64-QAM constellation. A timing diagram of the proposed architecture is illustrated in Fig. 6. In stage 1 ($i = 1$), to compute $\mathrm{Re}\{e_i(\mathbf{x}^{(i)})\}$ and $\mathrm{Im}\{e_i(\mathbf{x}^{(i)})\}$ shown in (8) and (9), respectively, a constant multiplier (MUL_0), PED I, and one PED II module (PED_II_0) in Fig. 5 operate first. The outputs of MUL_0 are $r_{4,4} \cdot k$, where $k$ is selected from the set {a, b, c, d}. Here, a, b, c, and d are the four different magnitudes of the real and imaginary parts of 64-QAM symbols [11]. Since b, c, and d are multiples of the constant a [11], the multiplier array (MA) can be simply implemented using adders and shifters, as presented in Fig. 7(a). PED I performs the $y_4 - r_{4,4} \cdot k$ operations for eight real and eight imaginary cases, as shown in Fig. 7(b). Since the diagonal component of matrix $\mathbf{R}$, $r_{4,4}$, is a real number [12], the MUL_0 outputs ($r_{4,4} \cdot k$) can be shared by the real and imaginary parts in PED I. In PED_II_0, only the muxes and absolute (ABS) value computation modules operate in stage 1, as shown in Fig. 7(c); the subtractor (SUB) modules bypass the eight real and eight imaginary PED I outputs. Using the eight real and eight imaginary outputs from PED_II_0, SORF_0 and SORF_1 inside the FMIN array sort five real and five imaginary outputs in ascending order. After the sorting operation is done in two clock cycles, the outputs of SORF_0 and SORF_1 are stored in the ordered path register. Finally, the separate part merging adders (SMA) generate the 25 cases of $|\mathrm{SM}^{(1)}\langle \alpha, \beta \rangle|$ in (14) by adding all the combinations of the five real ($\mathrm{Re}\{e_1(\mathbf{x}^{(1)})\}_{\langle \alpha \rangle}$) and five imaginary ($\mathrm{Im}\{e_1(\mathbf{x}^{(1)})\}_{\langle \beta \rangle}$) parts, where $\alpha, \beta = 1, 2, \ldots, 5$.
Fig. 5. Overall architecture of proposed QRM-MLD MIMO detector.
Fig. 6. Timing diagram of proposed QRM-MLD MIMO detector.
Fig. 7. Submodules of proposed QRM-MLD MIMO detector: (a) constant multiplier, (b) PED I, and (c) PED II module and turn-off pattern of subtractors (SUB), adders (ADD0, ADD1) in PED II.
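The adder-and-shifter form of the constant multiplier in Fig. 7(a) can be illustrated with integer arithmetic. The sketch below assumes the usual 64-QAM level ratios b = 3a, c = 5a, and d = 7a; the text only states that b, c, and d are multiples of a, so these factors and the function name are assumptions.

```python
def odd_multiples(x):
    """Return (1x, 3x, 5x, 7x) of a fixed-point integer using shifts and adds only."""
    x3 = (x << 1) + x    # 3x = 2x + x
    x5 = (x << 2) + x    # 5x = 4x + x
    x7 = (x << 3) - x    # 7x = 8x - x
    return x, x3, x5, x7


# Hypothetical fixed-point value of r_{4,4} * a, scaled to an integer.
products = odd_multiples(37)
```

Each product costs one shift and one addition or subtraction, so no general-purpose multiplier is needed for the r · k terms.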
In stage 2 ($i = 2$), three multipliers (MUL_0 to MUL_2), PED I, and all the PED II modules (PED_II_0 to PED_II_4) operate to compute Re{SM(3)} and Im{SM(3)} according to (15) and (16), respectively. At first, $r_{3,3} \cdot \mathrm{Re}\{x_3\}$ and $r_{3,3} \cdot \mathrm{Im}\{x_3\}$ are calculated by MUL_0; $\mathrm{Re}\{r_{3,4} \cdot x_4\}$ and $\mathrm{Im}\{r_{3,4} \cdot x_4\}$ are computed by MUL_1 and MUL_2, respectively. Then, eight cases of $\mathrm{Re}\{y_3\} - r_{3,3} \cdot \mathrm{Re}\{x_3\}$ and $\mathrm{Im}\{y_3\} - r_{3,3} \cdot \mathrm{Im}\{x_3\}$ are computed in PED I, and the outputs of PED I and the real/imaginary part generator are sent to the five PED IIs. Here, the real and imaginary components of $x_4$ are selected from the set {−a, −b, −c, −d, a, b, c, d} based on the value saved in the ordered path register, and five cases of $\mathrm{Re}\{r_{3,4} \cdot x_4\}$ are distributed to each PED II. Similar processes are repeated in the PED II array by varying $x_4$ to generate different $\mathrm{Re}\{r_{3,4} \cdot x_4\}$ values for five clock cycles, as illustrated in the detailed timing diagram (Fig. 6).
In the adder array, branch metrics accumulation (BMA) units add the branch metrics of the previous and current stages for deciding the output vector of the final stage. The sorted branch metrics of stage 2 are stored in the path register to prepare stage 3.
In stage 3 ($i = 3$), the overall computation process is similar to that of stage 2, except for the operations in the MA specified in (11) and (12). In stage 4 ($i = 4$), all seven MUL modules in the MA (seven constant multipliers that compute $r_{1,1} \cdot \mathrm{Re}\{x_1\}$, $\mathrm{Re}\{r_{1,2} \cdot x_2\}$, $\mathrm{Im}\{r_{1,2} \cdot x_2\}$, $\mathrm{Re}\{r_{1,3} \cdot x_3\}$, $\mathrm{Im}\{r_{1,3} \cdot x_3\}$, $\mathrm{Re}\{r_{1,4} \cdot x_4\}$, and $\mathrm{Im}\{r_{1,4} \cdot x_4\}$) and the whole PED II array operate. Finally, the hard decision module sorts the ASMs in (17) for an output decision.
In the proposed QRM-MLD detector, the operating modules at each stage are specified in Table 1. The non-active gray-colored parts shown in Fig. 7(c) and Fig. 8 can be simply turned off using the turning-off gate logic (TOGL) shown in Fig. 9 [13]. The TOGL in Fig. 9 uses both a pull-up (PMOS) and a pull-down (NMOS) transistor. When φ is set to one, the pull-down NMOS transistor forces the outputs of the turned-off modules to zero, which saves dynamic power while preserving correct functionality. The φ signals are generated by the control logic to turn off the non-active gray-colored parts in Figs. 7(c) and 8.
Table 1. Specifications of active modules at each stage in proposed MIMO detector.
Stage 1 2 3 4
MA MUL_0 Active
MUL_1 - Active
MUL_2 - Active
MUL_3 - Active
MUL_4 - Active
MUL_5 - Active
MUL_6 - Active
PED II array (index) 0 0, 1, 2, 3, 4
FMIN array SORF (mode) Sorter FMIN
SFMIN (index) - 0, 1, 2, 3, 4, 5, 6, 7
Adder array All SMA All SMA and BMA
Hard decision - Active
Fig. 8. SORF module in proposed architecture.
Fig. 9. TOGL [11] applied to proposed architecture.
The proposed QRM-MLD-based 4 × 4 MIMO detectors are implemented using a Samsung 65 nm CMOS standard library. The power consumption is simulated with a gate-level netlist using PrimeTime PX [14] at an operating frequency of 100 MHz and a 1.2 V supply voltage. One hundred thousand random input test vectors of the matrix $\mathbf{R}$ and $\mathbf{Q}^H\mathbf{y}$ are used for measuring power. According to the numerical results, the proposed QRM-MLD MIMO detector shows a maximum throughput of 288 Mbps with a normalized power efficiency of 10.18 Mbps/mW. Table 2 shows the hardware comparisons of various 4 × 4 MIMO detectors in the literature. In the comparisons, the proposed architecture shows the lowest power (at 100 MHz, scaled to 65 nm) and the smallest area among the compared works. In terms of detection performance in Fig. 4, the proposed approach with $M = 25$ shows comparable or even better BER simulation results than the conventional K-best approach with $K = 10$ [7]. Compared to [15], the proposed approach with $M = 25$ shows slightly worse but comparable results in the BER simulation.
Table 2. Hardware comparison of 4 × 4 MIMO detectors.
Architecture | [3] | [7] | [8] | [9] | [15] | This work
Process (nm) | 65 | 130 | 130 | 65 | 130 | 65
Algorithm | Sphere decoding | K-Best | K-Best | K-Best | Modified K-Best | QRM-MLD
Modulation (QAM) | 64 | 64 | 64 | 64 | 64 | 64
K = M | 64 | 10 | 64 | 12 | 10 | 25
Area (gate count) | 1,760K | 114K | 280K | 320K | 340K | 103K
Oper. freq./scaled to 65 nm (MHz) | 158 | 282/564 | 270/540 | 641 | 417/834 | 278
Power @ oper. freq. (mW) | 165 | 135 | 94 | 165 | 1,700 | 28.3
Power* @ 100 MHz/scaled to 65 nm (mW) | 104.4 | 47.9/23.9 | 34.8/17.4 | 25.74 | 407.7/203.9 | 10.177
Max. throughput/scaled to 65 nm (Mbps) | 100 | 675/1,350 | 8.57/17.14 | 1,282 | 1,000/2,000 | 288
* For the power consumption with technology scaling, constant voltage scaling is assumed [16]. † Power consumption is estimated using the gate-level netlist simulation.
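The normalized power entries of Table 2 can be approximately reproduced with two scaling rules. The sketch below assumes that power scales linearly with clock frequency and, under the constant voltage scaling cited in [16], linearly with feature size; both rules are inferred from the tabulated values rather than stated explicitly, and the function names are illustrative.

```python
def power_at_100mhz(power_mw, freq_mhz):
    """Rescale measured power to a 100 MHz clock, assuming power ~ frequency."""
    return power_mw * 100.0 / freq_mhz


def scale_to_65nm(power_mw, process_nm):
    """Scale power to 65 nm, assuming power ~ feature size (constant voltage)."""
    return power_mw * 65.0 / process_nm


def mbps_per_mw(throughput_mbps, power_mw):
    """Normalized power efficiency."""
    return throughput_mbps / power_mw


# Reference [7] in Table 2: 135 mW at 282 MHz in a 130 nm process.
p7_100 = power_at_100mhz(135.0, 282.0)      # about 47.9 mW
p7_65nm = scale_to_65nm(p7_100, 130.0)      # about 23.9 mW

# This work: 28.3 mW at 278 MHz with 288 Mbps throughput.
efficiency = mbps_per_mw(288.0, 28.3)       # about 10.18 Mbps/mW
```

The same two functions applied to the other columns give values close to the remaining entries of the normalized power row.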
IV. Low-Power Channel-Adaptive Reconfigurable QRM-MLD MIMO Detector
In this section, a low-power channel-adaptive QRM-MLD MIMO detector with a reconfigurable M-value is proposed to further reduce the detector power consumption. Since the BER performance of a QRM-MLD MIMO detector depends strongly on the selection of the M-value, the M-value is usually decided according to the worst-case channel. However, wireless channel conditions fluctuate over time, and they are monitored and estimated in the communication system [17]. Based on this observation, the proposed reconfigurable MIMO detector architecture dynamically changes the M-value depending on channel conditions to reduce the power consumption more aggressively while satisfying the BER performance requirements.
- 1. Hardware Architecture of Proposed Reconfigurable QRM-MLD-Based MIMO Detector
In the QRM-MLD MIMO detector architecture presented in the previous section, all of the modules in the PED II array, FMIN array, and adder array operate in parallel, as displayed in Fig. 6. When $\sqrt{M} = 5$, the reconfigurable architecture of Fig. 10(a) operates in the same way as the one shown in Fig. 5. When the $\sqrt{M}$-value becomes smaller (4, 3, 2, or 1), some parts of the detector are not needed, and those unnecessary parts can be dynamically turned off to save computation energy. For example, when $M$ is reduced to 16 ($\sqrt{M} = 4$), the computation process of stage 1 is the same as in the case of $\sqrt{M} = 5$. However, only four elements from the set {−a, −b, −c, −d, a, b, c, d} are selected as the survivor metrics $x_4$ for both the real and imaginary parts at the end of stage 1. Since the $\mathrm{Re}\{r_{3,4} \cdot x_4\}$ and $\mathrm{Im}\{r_{3,4} \cdot x_4\}$ results are sent to four PED IIs in stage 2, only four of the five parallel paths are needed. To turn off the unnecessary data path, the TOGL shown in Fig. 9 is used at the input of the PED II array. As illustrated in Fig. 10(b), when $\sqrt{M} = 5$, all the data paths operate. When $M$ is reduced to nine ($\sqrt{M} = 3$), only three paths work, and the other two data paths are turned off using TOGL3 and TOGL4. Finally, when $M = 1$ ($\sqrt{M} = 1$), only one hardware path (TOGL0) operates.
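The turn-off pattern described for Fig. 10(b) can be written as a small control function. This is a behavioral sketch only: it assumes that paths are switched off from the highest index downward (as in the TOGL3/TOGL4 example) and follows the stated convention that φ = 1 forces a module's outputs to zero; the function name is hypothetical.

```python
def togl_phi(sqrt_m, n_paths=5):
    """Return phi for TOGL0..TOGL(n_paths-1): 1 turns a path off, 0 keeps it on."""
    if not 1 <= sqrt_m <= n_paths:
        raise ValueError("sqrt_m must be between 1 and the number of paths")
    return [0 if k < sqrt_m else 1 for k in range(n_paths)]


phi_m25 = togl_phi(5)   # all five paths active
phi_m9 = togl_phi(3)    # TOGL3 and TOGL4 turned off
phi_m1 = togl_phi(1)    # only the TOGL0 path active
```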
As shown in the proposed reconfigurable MIMO detector, once the $\sqrt{M}$-value is decided, the corresponding number of PEDs operate according to Fig. 10(a). Here, the proposed reconfigurable QRM-MLD MIMO detector follows the same timing pattern as in Fig. 6, which means that the architecture shows a fixed throughput regardless of the $\sqrt{M}$-value. In the $\sqrt{M}$ PEDs, $\sqrt{M}$ real and imaginary ABMs are processed simultaneously, and the results are combined to generate $M$ survivor metrics $\mathrm{ASM}_i\langle \alpha, \beta \rangle(\mathbf{x}^{(i)})$ for the $i$th stage, as shown in (17). Control logic such as the enable signal generator and the reconfigurable scheduler is also implemented in the proposed reconfigurable QRM-MLD MIMO detector. The hardware area and power consumption of the proposed reconfigurable architecture and the original architecture with fixed $M$ ($\sqrt{M} = 5$) are compared in Table 3. According to our simulation results with the reconfigurable MIMO detector, the area overhead due to the proposed turning-off scheme and control logic is 9.82% (11,000 gate count), and the power consumption overhead is 13.95% (4.5 mW) compared to the original design without turning-off modules.
Fig. 10. Proposed reconfigurable QRM-MLD architecture: (a) block diagram of turning-off gating path and (b) turn-off pattern of turning-off gate with variable M-value.
Table 3. Comparison between fixed and reconfigurable architectures.
Architecture | Process (nm) | Area (gate count) | Power consumption (mW) @ max. frequency (278 MHz)
Reconfigurable ($\sqrt{M}$ = 1–5) | 65 | 114K | 32.80 ($\sqrt{M}$ = 5)
 | | | 27.25 ($\sqrt{M}$ = 4)
 | | | 21.85 ($\sqrt{M}$ = 3)
 | | | 16.99 ($\sqrt{M}$ = 2)
 | | | 12.37 ($\sqrt{M}$ = 1)
- 2. Numerical Result of Proposed Reconfigurable QRM-MLD-Based MIMO Detector
The BER performance of the proposed reconfigurable 4 × 4, 64-QAM QRM-MLD-based MIMO detector is estimated with the following simulation setup:
  • QRD for the QRM-MLD-based MIMO detection is based on the Gram–Schmidt algorithm[12].
  • The minimum accumulated branch metric is decided based on a hard decision algorithm.
  • As a noise model, additive white Gaussian complex random noise is used.
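As an illustration of the first setup item, a Gram–Schmidt QRD of a complex channel matrix can be sketched in plain Python. The code below uses the modified Gram–Schmidt ordering for numerical stability, which is an assumption (the text only names the Gram–Schmidt algorithm); the function name and the sample matrix are also illustrative.

```python
def gram_schmidt_qr(H):
    """QR-decompose a complex matrix (list of rows) with modified Gram-Schmidt."""
    n_r, n_t = len(H), len(H[0])
    cols = [[H[r][c] for r in range(n_r)] for c in range(n_t)]
    q_cols = []
    R = [[0j] * n_t for _ in range(n_t)]
    for j in range(n_t):
        v = cols[j][:]
        for k in range(j):
            # r_kj = q_k^H v, then remove that component from v.
            R[k][j] = sum(q.conjugate() * a for q, a in zip(q_cols[k], v))
            v = [a - R[k][j] * q for a, q in zip(v, q_cols[k])]
        norm = sum(abs(a) ** 2 for a in v) ** 0.5
        R[j][j] = complex(norm)          # the diagonal of R is real and positive
        q_cols.append([a / norm for a in v])
    Q = [[q_cols[c][r] for c in range(n_t)] for r in range(n_r)]
    return Q, R


# Hypothetical 2 x 2 channel matrix.
H = [[1 + 1j, 2 - 1j], [0.5j, -1 + 2j]]
Q, R = gram_schmidt_qr(H)
```

The real diagonal of R produced this way is the property that lets the constant multiplier outputs be shared between the real and imaginary parts in PED I.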
Figure 11(a) illustrates the BER performance comparison of the proposed QRM-MLD MIMO detection algorithm with variable $\sqrt{M}$. Since the proposed MIMO detector shows a wide range of BER performance with varying $\sqrt{M}$-value, the proposed architecture can be efficiently adapted to varying channel conditions. Figure 11(b) shows the power consumption and energy efficiency (pJ/bit) of the proposed reconfigurable detector for different $\sqrt{M}$-values at the maximum operating frequency of 278 MHz. As shown in the figure, the power savings range from 17% to 62% as $\sqrt{M}$ changes.
Fig. 11. Experimental results of proposed variable M QRM-MLD architecture: (a) fixed-point BER performance with 64-QAM and (b) power consumptions (mW) and energy efficiencies with variable M (pJ/bit).
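The 17% to 62% savings range follows directly from the power values in Table 3. The sketch below recomputes it and also estimates the energy per bit, assuming that the fixed 288 Mbps throughput applies at every setting; that assumption and the names used are illustrative, so the energy figures are estimates and are not read from Fig. 11(b).

```python
# Power (mW) at 278 MHz for each sqrt(M) setting, taken from Table 3.
POWER_MW = {5: 32.80, 4: 27.25, 3: 21.85, 2: 16.99, 1: 12.37}
THROUGHPUT_MBPS = 288.0   # assumed constant, since throughput is fixed


def saving_percent(sqrt_m):
    """Power saving relative to the sqrt(M) = 5 setting."""
    return 100.0 * (1.0 - POWER_MW[sqrt_m] / POWER_MW[5])


def energy_pj_per_bit(sqrt_m):
    """mW / Mbps equals nJ/bit; multiply by 1000 for pJ/bit."""
    return POWER_MW[sqrt_m] / THROUGHPUT_MBPS * 1000.0


savings = {m: saving_percent(m) for m in POWER_MW}
```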
V. Experimental Results under Time-Varying Channel Conditions
This section shows that the proposed reconfigurable QRM-MLD-based MIMO detector can be effectively used to save power while maintaining the required BER performance under two time-varying channel conditions. A detailed description follows.
- 1. M-Value Decision Process
Figure 12(a) illustrates the $\sqrt{M}$-value decision process; the decision process under varying channel conditions is displayed in the dotted box. Initially, the SNR of the target channel is estimated. To adaptively update the $\sqrt{M}$-value according to the estimated SNR, the $\sqrt{M}$-value generator first locates the SNR range. Based on the located SNR range, the best $\sqrt{M}$-value is decided using an SNR-to-$\sqrt{M}$ mapping table, which is built off-line.
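The range lookup performed by the M-value generator can be sketched with a sorted list of SNR boundaries. The boundary values below are hypothetical placeholders, since the actual mapping table is built off-line and is not given in the text; only the lookup mechanism is illustrated.

```python
from bisect import bisect_right

# Hypothetical SNR range boundaries in dB (NOT values from the paper).
SNR_EDGES_DB = [18.0, 22.0, 26.0, 30.0]
# sqrt(M) chosen for each range: poor channels get the largest value.
SQRT_M_FOR_RANGE = [5, 4, 3, 2, 1]


def select_sqrt_m(snr_db):
    """Locate the SNR range, then read the sqrt(M) value from the table."""
    return SQRT_M_FOR_RANGE[bisect_right(SNR_EDGES_DB, snr_db)]
```

With these placeholder boundaries, `select_sqrt_m(24.0)` returns 3, so two of the five data paths would be gated off for that frame.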
Fig. 12. (a) M-value decision process of proposed reconfigurable QRM-MLD-based MIMO detector and (b) timing diagram of proposed reconfigurable MIMO detector.
- 2. SNR Estimation Based on M-Selection Process in Reconfigurable MIMO Detection
For the proposed reconfigurable MIMO detector to be used efficiently in a communication system with a time-varying channel, a timing diagram including the M-selection process is illustrated in Fig. 12(b). A MIMO detection process is performed on the $N$ data symbols ($D_0, D_1, \ldots, D_{N-1}$). Since both operations use preamble data, the SNR estimation [18] followed by the M-selection operation (latency: $T_{\mathrm{SNR}} + T_M$) can be processed in parallel with the first $\mathbf{H}$-matrix estimation and QRD process [19] (latency: $T_{H,R} = T_{H\text{-matrix}} + T_{\mathrm{QRD}}$), as presented in Fig. 12(b). Here, for the seamless real-time operation of the proposed reconfigurable MIMO detector, the timing constraint of (18) has to be strictly satisfied, where $T_{\text{Frame duration}}$ is the preamble period and $N_{H,R}$ denotes the number of $\mathbf{H}$-matrix estimations and QRD processes during a frame duration, as shown in Fig. 12(b).
(18) $T_{\text{Frame duration}} \ge T_{\text{Preprocessing}} + T_{\mathrm{MIMO}},$
(19) $T_{\text{Preprocessing}} \ge \max(T_{H,R},\; T_{\mathrm{SNR}} + T_M) + (N_{H,R} - 1)\,T_{H,R}.$
In (19), since $T_{\text{SNR estimation}}$ is generally larger than $T_{H,R}$ [20], [21], $T_{\text{Preprocessing}}$ becomes $T_{\text{SNR estimation}} + T_{M\text{ decision}}$. When $T_{\text{Frame duration}}$ is 5.0 ms following the standard [11], 3.0 ms [22] is enough to finish the MIMO detection process ($T_{\mathrm{MIMO}}$) of data symbols $D_0, D_1, \ldots, D_{N-1}$, which include every unit symbol of a subchannel inside the downlink durations. In this scenario, $T_{\text{SNR estimation}} + T_M$ equals 18.5 μs, while $T_{H,R}$ equals about 1.987 μs [20], [21]; $N_{H,R}$ is estimated to be four ($= \lfloor T_{\text{Frame duration}} / T_C \rfloor$), where the coherence time $T_C$ is equal to 1.1 ms assuming the worst-case scenario in [22]. Since $T_{\text{Preprocessing}}$ is approximately 26.448 μs, the timing constraint (18) is easily satisfied. The constraint is still met when $T_{\text{Frame duration}}$ is around 2.5 ms [11], since $T_{\mathrm{MIMO}}$ reduces proportionately due to the smaller number of data symbols ($D_0, D_1, \ldots, D_{N-1}$).
- 3. Experimental Results
In this section, the dynamic reconfiguration of the proposed MIMO detector is demonstrated to trade off BER performance and power savings under an arbitrarily varying channel. The channel condition ($E_b/N_0$) over time is modeled by a normally distributed random variable with a typical standard deviation [23], and the detailed simulation specifications are as follows:
  • For a 10 MHz channel bandwidth, the number of occupied subcarriers is 1,024 [11].
  • The frame duration is set to 5 ms [11] and the simulation time is 3 s, which means the simulation covers 600 frames in total.
  • The frame start preamble used to obtain the average SNR estimate consists of a 32-symbol sequence generated by repeating a 16-symbol CAZAC sequence [11].
  • The BER constraint is set to $10^{-3}$ [24] for the whole range of $E_b/N_0$ in every frame duration.
  • According to the average SNR estimate, the M-value is dynamically decided as the lowest M-value that can satisfy the BER constraint ($10^{-3}$).
The proposed reconfigurable MIMO detector is simulated with an AWGN channel and a Rayleigh fading channel, and the simulation results for the Rayleigh fading channel are presented in Fig. 13. In the proposed approach, the $\sqrt{M}$-value changes adaptively according to the channel conditions, which results in a significant power consumption reduction, as shown in Fig. 13(d). Although the BER performance can be degraded due to the modified $\sqrt{M}$-value, as illustrated in Fig. 13(b), it always satisfies the BER constraint of $10^{-3}$. One of the advantages of the proposed approach is that a reasonable trade-off between power consumption and BER performance can be dynamically achieved while satisfying the predecided BER constraint. Table 4 summarizes the power saving of the proposed architecture.
Fig. 13. Simulation results of proposed and conventional QRM-MLD architectures for time-varying Rayleigh channel: (a) Rayleigh channel, (b) BER comparison, (c) M-value selection for varying time, and (d) power consumption with varying time.
Table 4. Power savings compared to conventional fixed-M-based architecture.
Modeled channel | Power saving (%)
AWGN channel | 35
Rayleigh fading channel with four numbers of fading and 6.4 Hz of Doppler frequency [25] | 32
VI. Conclusion
In this paper, a low-power channel-adaptive reconfigurable QRM-MLD MIMO detector architecture is presented. The power optimization of the proposed MIMO detector is achieved by two novel approaches. First, an ASM generation technique that separates the real and imaginary parts of the branch metric is proposed to reduce the computational complexity with a minor BER performance degradation. Second, to further reduce the power consumption of the ASM-based MIMO detector under time-varying channel conditions, a reconfigurable QRM-MLD approach with variable M is proposed to overcome the limitation of the conventional architecture with a fixed M. The proposed approach shows a reasonable trade-off between system performance and power savings under varying channel conditions. According to the experimental results from our implementation, the proposed reconfigurable MIMO detector achieves power savings of at least 32% under time-varying channel conditions while satisfying the BER performance requirement. The idea presented in this paper can assist in the design of MIMO detector algorithms and their implementation in low-power applications.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2012R1A2A2A01012471 and No. 2011-0020128).
BIO
iputkurniawan@gmail.com
Iput Heri Kurniawan received his BS degree in electrical engineering and informatics from the School of Electrical Engineering and Information, Bandung Institute of Technology, Indonesia, in 2008 and his MS degree in electrical engineering from Korea University, Seoul, Rep. of Korea, in 2013. From 2008 to 2011, he was with PT, Xirka, Indonesia, as a research engineer in digital circuit design for OFDMA baseband. Since 2013, he has been with Raon-Tech, Seongnam, Rep. of Korea, as a research engineer in digital circuit designs for digital front-ends, FM basebands, and haptic drivers. His research interests include low-power MIMO detectors.
improma@korea.ac.kr
Ji-Hwan Yoon received his BS degree in electrical engineering from Korea University, Seoul, Rep. of Korea, in 2009. He is currently pursuing his MS and PhD degrees with the Department of Electrical and Computer Engineering, Korea University. His interests include MIMO detector design, QR decomposition module design, and ultra-low-power system design.
jongkook@korea.ac.kr
Jong-Kook Kim received his BS degree in electronic engineering from Korea University, Seoul, Rep. of Korea, in 1998 and his MS and PhD degrees in electrical and computer engineering from the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA, in 2000 and 2004, respectively. He is currently an associate professor at the School of Electrical Engineering, Korea University, which he joined in 2007. He was with Samsung SDS’s IT R&D Center, Seongnam, Rep. of Korea, from 2005 to 2007. His research interests include heterogeneous distributed computing, energy-aware computing, resource management, evolutionary heuristics, distributed mobile computing, neural networks, and distributed robot systems. He is a senior member of the IEEE and ACM.
Corresponding Author jongsun@korea.ac.kr
Jongsun Park received his BS degree in electronics engineering from Korea University, Seoul, Rep. of Korea, in 1998 and his MS and PhD degrees in electrical and computer engineering from Purdue University, West Lafayette, IN, USA, in 2000 and 2005, respectively. He joined the Electrical Engineering Faculty of Korea University in 2008. From 2005 to 2008, he was with the Signal Processing Technology Group, Marvell Semiconductor Inc., Santa Clara, CA, USA. He was also with the Digital Radio Processor System Design Group, Texas Instruments, Dallas, USA, in the summer of 2002. His research interests focus on variation-tolerant, low-power, and high-performance VLSI architectures, and circuit designs for digital signal processing and digital communications.
References
[1] Huang C., Yu C., Ma H. 2009 “A Power-Efficient Configurable Low-Complexity MIMO Detector,” IEEE Trans. Circuits Syst. I: Reg. Papers 56 (2) 485–496 DOI: 10.1109/TCSI.2008.2001368
[2] Agrell E. 2002 “Closest Point Search in Lattices,” IEEE Trans. Inf. Theory 48 (8) 2201–2214 DOI: 10.1109/TIT.2002.800499
[3] Mondal S. 2010 “Design and Implementation of a Sort-Free K-Best Sphere Decoder,” IEEE Trans. Very Large Scale Integr. Syst. 18 (10) 1497–1501 DOI: 10.1109/TVLSI.2009.2025168
[4] Dai Y., Sun S., Lei Z. 2005 “A Comparative Study of QRD-M Detection and Sphere Decoding for MIMO-OFDM Systems,” IEEE Int. Symp. PIMRC, Berlin, Germany 186–190
[5] Wenk M. “K-Best MIMO Detection VLSI Architectures Achieving up to 424 Mbps,” IEEE Int. Symp. Circuits Syst., Island of Kos, Greece, May 21–24, 2006 1151–1154
[6] Kim K. 2005 “A QRD-M/Kalman Filter-Based Detection and Channel Estimation Algorithm for MIMO-OFDM Systems,” IEEE Trans. Wireless Commun. 4 (2) 710–721 DOI: 10.1109/TWC.2004.842951
[7] Shabany M., Gulak P. 2012 “A 675 Mbps, 4×4 64-QAM K-Best MIMO Detector in 0.13 μm CMOS,” IEEE Trans. Very Large Scale Integr. Syst. 20 (1) 135–147 DOI: 10.1109/TVLSI.2010.2090367
[8] Chen S., Zhang T., Xin Y. 2007 “Relaxed K-Best MIMO Signal Detector Design and VLSI Implementation,” IEEE Trans. Very Large Scale Integr. Syst. 15 (3) 328–337 DOI: 10.1109/TVLSI.2007.893621
[9] Khairy M.S. 2014 “Algorithms and Architectures of Energy-Efficient Error-Resilient MIMO Detectors for Memory-Dominated Wireless Communication Systems,” IEEE Trans. Circuits Syst. I: Reg. Papers 61 (7) 2159–2171 DOI: 10.1109/TCSI.2014.2298273
[10] Huang M.Y., Tsai P.Y. 2014 “Toward Multi-Gigabit Wireless: Design of High-Throughput MIMO Detectors with Hardware-Efficient Architecture,” IEEE Trans. Circuits Syst. I: Reg. Papers 61 (2) 613–624 DOI: 10.1109/TCSI.2013.2284189
[11] 2012 IEEE Standard for Air Interface for Broadband Wireless Access Systems, IEEE Standard 97266, New York, NY, USA
[12] Singh C.K., Sushma H.P., Balsara P.T. “VLSI Architecture for Matrix Inversion Using Modified Gram-Schmidt Based QR Decomposition,” IEEE Int. Conf. VLSI Des., Bangalore, India, Jan. 6–10, 2007 836–841
[13] Park J., Choi J., Roy K. 2010 “Dynamic Bit-Width Adaptation in DCT: An Approach to Trade off Image Quality and Computation Energy,” IEEE Trans. Very Large Scale Integr. Syst. 18 787–793 DOI: 10.1109/TVLSI.2009.2016839
[14] Synopsys PrimeTime User’s Manual http://www.synopsys.com
[15] Mahdavi M., Shabany M. 2013 “Novel MIMO Detection Algorithm for High-Order Constellations in the Complex Domain,” IEEE Trans. Very Large Scale Integr. Syst. 21 834–847 DOI: 10.1109/TVLSI.2012.2196296
[16] Borkar S. 1999 “Design Challenges of Technology Scaling,” IEEE Micro 19 (4) 23–29
[17] Proakis J.G. 2000 “Digital Communication,” 4th ed. McGraw-Hill, New York, USA
[18] Zivkovic M., Mathar R. “An Improved Preamble-Based SNR Estimation Algorithm for OFDM Systems,” IEEE Int. Symp. PIMRC, Istanbul, Turkey, Sept. 26–29, 2010 172–176
[19] Burg A. 2006 “Algorithm and VLSI Architecture for Linear MMSE Detection in MIMO-OFDM Systems,” IEEE Int. Symp. Circuits Syst., Island of Kos, Greece 21–24
[20] Darji A.D., Patil M.S. “VLSI Implementation of Balanced Binary Tree Decomposition Based 2048-Point FFT/IFFT Processor for Mobile WI-Max,” Int. Conf. Emerg. Trends Eng. Technol., Goa, India, Nov. 19–21, 2010 745–748
[21] Cheng S., Evans J.B. 1997 “Implementation of Signal Power Estimation Methods,” IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process. 44 (3) 240–250 DOI: 10.1109/82.558458
[22] 2011 WiMAX Forum Proprietary, Requirements for WiMAX Coexistence with LTE Network, WMF-T31-132-v01, Clackamas
[23] Zaidi Z.R., Mark B.L. 2005 “Real-Time Mobility Tracking Algorithms for Cellular Networks Based on Kalman Filtering,” IEEE Trans. Mobile Comput. 4 (2) 195–208 DOI: 10.1109/TMC.2005.29
[24] Kermoal J.P. 2002 “A Stochastic MIMO Radio Channel Model with Experimental Validation,” IEEE J. Sel. Areas Commun. 20 (6) 1211–1226 DOI: 10.1109/JSAC.2002.801223
[25] 2002 MIMO Rapporteur: 3GPP TSG R1-02-0141