Low-Power Channel-Adaptive Reconfigurable 4×4 QRM-MLD MIMO Detector

ETRI Journal.
2016.
Feb,
38(1):
100-111

- Received : February 24, 2015
- Accepted : September 30, 2015
- Published : February 01, 2016

Download

PDF

e-PUB

PubReader

PPT

Export by style

Article

Metrics

Cited by

TagCloud

This paper presents a low-complexity channel-adaptive reconfigurable 4 × 4 QR-decomposition and
M
-algorithm-based maximum likelihood detection (QRM-MLD) multiple-input and multiple-output (MIMO) detector. Two novel design approaches for low-power QRM-MLD hardware are proposed in this work. First, an approximate survivor metric (ASM) generation technique is presented to achieve considerable computational complexity reduction with minor BER degradation. A reconfigurable QRM-MLD MIMO detector (where the
M
-value represents the number of survival branches in a stage) for dynamically adapting to time-varying channels is also proposed in this work. The proposed reconfigurable QRM-MLD MIMO detector is implemented using a Samsung 65 nm CMOS process. The experimental results show that our ASM-based QRM-MLD MIMO detector shows a maximum throughput of 288 Mbps with a normalized power efficiency of 10.18 Mbps/mW in the case of 4 × 4 MIMO with 64-QAM. Under time-varying channel conditions, the proposed reconfigurable MIMO detector also achieves average power savings of up to 35% while maintaining a required BER performance.
M
-algorithm; QR-decomposition and
M
-algorithm-based maximum likelihood detection (QRM-MLD); and
K
-Best algorithm with a fixed
M
(or
K
)-value
[5]
,
[6]
, have been chosen as a reasonable alternative to relieve the hardware burden with guaranteed fixed throughput. However, both the computational complexity and the resulting power consumption of a breadth-first approach are still large, since the complexity of child expansion and sorting schemes exponentially increases as the constellation order or dimension of the MIMO channel-matrix increases
[7]
–
[9]
.
This paper presents a low-power channel-adaptive hardware architecture for a QRM-MLD MIMO detector. First, a simplified branch metric computation approach in QRM-MLD is presented. Consequently, the computational complexity is significantly reduced without seriously sacrificing the bit error rate (BER) performance.
As a second approach, a channel-adaptive QRM-MLD MIMO detector with reconfigurable
M
-value is also presented. The hardware implementation results also show that considerable power savings can be achieved by efficiently trading off the detection performance.
N
_{t}
transmitting and
N
_{r}
receiving antennas. The equivalent baseband model of the complex MIMO system is described as follows:
y
is an
N
_{r}
× 1 complex-valued received symbol vector,
x
is an
N
_{t}
× 1 complex-valued transmit vector,
H
denotes an
N
_{r}
×
N
_{t}
complex-valued channel response matrix, and
n
represents independent and identically distributed (i.i.d.) complex Gaussian noise, of size
N
_{r}
× 1 and zero mean. To find an optimal solution for spatial multiplexing, the basic operation of the ML MIMO detection is to minimize the norm of the receiver noise as follows:
^{Nt}
denotes all the possible
N
_{t}
-dimensional transmitted symbol vectors.
H
is introduced as follows:
Q
is an
N
_{r}
×
N
_{t}
unitary matrix and
R
is an
N
_{t}
×
N
_{t}
upper triangular complex matrix. By multiplying both sides of (1) by
Q
^{H}
(the Hermitian conjugate of
Q
), the system model can be rewritten as
w
=
Q
^{H}
n
. With the upper triangular property of matrix
R
, the norm of the receiver noise in (2) can be reformulated to an accumulated branch metric as (5) below, and the process to find an optimal solution,
s
, in (2), becomes a tree-search problem, where the maximum level of a node in the tree structure is
N
_{t}
.
i
and
j
represent the index of the current stage and that of the previous stages in the tree structure, respectively;
r_{i,j}
denotes the (
i
,
j
)th element of matrix
R
; and
x_{j}
is the
j
th element of vector
x
. In the
i
th stage, the partial Euclidean distance (PED) calculation is formulated in a recursive manner as
_{i}
(
x
^{(i)}
) is a PED in the
i
th stage with PED
_{0}
(
x
^{(0)}
) = 0;
i
th stage; and |
e_{i}
(
x
^{(i)}
)|
^{2}
is the distance increment between two successive nodes in the tree structure. The tree structure–based representation of (6) to (7) is illustrated in
Fig. 1
, where the modulation order is
n
(Ω) = 16 and
N
_{t}
= 4.
Example of mapping QRM-MLD algorithm to tree structure (16-QAM modulation (n (Ω) = 16), M = 4, N _{t} = 4).
_{0}
(
x
^{(0)}
) = 0. In the conventional simplified norm algorithm
[5]
, separate real and imaginary parts are added to calculate the approximate distance increment of the
i
th stage by (10), and the PED of the
i
th stage is computed using (11). To reduce the computational complexity of the conventional QRM-MLD approaches
[4]
,
[6]
, advanced approaches
[7]
–
[9]
iteratively find
M
minimum ABMs. First, the
n
th minimum ABM in (11) are found by combining minimum real and imaginary parts. To find the
n
th minimum ABM among the candidates, a large number of comparisons are needed, which means that a huge number of comparison operations are required to find all of the
M
minimum ABMs in the tree structure. To further reduce the number of comparisons, the proposed approach finds
M
minimum ABMs by judiciously combining
M
minimum ABMs are obtained by simply combining
l
,
α
,
β
= 1, ,
i
≥2. In the equations,
l
th smallest magnitude value from
C
denotes the set {
n
(Ω); Re{
e
_{1}
(
x
^{(1)}
)}
^{<l>}
and Im{
e
_{1}
(
x
^{(1)}
)}
^{<l>}
denote the
l
th smallest partial (real and imaginary) value ((8) and (9)) in the first stage; and |SM
^{(1)}
<
α
,
β
>| denotes the proposed survivor metric (SM) of a survivor path in the first stage, which is generated by combining the
α
th smallest real part and the
β
th smallest imaginary part in (8) and (9). Furthermore, Re{SM
^{(i)}
} and Im{SM
^{(i)}
} denote the real and imaginary parts of the SM in the
i
th stage, respectively, where
i
≥ 2; ASM
_{i}
<
α
,
β
> (
x
^{(i)}
) is the proposed ASM of a survivor path in the
i
th stage, where
i
≥ 2, which is the descendant of the SM in the first stage, |SM
^{(1)}
<
α
,
β
>|. Assuming
N
_{t}
= 4, the QRM-MLD process based on the proposed ASM generations is illustrated in
Fig. 2
. First, separate real and imaginary parts in (8) and (9) are computed at the beginning of stage 1. After computing the real part in (8) for all the possible four (
e
_{1}
(
x
^{(1)}
)}
^{<2>}
= 2, Re{
e
_{1}
(
x
^{(1)}
)}
^{<1>}
= 1) are selected, which is expressed in (12). In the second part of stage 1 (imaginary), (13) is performed in a similar manner with (12). Then, the real and imaginary parts are combined (14) for generating the four smallest SMs of the first stage, where the results are |SM
^{(1)}
<2, 1>| = 3, |SM
^{(1)}
<2, 2>| = 5, |SM
^{(1)}
<1, 1>| = 2, and |SM
^{(1)}
<1, 2>| = 4. Here, only
M
addition operations are needed to generate
M
number of SMs by combining
M
number of SMs can be generated without any sorting operations for the ABMs. In the second stage, for each SM of the first stage, only the minimum real (Re{SM
^{(2)}
}) and minimum imaginary (Im{SM
^{(2)}
}) ABMs are selected among
M
= 4) ASMs (ASM
_{2}
<2, 1>(
x
^{(2)}
) = 6, ASM
_{2}
<2, 2>(
x
^{(2)}
) = 6, ASM
_{2}
<2, 2>(
x
^{(2)}
) = 8, and ASM
_{2}
<1, 1> (
x
^{(2)}
) = 5) are generated, and a similar process is repeated for each remaining stage (stages three and four). As a result, the minimum ASM, ASM
_{4}
<1, 1>(
x
^{(4)}
) (= 2 + 3 + 3 + 2 = 10), is selected as an optimal solution,
s
.
Example of proposed branch metric computations for case M = 4, n (Ω) = 16, and N _{t} = 4.
In terms of computational complexity, the proposed ASM generation–based 4 × 4 QRM-MLD is compared with a conventional 4 × 4 QRM-MLD
[6]
,
[7]
(see
Fig. 3
). Compared to
[6]
, the number of multiplications and additions with the L1-norm-based approaches (the proposed and
[7]
) are significantly reduced.
Figure 4
shows the BER comparisons between the proposed approach and the conventional breadth-first-search-based ML MIMO detectors
[6]
,
[7]
,
[9]
,
[10]
, with 64-QAM modulation using a fixed-point simulation. For all the plots, the input bit-width (real and imaginary parts of
z
and matrix
R
) is set to 15 bit, where integer and fractional parts are 6 bit and 9 bit, respectively. The results in
Fig. 4
show that the proposed architecture demonstrates a comparable BER with the conventional approach (with
K
= 10)
[7]
, when
M
= 16. For iso-BER comparison, if the proposed architecture (
M
= 16) is compared with the conventional one (with
K
= 10)
[7]
in terms of computational complexity, then the proposed approach shows 25% savings on the number of multiplications with increasing number of comparisons and additions. The BER performances for QPSK and 16-QAM are also presented in
Fig. 4
.
Computational complexity comparisons for 4 × 4 MIMO detectors.
BER comparisons among proposed approach and conventional breadth-first-search-based ML MIMO detectors [6] , [7] , [9] , [10] .
M
= 25) based on ASM generation with 4 × 4 MIMO multiplexing with 64-QAM constellation. A timing diagram of the proposed architecture is illustrated in
Fig. 6
. In stage 1 (
i
= 1), to compute Re{
e_{i}
(
x
^{(i)}
)} and Im{
e_{i}
(
x
^{(i)}
)} shown in (8) and (9), respectively, a constant multiplier (MUL_0), PED I, and PED II (PED_II_0) in
Fig. 5
are first operating. The outputs of MUL_0 are
r
_{4,4}
·
k
, where
k
is selected from the set {
a
,
b
,
c
,
d
}. Here,
a
,
b
,
c
, and
d
are four different magnitudes of the real and imaginary parts for 64-QAM symbols
[11]
. Since
b
,
c
, and
d
are multiples of constant
a
[11]
, the multiplier array (MA) can be simply implemented using adders and shifters, as presented in
Fig. 7(a)
. PED I performs the following
y
_{4}
−
r
_{4,4}
·
k
operations with eight real and eight imaginary cases, which is shown in
Fig. 7(b)
. Here, since the diagonal component of matrix
R
,
r
_{4,4}
, is a real number
[12]
, MUL_0, outputs (
r
_{4,4}
·
k
) can be shared for both of the real and imaginary parts in PED I. In PED_II_0, only muxes and absolute (ABS) value computation modules are operating in stage 1, as shown in
Fig. 7(c)
; the subtractor (SUB) modules bypass the eight real and eight imaginary PED I outputs. Using the eight real and eight imaginary outputs from PED_II_0, SORF_0 and SORF_1 inside the FMIN array sort five real and five imaginary outputs in ascending order. After the sorting operation is done in two clock cycles, the outputs of SORF_0 and SORF_1 are stored in the ordered path register. Finally, the separate part merging adders (SMA) generate 25 cases of |SM
^{(1)}
<
α
,
β
>| in (14) by adding all the combinations of five real (Re{
e
_{1}
(
x
^{(1)}
)}
^{<α>}
) and five imaginary (Im{
e
_{1}
(
x
^{(1)}
)}
^{<α>}
) parts, where
α
,
β
= 1, 2, … , 5.
Overall architecture of proposed QRM-MLD MIMO detector.
Timing diagram of proposed QRM-MLD MIMO detector.
Submodules of proposed QRM-MLD MIMO detector: (a) constant multiplier, (b) PED I, and (c) PED II module and turn-off pattern of subtractors (SUB), adders (ADD0, ADD1) in PED II.
In stage 2 (
i
= 2), three multipliers (MUL_0 to MUL_2), PED I, and all the PED II modules (PED_II_0 to PED_II_4) are operating to compute Re{SM(3)} and Im{SM(3)} according to (15) and (16), respectively. At first,
r
_{3,3}
·Re{
x
_{3}
} and
r
_{3,3}
·Im{
x
_{3}
} are calculated by MUL_0; Re{
r
_{3,4}
·
x
_{4}
} and Im{
r
_{3,4}
·
x
_{4}
} are computed by MUL_1 and MUL_2, respectively. Then, eight cases of Re{
y
_{3}
}−
r
_{3,3}
·Re{
x
_{3}
} and Im{
y
_{3}
}−
r
_{3,3}
·Im{
x
_{3}
} are computed in PED I, and the outputs of PED I and real/imaginary part generator are sent to five PED IIs. Here, real and imaginary components of
x
_{4}
are selected from the set {−a, −b, −c, −d, a, b, c, d} based on the value saved in the ordered path register, and five cases of Re{
r
_{3,4}
·
x
_{4}
} are distributed to each PED II. Similar processes are repeatedly performed in the PED II array by varying x4 to generate different Re{
r
_{3,4}
·
x
_{4}
} values for five clock cycles, as illustrated in the detailed timing diagram (
Fig. 6
).
In the adder array, branch metrics accumulation (BMA) units add the branch metrics of the previous and current stages for deciding the output vector of the final stage. The sorted branch metrics of stage 2 are stored in the path register to prepare stage 3.
In stage 3 (
i
= 3), the overall computation process is similar to that of stage 2, except for the operations in the MA specified in (11) and (12). In stage 4 (
i
= 4), all seven MUL modules (seven constant multipliers to compute
r
_{1,1}
·Re{
x
_{1}
}, Re{
r
_{1,2}
·
x
_{2}
}, Im{
r
_{1,2}
·
x
_{2}
}, Re{
r
_{1,3}
·
x
_{3}
}, Im{
r
_{1,3}
·
x
_{3}
}, Re{
r
_{1,4}
·
x
_{4}
}, and Im{
r
_{1,4}
·
x
_{4}
} in MA and the whole PED II are operating. Finally, the hard decision module is used to sort the ASMs in (17) for an output decision.
In the proposed QRM-MLD detector, the operating modules at each stage are specified in
Table 1
. The non-active gray-colored parts shown in
Fig. 7(c)
and
Fig. 8
can be simply turned off using the turning-off gate logic (TOGL) shown in
Fig. 9
[13]
. In the TOGL shown in
Fig. 9
, both of the pull-up (PMOS) and pull-down (NMOS) transistors are being used together. When
φ
is set to one, the pull-down NMOS transistor forces the outputs of the turned-off modules to zero to save dynamic power consumption and to correct functionality. The
φ
signals are generated from control logic to turn-off the non-active gray-colored parts in
Figs. 7(c)
and
8
.

SORF module in proposed architecture.
TOGL [11] applied to proposed architecture.
The proposed QRM-MLD-based 4 × 4 MIMO detectors are implemented using a Samsung 65 nm CMOS standard library. The power consumption is simulated with a gate-level netlist using Primetime-PX
[14]
with an operation frequency of 100 MHz, 1.2 V supply voltage. One hundred thousand random input test vectors of the matrix
R
and
Q
^{H}
y
are used for measuring power. According to the numerical results, the proposed QRM-MLD MIMO detector shows a maximum throughput of 288 Mbps with a normalized power efficiency of 10.18 Mbps/mW.
Table 2
shows the hardware comparisons of various 4 × 4 MIMO detectors in the literature. In the comparisons, the proposed architecture shows the lowest power (100 MHz, scaled to 65 nm) and smallest area among the other works. In terms of detection performance in
Fig. 4
, the proposed approach with
M
= 25 shows comparable or even better BER simulation results compared to the conventional
K
-best approach with
K
= 10
[7]
. Compared to
[15]
, the proposed approach with
M
= 25 shows slightly worse but comparable results in the BER simulation.

* For the power consumption with technology scaling, constant voltage scaling is assumed [16] . † Power consumption is estimated using the gate-level netlist simulation.
M
-value is proposed to further reduce the detector power consumption. Since the BER performance of a QRM-MLD MIMO detector is quite dependent on the selection of the
M
-value, the
M
-value is usually decided according to the worst-case channel. However, in most cases, the wireless channel conditions always fluctuate over time, and they are monitored and estimated in the communication system
[17]
. Based on this interesting observation, the proposed reconfigurable MIMO detector architecture can dynamically change the
M
-value depending on channel conditions to more aggressively reduce the power computation while satisfying BER performance requirements.
M
can be dynamically turned off to save computation energy. For example, when
M
is reduced to 16 (
x
_{4}
for both the real and imaginary parts at the end of stage 1. Since the Re{
r
_{3,4}
·
x
_{4}
} and Im{
r
_{3,4}
·
x
_{4}
} results are sent to four PED IIs in stage 2, only four paths are needed among five parallel paths. To turn off the unnecessary data path, the TOGL shown in
Fig. 9
is used at the input of the PED II array. As illustrated in
Fig. 10(b)
, while
M
is reduced to nine (
M
= 1 (
M
number of survivor metrics ASMi<
α
,
β
>(
x
^{(i)}
) for the
i
th stage, as shown in (17). The control logics like the enable signal generator and the reconfigurable scheduler are also implemented in the proposed reconfigurable QRM-MLD MIMO detector. The hardware area and power consumption comparison of the proposed reconfigurable architecture and the original architecture with fixed
M
(
Proposed reconfigurable QRM-MLD architecture: (a) block diagram of turning-off gating path and (b) turn-off pattern of turning-off gate with variable $\sqrt{M}$ -value.

Figure 11(a)
illustrates the BER performance comparison of the proposed QRM-MLD MIMO detection algorithm with variable
Experimental results of proposed variable M QRM-MLD architecture: (a) fixed-point BER performance with 64-QAM and (b) power consumptions (mW) and energy efficiencies with variable M (pJ/bit).
(a) $\sqrt{M}$ -value decision process of proposed reconfigurable QRM-MLD-based MIMO detector and (b) timing diagram of proposed reconfigurable MIMO detector.
M
-selection process is illustrated in
Fig. 12(b)
. A MIMO detection process is performed on the
N
data symbols (D
_{0}
, D
_{1}
, … , D
_{N−1}
). Since both operations are using preamble data, the SNR estimation
[18]
followed by M-selection operation (Latency:
T
_{SNR}
+
T_{M}
) can be processed in parallel with the first
H
-matrix estimation and QRD process
[19]
(Latency:
T
_{H,R}
=
T
_{H-matrix}
+
T
_{QRD}
), as presented in
Fig. 12(b)
. Here, for the seamless real-time operation of the proposed reconfigurable MIMO detector, the timing constraint of (18) has to be strictly satisfied, where
T
_{Frame duration}
is the preamble period and
N
_{H,R}
denotes the number of
H
-matrix estimations & QRD processes during a frame duration, as shown in
Fig. 12(b)
.
T
_{SNR estimation}
is generally larger than
T
_{H,R}
[20]
,
[21]
,
T
_{Preprocessing}
becomes
T
_{SNR estimation}
+
T
_{M decision}
. When
T
_{Frame duration}
is 5.0 ms following standard
[11]
, 3.0 ms
[22]
is enough to finish the MIMO detection process (
T
_{MIMO}
) of data symbols D
_{0}
, D
_{1}
, … , D
_{N−1}
, which include every unit symbol of a subchannel inside downlink durations. In this scenario,
T
_{SNR estimation}
+
T_{M}
equals 18.5 μs, while
T
_{H,R}
equals about 1.987 μs
[20]
,
[21]
;
N
_{H,R}
is estimated to be equal to four (= ⌊
T
_{Frame duration}
/
T_{C}
⌋), where the coherence time
T_{C}
is equal to 1.1 ms assuming the worst-case scenario in
[22]
. Since
T
_{Preprocessing}
is approximately 26.448 μs, the timing constraint (18) is easily satisfied. The constraint is still met when
T
_{Frame duration}
is around 2.5 ms
[11]
, since
T
_{MIMO}
reduces proportionately due to smaller number data symbols (D
_{0}
, D
_{1}
, … , D
_{N−1}
).
E
_{b}
/
N
_{0}
) with time is modeled by a normally distributed random variable with typical standard deviation
[23]
, and the detailed simulation specifications are as follows:
The proposed reconfigurable MIMO detector is simulated with an AWGN channel and Rayleigh fading channel, and the simulation results for the Rayleigh fading channel are presented in
Fig. 13
. In the proposed approach, the
^{−3}
. One of the advantages of the proposed approach is that a reasonable trade-off between power consumption and BER performance can be dynamically achieved while satisfying the predecided BER constraint.
Table 4
summarizes the power saving of the proposed architecture.
Simulation results of proposed and conventional QRM-MLD architectures for time-varying Rayleigh channel: (a) Rayleigh channel, (b) BER comparison, (c) $\sqrt{M}$ -value selection for varying time, and (d) power consumption with varying time.

M
is proposed to overcome the limitation of the conventional architecture with a fixed
M
. The proposed approach shows a reasonable trade-off between system performance and power savings with varying channel conditions. According to the experimental results from our implementation, the proposed reconfigurable MIMO detector achieves power savings of at least 32% with time-varying channel conditions while satisfying the BER performance requirement. The idea presented in this paper can assist in the design of MIMO detector algorithms and their implementation in low-power applications.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2012R1A2A2A01012471 and No. 2011-0020128)
iputkurniawan@gmail.com
Iput Heri Kurniawan received his BS degree in electrical engineering and informatics from the School of Electrical Engineering and Information, Bandung Institute of Technology, Indonesia, in 2008 and his MS degree in electrical engineering from Korea University, Seoul, Rep. of Korea, in 2013. From 2008 to 2011, he was with PT, Xirka, Indonesia, as a research engineer in digital circuit design for OFDMA baseband. Since 2013 until now, he has been with Raon-Tech, Seongnam, Rep. of Korea, as a research engineer in digital circuit designs for digital front-ends, FM basebands, and haptic drivers. His research interests include low-power MIMO detectors.
improma@korea.ac.kr
Ji-Hwan Yoon received his BS degree in electrical engineering from Korea University, Seoul, Rep. of Korea, in 2009. He is currently pursuing his MS and PhD degrees with the Department of Electrical and Computer Engineering, Korea University. His interests include MIMO detector design, QR decomposition module design, and ultra-low-power system design.
jongkook@korea.ac.kr
Jong-Kook Kim received his BS degree in electronic engineering from Korea University, Seoul, Rep. of Korea, in 1998 and his MS and PhD degrees in electrical and computer engineering from the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA, in 2000 and 2004, respectively. He is currently an associate professor at the School of Electrical Engineering, Korea University, where he joined in 2007. He was with Samsung SDS’s IT R&D Center, Seongnam, Rep. of Korea, from 2005 to 2007. His research interests include heterogeneous distributed computing, energy-aware computing, resource management, evolutionary heuristics, distributed mobile computing, neural networks, and distributed robot systems. He is a senior member of the IEEE and ACM.
Corresponding Author jongsun@korea.ac.kr
Jongsun Park received his BS degree in electronics engineering from Korea University, Seoul, Rep. of Korea, in 1998 and his MS and PhD degrees in electrical and computer engineering from Purdue University, West Lafayette, IN, USA, in 2000 and 2005, respectively. He joined the Electrical Engineering Faculty of Korea University, in 2008. From 2005 to 2008, he was with the Signal Processing Technology Group, Marvell Semiconductor Inc., Santa Clara, CA, USA. He was also with the Digital Radio Processor System Design Group, Texas Instruments, Dallas, USA, in the summer of 2002. His research interests focus on variation-tolerant, low-power, and high-performance VLSI architectures, and circuit designs for digital signal processing and digital communications.

I. Introduction

Due to higher spectral efficiency with improved link reliability
[1]
, spatial multiplexing multiple-input and multiple-output (MIMO) technology has been widely adopted in wireless standards such as IEEE 802.11n, IEEE 802.16e, IEEE 802.16m, and Long-Term Evolution.
In the MIMO detector, maximum likelihood (ML) decoding
[2]
has been considered as one of the best solutions for spatial multiplexing, and it is depth-first approaches such as sphere decoding that show near-optimal performances
[3]
. However, since a sphere decoding approach involves iteratively checking all possible selections of a detected vector, the complexity of the worst-case scenario exponentially increases with an increasing MIMO channel-matrix size, which results in huge hardware overhead with irregular throughput
[4]
. For low-complexity ML detection, breadth-first approaches, such as
II. QRM-MLD MIMO Detection

- 1. MIMO System Model

Let us consider a spatial multiplexing MIMO system with
(1) y=Hx+n,

where
(2) s=arg min x∈ Ω N t ‖ y−Hx ‖ 2 ,

where Ω is the set of all complex elements in the constellation and Ω
- 2. Conventional QRM-MLD Approach

To transform the optimization problem of (2) into a tree-search problem, a QR decomposition (QRD) of the channel response matrix
(3) H=QR,

where
(4) z= Q H y=Rx+w,

where
(5) ‖ z−Rx ‖ 2 = ∑ i=1 N t | z N t +1−i − ∑ j=1 i r N t +1−i, N t +1−j ⋅ x N t +1−j | 2 ,

where
(6) PED i ( x (i) )= PED i –1 ( x (i–1) ) + | e i ( x (i) ) | 2 ,

(7) e i ( x (i) )= z N t +1−i − ∑ j= N t +1−i N t r N t +1−i ⋅ x j .

Here, PED
x (i) =[ x N t −i+1 , x N t −i+2 , … , x N t ]

denotes a partial vector symbol in the
PPT Slide

Lager Image

III. Proposed QRM-MLD MIMO Detection Architecture Based on Branch Metric Computation

In this section, an approximate survivor metric (ASM) generation approach is presented, which can be effectively used to reduce the computational complexity of the QRM-MLD MIMO detector. The hardware implementation results of the proposed ASM-based MIMO detector are also presented to show the power consumption reduction and silicon area.
- 1. ASM Generation

The basic idea of the proposed ASM generation is that the absolute values of the real and imaginary parts can be separately computed and sorted to find survival branches. In the following, (8) and (9) denote the magnitudes of the real and imaginary parts of the approximate branch metric based on the simplified norm algorithm
[5]
, respectively:
(8) Re{ e i ( x (i) ) }=| Re{ z N t +1−i }−Re{ ∑ j= N t +1−i N t r N t +1−i ⋅ x j } |,

(9) Im{ e i ( x (i) ) }=| Im{ z N t +1−i }−Im{ ∑ j= N t +1−i N t r N t +1−i ⋅ x j } |,

(10) | e i ( x (i) ) |= Re{ e i ( x (i) ) } + Im{ e i ( x (i) ) },

(11) PED i ( x (i) )= PED i –1 ( x (i–1) ) + | e i ( x (i) ) |,

where PED
M

real and
M

imaginary parts in (8) and (9) are calculated. They are then sorted in ascending order. After this, the possible candidates for the
M

minimum real and
M

minimum imaginary parts of results in (8) and (9). In the proposed approach, only
M

minimum real and
M

minimum imaginary parts in (8) and (9) are selected from the sorted
M

real and
M

imaginary parts in (8) and (9). Then,
M

minimum real and
M

minimum imaginary parts in (8) and (9). In the first stage of the proposed ASM generation process, by using (12) and (13), only the
M

smallest real and imaginary parts are searched from
M

real and
M

imaginary parts in (8) and (9), respectively.
(12) Re { e 1 ( x (1) ) } 〈 l 〉 = lthmin x N t ∈C | Re{ z N t − r N t , N t ⋅ x N t } | ,

(13) Im { e 1 ( x (1) ) } 〈 l 〉 = lth min x N t ∈C | Im{ z N t − r N t , N t ⋅ x N t } |,

(14) | SM (1) <α,β> |= Re { e 1 ( x (1) ) } <α> + Im { e 1 ( x (1) ) } <β> ,

(15) Re{ SM (i) }= min x i ∈C | Re{ e i ( x (i) ) } | ,

(16) Im{ SM (i) }= min x i ∈C | Im{ e i ( x (i) ) } |,

(17) ASM i 〈 α,β 〉( x (i) ) =| SM ( 1 ) 〈 α,β 〉 | + ∑ i=2 N t | Re{ SM ( i ) } + Im{ SM (i) } |,

where
M

,
lth min x N t ∈C | Re { z N t − r N t , N t ⋅ x N t } |

and
lth min x N t ∈C | Im { z N t − r N t , N t ⋅ x N t } |

denote the
M

real and
M

imaginary parts in the first stage, respectively;
− n(Ω) +1,

− n(Ω) +3,

… , −1, … ,
− n(Ω)

−1}, where
− n(Ω)

is the square root of an arbitrary constellation of order
n(Ω)

= 4) cases, two survival branches
( M =2)

with the smallest partial (real) SMs (Re{
M

real and
M

imaginary parts. Please note that
n(Ω)

children, which is presented in (15) and (16) for real and imaginary parts of ABMs, respectively. After the second stage, four (
PPT Slide

Lager Image

PPT Slide

Lager Image

PPT Slide

Lager Image

- 2. QRM-MLD Architecture Based on ASM Generation

Figure 5
illustrates the proposed QRM-MLD MIMO detector architecture (
PPT Slide

Lager Image

PPT Slide

Lager Image

PPT Slide

Lager Image

Specifications of active modules at each stage in proposed MIMO detector.

Stage | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|

MA | MUL_0 | Active | |||

MUL_1 | - | Active | |||

MUL_2 | - | Active | |||

MUL_3 | - | Active | |||

MUL_4 | - | Active | |||

MUL_5 | - | Active | |||

MUL_6 | - | Active | |||

PED II array (index) | 0 | 0, 1, 2, 3, 4 | |||

FMIN array | SORF (mode) | Sorter | FMIN | ||

SFMIN (index) | - | 0, 1, 2, 3, 4, 5, 6, 7 | |||

Adder array | All SMA | All SMA and BMA | |||

Hard decision | - | Active |

PPT Slide

Lager Image

PPT Slide

Lager Image

Hardware comparison of 4 × 4 MIMO detectors.

Architecture | This work | |||||
---|---|---|---|---|---|---|

Process (nm) | 65 | 130 | 130 | 65 | 130 | 65 |

Algorithm | Sphere decoding | Modified | QRM-MLD | |||

Modulation | 64 | 64 | 64 | 64 | 64 | 64 |

64 | 10 | 64 | 12 | 10 | 25 | |

Area (gate count) | 1,760K | 114K | 280K | 320K | 340K | 103K |

Oper. freq./scaled to 65 nm (MHz) | 158 | 282/564 | 270/540 | 641 | 417/834 | 278 |

Power @Oper. freq. (mW) | 165 | 135 | 94 | 165 | 1,700 | 28.3 |

Power | 104.4^{†} | 47.9/23.9 | 34.8/17.4 | 25.74 | 407.7/203.9 | 10.177^{†} |

Max. throughput/scaled to 65 nm (Mbps) | 100 | 675/1,350 | 8.57/17.14 | 1282 | 1,000/2,000 | 288 |

IV. Low-Power Channel-Adaptive Reconfigurable QRM-MLD MIMO Detector

In this section, a low-power channel-adaptive QRM-MLD MIMO detector with reconfigurable
- 1. Hardware Architecture of Proposed Reconfigurable QRM-MLD-Based MIMO Detector

In the QRM-MLD MIMO detector architecture presented in the previous section, all of the modules in the PED II array, FMIN array, and Adder array are operating in parallel, as displayed in
Fig. 6
. When
M

= 5, the reconfigurable architecture of
Fig. 10(a)
is operating in the same way as the one shown in
Fig. 5
. When the
M

-value becomes smaller (4, 3, 2, 1), some parts of the detector are not needed, and those unnecessary parts with smaller
M

= 4), the computation process of stage 1 is the same with the case of
M

= 5. However, only four elements from the set {−a, −b, −c, −d, a, b, c, d} are selected as the survivor metrics
M

= 5, all the data paths are operating. When
M

= 3), only three paths are working, and the other two data paths are turned off using TOGL3 and TOGL4. Finally, when
M

= 1), only one path of hardware (TOGL0) is operating.
As shown in the proposed reconfigurable MIMO detector, once the
M

-value is decided, the corresponding number of PEDs are operating according to
Fig. 10(a)
. Here, the proposed reconfigurable QRM-MLD MIMO detector is operating by the same pattern with
Fig. 6
, which means that the architecture shows a fixed throughput regardless of the
M

-value. In the
M

number of PEDs,
M

number of real and imaginary ABMs are simultaneously working, and the results are combined to generate
M

= 5) are presented in
Table 3
. According to our simulation results with reconfigurable MIMO detector, the area overhead due to the proposed turning-off scheme and control logics is 9.82% (11,000 gate count), and power consumption overhead is 13.95% (4.5 mW) compared to the original design without turning-off modules.
PPT Slide

Lager Image

Comparison between fixed and reconfigurable architectures.

Architecture | Process (nm) | Area (gate count) | Power consumption (mW) @ Max. frequency (278 MHz) | |
---|---|---|---|---|

Reconfigurable ( | 65 | 114K | ( | 32.80 |

( | 27.25 | |||

( | 21.85 | |||

( | 16.99 | |||

( | 12.37 |

- 2. Numerical Result of Proposed Reconfigurable QRM-MLD-Based MIMO Detector

The BER performance of the proposed reconfigurable 4 × 4, 64-QAM QRM-MLD-based MIMO detector is estimated with the following simulation setup:
- QRD for the QRM-MLD-based MIMO detection is based on the Gram–Schmidt algorithm[12].
- The minimum accumulated branch metric is decided based on a hard decision algorithm.
- As a noise model, additive white Gaussian complex random noise is used.

M

. Since the proposed MIMO detector shows a wide range of BER performance with varying
M

-value, the proposed architecture can be efficiently adapted under varying channel conditions.
Figure 11(b)
also shows the power consumption and energy efficiency (pJ/bit) of the proposed reconfigurable detector with different
M

-value at the maximum operating frequency of 278 MHz. As shown in the figure, the power savings range from 17% to 62% when
M

changes.
PPT Slide

Lager Image

V. Experimental Results under Time-Varying Channel Conditions

In this section, the proposed reconfigurable QRM-MLD-based MIMO detector can be effectively used to save power while maintaining the required BER performance under two varying channel conditions. The following gives a detailed description.
- 1.M-Value Decision Process

Figure 12(a)
illustrates the
M

-value decision process. The
M

-value decision process under varying channel conditions is displayed in the dotted box. Initially, the SNR of the target channel is estimated. To adaptively update the
M

-value according to the estimated SNR, the
M

-value generator first locates the SNR range. Based on the located SNR range, the best
M

-value is decided using a
M

mapping table. The SNR to
M

mapping table is made off-line.
PPT Slide

Lager Image

- 2. SNR Estimation Based onM-Selection Process in Reconfigurable MIMO Detection

For the proposed reconfigurable MIMO detector to be efficiently used in the communication system with time-varying channel, a timing diagram including the
(18) T Frame duration ≥ T Preprocessing + T MIMO ,

(19) T Preprocessing ≥ max( T H ,R , T SNR + T M ) + ( N H ,R – 1 ) T H ,R .

In (19), since
- 3. Experimental Results

In this section, the dynamic reconfiguration of the proposed MIMO detector is demonstrated to trade off BER performance and power savings with an arbitrarily varying channel. The channel condition (
- ■ For a 10 MHz channel bandwidth, the number of occupied subcarriers is 1,024[11].
- ■ The frame duration time is set to 5 ms[11]and the simulation time is 3 s, which means the simulation covers 600 frames in total.
- ■ The frame start preamble used to obtain the average SNR estimate consists of a 32-symbol sequence generated by repeating a 16-symbol CAZAC sequence[11].
- ■ The acceptable lower bound of the constant BER value is decided to be 10−3[24]for the whole range of theEb/N0in every frame duration.
- ■ According to the average SNR estimate, theM-value is dynamically decided as the lowestM-value that can satisfy the BER constant (10−3).

M

-value can be adaptively changing according to the channel conditions, which results in significant power consumption reduction, as shown in
Fig. 13(d)
. Although the BER performance can be degraded due to the modified
M

-value, as illustrated in
Fig. 13(b)
, it always satisfies the BER constraint of 10
PPT Slide

Lager Image

Power savings compared to conventional fixedM-based architecture.

Modeled channel | Power saving (%) |
---|---|

AWGN channel | 35% |

Rayleigh fading channel with four numbers of fading and 6.4 Hz of Doppler frequency | 32% |

VI. Conclusion

In this paper, a low-power channel-adaptive reconfigurable QRM-MLD MIMO detector architecture is presented. The power optimization of the proposed MIMO detector is achieved by two novel approaches. First, an ASM generation is proposed by separating real and imaginary parts of the branch metric to reduce the computational complexity with a minor BER performance degradation. To further reduce the power consumption of the proposed ASM-based MIMO detector under time-varying channel conditions, second, a reconfigurable QRM-MLD approach with variable
BIO

Huang C.
,
Yu C.
,
Ma H.
2009
“A Power-Efficient Configurable Low-Complexity MIMO Detector,”
IEEE Trans. Circuits Syst. I: Reg. Papers
56
(2)
485 -
496
** DOI : 10.1109/TCSI.2008.2001368**

Agrell E.
2002
“Closest Point Search in Lattices,”
IEEE Trans. Inf. Theory
48
(8)
2201 -
2214
** DOI : 10.1109/TIT.2002.800499**

Mondal S.
2010
“Design and Implementation of a Sort-Free K-Best Sphere Decoder,”
IEEE Trans. Very Large Scale Integr. Syst.
18
(10)
1497 -
1501
** DOI : 10.1109/TVLSI.2009.2025168**

Dai Y.
,
Sun S.
,
Lei Z.
2005
“A Comparative Study of QRD-M Detection and Sphere Decoding for MIMO-OFDM Systems,”
IEEE Int. Symp. PIMRC
Berlin, Germany
186 -
190

Wenk M.
“K-Best MIMO Detection VLSI Architectures Achieving up to 424 Mbps,”
IEEE Int. Symp. Circuits Syst., Island of Kos
Greece
May 21–24, 2006
1151 -
1154

Kim K.
2005
“A QRD-M/Kalman Filter-Based Detection and Channel Estimation Algorithm for MIMO-OFDM Systems,”
IEEE Trans. Wireless Commun.
4
(2)
710 -
721
** DOI : 10.1109/TWC.2004.842951**

Shabany M.
,
Gulak P.
2012
“A 675 Mbps, 4×4 64-QAM K-Best MIMO Detector in 0.13 μm CMOS,”
IEEE Trans. Very Large Scale Integr. Syst.
20
(1)
135 -
147
** DOI : 10.1109/TVLSI.2010.2090367**

Chen S.
,
Zhang T.
,
Xin Y.
2007
“Relaxed K-Best MIMO Signal Detector Design and VLSI Implementation,”
IEEE Trans. Very Large Scale Integr. Syst.
15
(3)
328 -
337
** DOI : 10.1109/TVLSI.2007.893621**

Khairy M.S.
2014
“Algorithms and Architectures of Energy-Efficient Error-Resilient MIMO Detectors for Memory-Dominated Wireless Communication Systems,”
IEEE Trans. Circuits Syst. I: Reg. Papers
61
(7)
2159 -
2171
** DOI : 10.1109/TCSI.2014.2298273**

Huang M.Y.
,
Tsai P.Y.
2014
“Toward Multi-Gigabit Wireless: Design of High-Throughput MIMO Detectors with Hardware-Efficient Architecture,”
IEEE Trans. Circuits Syst. I: Reg. Papers
61
(2)
613 -
624
** DOI : 10.1109/TCSI.2013.2284189**

2012
IEEE Standard for Air Interface for Broadband Wireless Access Systems, IEEE Standard 97266
New York, NY, USA

Singh C.K.
,
Sushma H.P.
,
Balsara P.T.
“VLSI Architecture for Matrix Inversion Using Modified Gram-Schmidt Based QR Decomposition,”
IEEE Int. Conf. VLSI Des.
Bangalore, India
Jan. 6–10, 2007
836 -
841

Park J.
,
Choi J.
,
Roy K.
2010
“Dynamic Bit-Width Adaptation in DCT: An Approach to Trade off Image Quality and Computation Energy,”
IEEE Trans. VLSI Syst.
18
787 -
793
** DOI : 10.1109/TVLSI.2009.2016839**

Synopsys PrimeTime User’s Manual
http://www.synopsys.com

Mahdavi M.
,
Shabany M.
2013
“Novel MIMO Detection Algorithm for High-Order Constellations in the Complex Domain,”
IEEE Trans. VLSI Syst.
21
834 -
847
** DOI : 10.1109/TVLSI.2012.2196296**

Borkar S.
1999
“Design Challenges of Technology Scaling,”
IEEE Micro
19
(4)
23 -
29

Proakis J.G.
2000
“Digital Communication,”
4th ed.
McGraw-Hill
New York, USA

Zivkovic M.
,
Mathar R.
“An Improved Preamble-Based SNR Estimation Algorithm for OFDM Systems,”
IEEE Int. Symp. PIMRC
Istanbul, Turkey
Sept. 26–29, 2010
172 -
176

Burg A.
2006
“Algorithm and VLSI Architecture for Linear MMSE Detection in MIMO-OFDM Systems,”
IEEE Int. Symp. Circuits Syst.
Island of Kos, Greece
21 -
24

Darji A.D.
,
Patil M.S.
“VLSI Implementation of Balanced Binary Tree Decomposition Based 2048-Point FFT/IFFT Processor for Mobile WI-Max,”
Int. Conf. Emerg. Trends Eng. Technol.
Goa, India
Nov. 19–21, 2010
745 -
748

Cheng S.
,
Evans J.B.
1997
“Implementation of Signal Power Estimation Methods,”
IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process.
44
(3)
240 -
250
** DOI : 10.1109/82.558458**

2011
WiMAX Forum Proprietary, Requirements for WiMAX Coexistence with LTE Network, WMF-T31-132-v01
Clackamas

Zaidi Z.R.
,
Mark B.L.
2005
“Real-Time Mobility Tracking Algorithms for Cellular Networks Based on Kalman Filtering,”
IEEE Trans. Mobile Comput.
4
(2)
195 -
208
** DOI : 10.1109/TMC.2005.29**

Kermoal J.P.
2002
“A Stochastic MIMO Radio Channel Model with Experimental Validation,”
IEEE J. Sel. Areas Commun.
20
(6)
1211 -
1226
** DOI : 10.1109/JSAC.2002.801223**

2002
MIMO Rapporteur: 3GPP TSG R1-02-0141

Citing 'Low-Power Channel-Adaptive Reconfigurable 4×4 QRM-MLD MIMO Detector
'

@article{ HJTODO_2016_v38n1_100}
,title={Low-Power Channel-Adaptive Reconfigurable 4×4 QRM-MLD MIMO Detector}
,volume={1}
, url={http://dx.doi.org/10.4218/etrij.16.0115.0176}, DOI={10.4218/etrij.16.0115.0176}
, number= {1}
, journal={ETRI Journal}
, publisher={Electronics and Telecommunications Research Institute}
, author={Kurniawan, Iput Heri
and
Yoon, Ji-Hwan
and
Kim, Jong-Kook
and
Park, Jongsun}
, year={2016}
, month={Feb}