Adaptive Techniques for Joint Optimization of XTC and DFE Loop Gain in High-Speed I/O

ETRI Journal. 2015 Oct; 37(5): 906–916.

- Received : March 11, 2014
- Accepted : May 20, 2015
- Published : October 01, 2015


High-speed I/O channels require adaptive techniques to optimize the filter tap weights of decision feedback equalization (DFE) receivers so that they compensate for channel inter-symbol interference (ISI) and for crosstalk from multiple adjacent channels. Both ISI and crosstalk tend to vary with channel length, process, and temperature. Optimizing these parameters individually leads to suboptimal solutions. We propose a joint optimization technique for crosstalk cancellation (XTC) and DFE that compensates for both ISI and crosstalk in high-speed I/O channels. The technique adaptively compensates for 15.7 dB to 19.7 dB of channel loss combined with crosstalk strengths from 60 mVp-p to 180 mVp-p, with a constant transmitted non-return-to-zero signal amplitude of 500 mVp-p.
Crosstalk strength is a strong function of channel spacing. Figure 1 shows simulated eye degradation for various channel spacings. When the spacing is extremely small, the eye is completely closed, as shown in the third row of Fig. 1. The maximum crosstalk amplitude occurs at the timing of a data transition in an adjacent channel, whereas the maximum data amplitude occurs at the cursor timing. The maximum crosstalk amplitude and the maximum data amplitude can therefore be detected independently, and the adaptive XTC and DFE loops can operate separately.
Fig. 1. Simulated eye degradation for various crosstalk strengths from channel spacing deviation.
Figure 2 shows an eye diagram of a non-return-to-zero (NRZ) signal with crosstalk and a zoomed-in view of the transition timing (upper-right inset). Two types of crosstalk couple onto a data transition: positive and negative. Suppose a data slicer triggered by a recovered clock is placed at the middle of a transition. The resulting digital value then indicates a positive or negative crosstalk impact, depending on the type of data transition in the adjacent channel. The logical relations that produce a positive or negative crosstalk impact are summarized in the table at the bottom right of Fig. 2 (Cases I–IV). Regardless of the transition type of the forward signal, the polarity of the crosstalk impact depends only on the type of data transition in the adjacent channel.
Fig. 2. Positive and negative crosstalk impact on the data transition of the forward signal.
Fig. 3. Proposed adaptive XTC algorithm (equalizer blocks are simplified).
Because an ISI equalizer cannot remove crosstalk, the residual crosstalk causes timing jitter. A DFE can mitigate ISI without increasing crosstalk noise, but the crosstalk-induced jitter remains and reduces the horizontal eye margin.
Assume that the recovered 0° clock provides a rising edge at the data-eye center for the differential data slicer. The differential edge slicer is then triggered by a 180° clock and samples the signal at the data edge, producing the digital value $x_1[t_{0.5}]$. If the differential signal at the rising edge of the 180° clock is larger than 0 (the crossing point), which implies a positive crosstalk impact, the outputs of the “edge slicer” and “CML-to-CMOS” blocks are “1”, and vice versa. The “digital delay” block that follows holds $x_1[t_{0.5}]$ for half a UI. As a result, $x_1[t_0]$, $x_1[t_{0.5}]$, and $x_1[t_1]$ reach the “combinational logic” block concurrently, which judges whether the XTC is over- or under-compensating.
Using $x_1[t_0]$ and $x_1[t_1]$ from channel 1 and $x_2[t_0]$ and $x_2[t_1]$ from channel 2, we can predict whether the data transitions will contain positive or negative crosstalk, as explained with Fig. 2. If the detected digital signal, $x_1[t_{0.5}]$ or $x_2[t_{0.5}]$, is identical to the predicted digital signal, the XTC is under-compensating. In that case, the “combinational logic” block generates an “UP” to increase the XTC signal gain, and vice versa. Table 1 is derived from this logic and from the table in Fig. 2. Through the adaptive XTC loop, the edge sample is forced to have zero mean.

The bottom half of Fig. 4 describes the low-frequency feedback loop for the adaptive XTC. The “detection” and “combinational logic” blocks generate “UP” or “DN” pulses, which a charge pump integrates to update the control voltage for the XTC gain. The loop gain in the frequency domain can be expressed as
(1) $$\text{Loop\hspace{0.17em}gain}=0.25\frac{A{I}_{S}}{sC},$$
where $I_S$ is the charge-pump current, $C$ is the integration capacitance (see Fig. 4), and $A$ is the AGC gain (set to unity for simplicity). The factor 0.25 reflects that, for random data, transitions in channel 1 and channel 2 occur at the same time with a probability of 1/4. In reality, this factor varies with the data patterns in channel 1 and channel 2. Because the whole feedback-loop transfer function has a single pole at zero frequency, the loop is stable with a phase margin of no less than 90°.
Fig. 4. Simulated adaptation results of control voltage for XTC gain.
The update voltage step ΔV of $V_{CONT}$ in each symbol duration is $A I_S T_b / C$. We simulated the adaptive XTC for the signal in the third row of Fig. 1. In our 12 Gb/s application, $T_b$ is 83.3 ps. The other parameters are $A$ = 1, $I_S$ = 50 μA, and $C$ = 1 pF, which give ΔV ≈ 4.17 mV. The difference between the initial control voltage and the optimal converged value is 757 mV. With these settings, the theoretical convergence time of the loop is
(2) $$\text{Convergence\hspace{0.17em}time}=\frac{757\text{\hspace{0.17em}mV}\cdot {T}_{b}}{0.25\cdot \text{\Delta}V}=\frac{4\cdot 757\text{\hspace{0.17em}mV}\cdot C}{A{I}_{S}}\approx 60.6\hspace{0.17em}\text{ns}.$$
Compared with our simulated value of 62 ns, this is an error of about 2.4%, caused by the randomness of the data pattern. Increasing the charge-pump current or reducing the capacitance increases the update step and shortens the convergence time. However, a larger step also produces a larger steady-state ripple on the control voltage, which translates directly into more residual crosstalk noise after XTC.
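As a rough cross-check of (2), the Python sketch below computes the theoretical convergence time from the parameters quoted above. It also runs a simple Monte Carlo model of the bang-bang charge-pump loop. The model rests on three assumptions, none of which is the authors' simulator:

- $V_{CONT}$ starts 757 mV from its optimum.
- Every qualifying event during the approach is an UP pulse of size ΔV.
- Qualifying events (both channels transitioning) occur independently with probability 1/4 per symbol.

The function names are illustrative only. Note that $T_b$ cancels in (2), so only $C$, $A$, and $I_S$ set the theoretical time.

```python
import random

def step_voltage(a_gain, i_cp, t_b, cap):
    """Control-voltage step per update, dV = A * I_S * T_b / C, in volts."""
    return a_gain * i_cp * t_b / cap

def theoretical_convergence_time(swing, a_gain, i_cp, t_b, cap, p_update=0.25):
    """Eq. (2): swing * T_b / (p_update * dV), in seconds."""
    return swing * t_b / (p_update * step_voltage(a_gain, i_cp, t_b, cap))

def simulated_convergence_time(swing, a_gain, i_cp, t_b, cap, rng, p_update=0.25):
    """Hypothetical Monte Carlo model of the approach phase: in each symbol an
    UP pulse occurs with probability p_update until V_CONT has moved by `swing`."""
    dv = step_voltage(a_gain, i_cp, t_b, cap)
    v_moved, n_symbols = 0.0, 0
    while v_moved < swing:
        n_symbols += 1
        if rng.random() < p_update:  # both channels transition in this symbol
            v_moved += dv
    return n_symbols * t_b

# Values quoted in the text: 12 Gb/s, A = 1, I_S = 50 uA, C = 1 pF, 757 mV swing.
PARAMS = dict(swing=0.757, a_gain=1.0, i_cp=50e-6, t_b=83.3e-12, cap=1e-12)

t_theory = theoretical_convergence_time(**PARAMS)
rng = random.Random(2015)
runs = [simulated_convergence_time(rng=rng, **PARAMS) for _ in range(200)]
t_mean = sum(runs) / len(runs)
print(f"theory: {t_theory * 1e9:.1f} ns, Monte Carlo mean: {t_mean * 1e9:.1f} ns")
```

With these inputs the theoretical value is about 60.6 ns. Individual Monte Carlo runs scatter by a few nanoseconds around it, which illustrates how data-pattern randomness alone can account for a gap of the size seen between theory and simulation.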
In practice, each PCB channel product has an expected channel spacing and an initial XTC gain. Figure 5 shows the high-speed signal adder circuit. Fig. 6 shows the simulation results of our adaptive XTC for various crosstalk strengths, which are inversely proportional to channel spacing. The circuit at the top right of Fig. 5 implements the adder block with analog control. We control the ratio of the dc currents in the forward-path amplifier and the XTC-path amplifier. The transconductors of the two paths share a common load, so the addition gain can be varied through the $g_m$ ratio. The total dc current through the load is constant regardless of the current ratio (and thus the gain). The dc values at the output nodes therefore remain constant, so ac-coupling capacitors can be avoided in the next stage. This block can also be implemented with DAC-based current-source switching, as shown at the bottom of Fig. 5. The circuit shows a 3-bit control but can easily be extended to higher resolution.
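The Python sketch below illustrates why the load current, and hence the output dc level, stays constant under digital control. It models a hypothetical 3-bit binary-weighted current-steering DAC: each bit routes a 2^b-weighted unit cell either to the XTC-path pair or to the forward-path pair. The unit current (50 μA) and the function name are assumptions, not values from the paper. The printed "XTC share" is a current ratio. How it maps onto the $g_m$ ratio depends on the device model; for a square-law MOSFET, $g_m$ scales roughly with the square root of the bias current.

```python
def dac_current_split(code, n_bits=3, i_unit=50e-6):
    """Hypothetical binary-weighted current steering for the XTC adder.

    Bit b of `code` routes a (2**b * i_unit) cell to the XTC-path pair;
    otherwise that cell feeds the forward-path pair. Returns (i_fwd, i_xtc) in A.
    """
    if not 0 <= code < 2 ** n_bits:
        raise ValueError(f"code must be in [0, {2 ** n_bits - 1}]")
    i_xtc = sum((2 ** b) * i_unit for b in range(n_bits) if (code >> b) & 1)
    i_total = (2 ** n_bits - 1) * i_unit
    return i_total - i_xtc, i_xtc

for code in range(2 ** 3):
    i_fwd, i_xtc = dac_current_split(code)
    total = i_fwd + i_xtc  # current through the shared load: the same for every code
    print(f"code {code:03b}: I_fwd = {i_fwd * 1e6:5.1f} uA, I_xtc = {i_xtc * 1e6:5.1f} uA, "
          f"total = {total * 1e6:5.1f} uA, XTC share = {i_xtc / total:.3f}")
```

Steering cells instead of switching them on and off keeps the summed current fixed. That is the property the text relies on to avoid ac-coupling capacitors.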
Fig. 5. Ratio control in high-speed XTC adder: current-based $g_m$ control (top) and DAC-based binary $g_m$ control circuit (bottom).
The range of $V_{CONT}$ is from 0 V to 1 V, and the overall gain of the adder, $G$, is set to four in our simulation. When $V_{CONT}$ increases, a larger XTC signal is added while the forward signal (with its crosstalk) is reduced. To cope with various input signal levels, this adder (along with the gain control block) precedes the XTC and DFE blocks. As the simulation results in Fig. 6 show, $V_{CONT}$ converges to larger values for larger crosstalk and takes somewhat longer to do so. The longer settling times follow from the constant slope of settling. Each update changes $V_{CONT}$ by the same fixed step, so the settling time grows with the distance to the final value.
Throughout this section, we have not considered the MIMO signal (shown in [1]) in our analysis of the adaptive XTC loop. In fact, a MIMO signal does not affect the decision of the edge slicer, because a MIMO signal is a derivative of the crosstalk signal. The edge slicer decides on a positive or negative crosstalk impact at the time of maximum crosstalk amplitude during a data transition. At that time the theoretical MIMO signal value is zero, so the decision of the edge slicer is unaffected. For larger crosstalk strengths, the gains of the forward signals after XTC adaptation tend to become smaller, as shown in Fig. 6. The AGC block can bring these various forward-signal strengths to a constant signal amplitude. The integration of adaptive XTC and adaptive DFE is presented in Section V.
Fig. 6. Simulated XTC adder adaptation results for different crosstalk levels, impact on eye openings, and final $V_{CONT}$ values: (a) eye convergence at XTC adder output and (b) $V_{CONT}$ voltage adaptation.
In the DFE, a sampling clock provides an edge at the maximum pulse response amplitude (cursor timing). A single symbol-duration pulse at the transmitter produces a pulse response that contains ISI tails after passing through a dispersive channel, as shown in Fig. 7(a). The right-hand side of Fig. 7(a) shows our algorithm for the adaptive DFE in the discrete domain. In our simulation, we assume two post-cursor ISI taps ($h_1$, $h_2$). The goal of the adaptive DFE block is to generate an ISI-free signal with a constant amplitude ($B$) at the node before the slicer, where the signal is denoted $z[k]$. In Fig. 7, the slicer generates a digital output of “1” or “−1” depending on its input.
Fig. 7. (a) Discrete-time model of DFE adaptive loop and (b) convergence of AGC gain and DFE coefficients (LMS (left) and sign-sign LMS (right)).
The received signal $r[k]$ is the convolution of the transmitted sequence $x[k]$ with the channel pulse response ($h[0]$, $h[1]$, and $h[2]$). This signal is amplified by a gain control value $A[k]$ and equalized by the DFE loop. The equalized signal is $z[k] = A[k]r[k] - c_1[k]x[k-1] - c_2[k]x[k-2]$, where $x[k-1]$ and $x[k-2]$ are the slicer's digital outputs at times $k-1$ and $k-2$, respectively. All quantities other than the slicer decisions $x[\cdot]$ are discrete-time values that can take any real value. Here, $r[k]$ and $z[k]$ are differential signals. We use a least mean square (LMS) algorithm to adapt the DFE. As gain control and equalization proceed iteratively, the adaptive variables $A[k]$, $c_1[k]$, and $c_2[k]$ converge to their optimal values, and the mean squared error $E(e^2[k])$ reaches its minimum. The error signal is
(3) $$\begin{array}{cc}e[k]\hfill & =z[k]-B\text{x}[k]\hfill \\ \hfill & =A[k]r[k]-{c}_{1}[k]\text{x}[k-1]-{c}_{2}[k]\text{x}[k-2]-B\text{x}[k].\hfill \end{array}$$
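Setting (3) to zero for every data pattern gives the coefficients the adaptation should reach, assuming error-free decisions: A = B/h[0], c1 = A·h[1], and c2 = A·h[2]. The Python sketch below checks this exhaustively over all eight patterns of x[k], x[k−1], and x[k−2]. It uses the values quoted for the Fig. 7(b) simulation later in this section (h = 500, 200, 100 mV; B = 250 mV). The function names are illustrative.

```python
from itertools import product

def ideal_coefficients(h0, h1, h2, target_b):
    """Zero-error solution of eq. (3): A = B/h0, c1 = A*h1, c2 = A*h2."""
    a_gain = target_b / h0
    return a_gain, a_gain * h1, a_gain * h2

def error_signal(a_gain, c1, c2, h, target_b, x_k, x_k1, x_k2):
    """e[k] from eq. (3) for one data pattern (x[k], x[k-1], x[k-2])."""
    h0, h1, h2 = h
    r_k = h0 * x_k + h1 * x_k1 + h2 * x_k2      # r[k]: data convolved with pulse response
    z_k = a_gain * r_k - c1 * x_k1 - c2 * x_k2  # z[k]: AGC output minus DFE feedback
    return z_k - target_b * x_k

H = (0.5, 0.2, 0.1)  # h[0], h[1], h[2] in volts (values used for Fig. 7(b))
B = 0.25             # target amplitude in volts
A, C1, C2 = ideal_coefficients(*H, B)
worst = max(abs(error_signal(A, C1, C2, H, B, *bits))
            for bits in product((-1, 1), repeat=3))
print(f"A = {A:.3f}, c1 = {C1:.3f} V, c2 = {C2:.3f} V, worst |e| = {worst:.1e} V")
```

The worst-case error is zero up to floating-point rounding. This matches the later statement that A converges to 0.5 and the taps to half of the ISI values.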
The LMS algorithm is based on gradient descent. At every symbol period, each adaptive variable ($A[k]$, $c_1[k]$, or $c_2[k]$) is updated along the negative partial derivative of the squared error $e^2[k]$ with respect to that variable. This instantaneous gradient estimates the gradient of $E(e^2[k])$ in the space spanned by $A[k]$, $c_1[k]$, and $c_2[k]$ (that is, $E(e^2[k]) = f(A[k], c_1[k], c_2[k])$):
(4) $$A[k+1]=A[k]-\mu \frac{\partial {e}^{2}[k]}{\partial A[k]}=A[k]-2\mu \cdot r[k]e[k],$$
(5) $${c}_{1}[k+1]={c}_{1}[k]-\mu \frac{\partial {e}^{2}[k]}{\partial {c}_{1}[k]}={c}_{1}[k]+2\mu \cdot \text{x}[k-1]e[k],$$
(6) $${c}_{2}[k+1]={c}_{2}[k]-\mu \frac{\partial {e}^{2}[k]}{\partial {c}_{2}[k]}={c}_{2}[k]+2\mu \cdot \text{x}[k-2]e[k]$$
where $\mu$ is the update step size (adaptation speed). Larger values of $\mu$ reduce settling time but degrade the stability of the DFE loop. A typical value of $\mu$ is 0.05. After sufficient iterations, the amplitude of $e[k]$ approaches zero and all adaptive variables converge. Simulations confirm that this LMS algorithm works, as shown in Fig. 7(b). However, the multiplications in (4)–(6) are power-hungry in hardware. The most important information in each LMS iteration is the sign of the gradient of each adaptive variable and the sign of the error signal. As long as each adaptive variable moves in the correct direction, the variables eventually reach their optimal values, where $e^2[k]$ approaches its minimum. This modified version of LMS is called sign-sign LMS and is easy to implement in hardware:
(7) $$A[k+1]=A[k]-2\mu \cdot \text{sign}(\text{x}[k])\text{sign}(e[k]),\text{\hspace{0.17em}\hspace{0.17em}}$$
(8) $${c}_{1}[k+1]={c}_{1}[k]+2\mu \cdot \text{sign}(\text{x}[k-1])\text{sign}(e[k]),$$
(9) $${c}_{2}[k+1]={c}_{2}[k]+2\mu \cdot \text{sign}(\text{x}[k-2])\text{sign}(e[k]).$$
The term $r[k]$ in (4) is replaced by $x[k]$ in (7) because the latency from $r[k]$ to $e[k]$ becomes problematic in an actual implementation. Once $A[k]$, $c_1[k]$, and $c_2[k]$ converge to finite values after sufficient iterations, the amplitude of $e[k]$ approaches zero. The products $\text{sign}(x[k])\text{sign}(e[k])$, $\text{sign}(x[k-1])\text{sign}(e[k])$, and $\text{sign}(x[k-2])\text{sign}(e[k])$ then toggle between “1” and “−1” with equal numbers of occurrences. Because $\mu$ is small, the adaptive variables vary only slightly around their final values.
Figure 7(b) presents the convergence of the adaptive coefficients for both the LMS and the sign-sign LMS algorithms, simulated in Matlab. In these simulations, the cursor, first-tap ISI, and second-tap ISI ($h_0$, $h_1$, and $h_2$) are set to 500 mV, 200 mV, and 100 mV, respectively. The pulse response is convolved with a data sequence $x[k]$ of digital values 1 or −1. The target amplitude of the equalized signal $z[k]$ is 500 mVdp-p ($B$ = 250 mV in Fig. 7(a)). The AGC gain $A[k]$ converges to 0.5 to meet the target amplitude. Because of this reduced AGC gain, the DFE tap coefficients converge to half of the ISI tap values.
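The Python model below reproduces the behavior described for Fig. 7(b) qualitatively. It is a sketch, not the authors' Matlab code. It applies either the LMS updates (4)–(6) or the sign-sign updates (7)–(9) to the channel h0 = 500 mV, h1 = 200 mV, h2 = 100 mV with B = 250 mV. Its assumptions are:

- Slicer decisions are error-free, so x[k] equals the transmitted bit.
- A starts at 1 and both taps start at 0.
- LMS uses μ = 0.05, the typical value quoted in the text.
- Sign-sign LMS uses a much smaller μ = 0.0005, which gives a fixed 1 mV update, because its update magnitude does not shrink as the error vanishes.

```python
import random

def sign(v):
    """Return +1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)

def adapt(h, target_b, mu, n_iter, sign_sign=False, seed=0):
    """Adapt AGC gain A and DFE taps c1, c2.

    sign_sign=False: LMS updates of eqs. (4)-(6).
    sign_sign=True:  sign-sign LMS updates of eqs. (7)-(9).
    Assumes error-free decisions (x[k] equals the transmitted bit).
    """
    rng = random.Random(seed)
    h0, h1, h2 = h
    a_gain, c1, c2 = 1.0, 0.0, 0.0   # assumed initial values
    x_k1, x_k2 = 1, 1                # assumed initial decisions x[k-1], x[k-2]
    for _ in range(n_iter):
        x_k = rng.choice((-1, 1))
        r_k = h0 * x_k + h1 * x_k1 + h2 * x_k2     # channel output
        z_k = a_gain * r_k - c1 * x_k1 - c2 * x_k2  # AGC + DFE
        e_k = z_k - target_b * x_k                  # eq. (3)
        if sign_sign:
            s_e = sign(e_k)
            a_gain -= 2 * mu * sign(x_k) * s_e      # eq. (7): r[k] replaced by x[k]
            c1 += 2 * mu * sign(x_k1) * s_e         # eq. (8)
            c2 += 2 * mu * sign(x_k2) * s_e         # eq. (9)
        else:
            a_gain -= 2 * mu * r_k * e_k            # eq. (4)
            c1 += 2 * mu * x_k1 * e_k               # eq. (5)
            c2 += 2 * mu * x_k2 * e_k               # eq. (6)
        x_k1, x_k2 = x_k, x_k1                      # shift the decision history
    return a_gain, c1, c2

H = (0.5, 0.2, 0.1)  # h0, h1, h2 in volts (Fig. 7(b) values)
B = 0.25             # target amplitude in volts
lms = adapt(H, B, mu=0.05, n_iter=5000)
ss = adapt(H, B, mu=0.0005, n_iter=20000, sign_sign=True)  # assumed smaller step
print("LMS:       A=%.3f c1=%.3f c2=%.3f" % lms)
print("sign-sign: A=%.3f c1=%.3f c2=%.3f" % ss)
```

Both variants should settle near A = 0.5, c1 = 0.1 V, and c2 = 0.05 V. LMS converges smoothly. The sign-sign result dithers by a few update steps around these values, which is the hardware-friendly trade-off described above.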
In a hardware implementation, generating sign($e[k]$) by multiplying $B$ by $x[k]$ and comparing the product with $z[k]$ at the symbol rate is power-hungry because of the speed. Figure 8 shows a circuit implementation that circumvents this problem. The equalized signal $z_p[k]$ is the positive half of the differential signal $z[k]$, and a differential slicer compares it with a reference voltage ($B/2$ or $-B/2$). In the combinational logic, sign($e[k]$) in (7), (8), and (9), which is “−1” or “1”, is represented by an implementable digital value of “0” or “1”, denoted $e_i[k]$ in Fig. 8. The logic gates follow two rules:

- When $x_i[k]$ is “1”, $e_i[k]$ is “1” only if the digital values detected by the top two slicers, $p_i[k]$ and $n_i[k]$, are both “1”.
- When $x_i[k]$ is “0”, $e_i[k]$ is “1” only if the digital value $n_i[k]$ detected by the middle slicer is “1”.

The value $e_{ib}[k]$ is the inverse of $e_i[k]$. The bit $x_i[k]$ is the implementable digital form of the digital value $x[k]$, which is either “1” or “−1”.
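The two gate rules above can be checked numerically. The Python sketch models the reference slicers as p_i[k] = [z[k] > B] and n_i[k] = [z[k] > −B]. This is our assumption for how comparing the positive half z_p[k] against ±B/2 maps onto the differential signal z[k], and the function names are hypothetical. The sketch confirms that e_i[k] is 1 exactly when e[k] = z[k] − Bx[k] > 0.

```python
import random

def slicer_bits(z, b):
    """Hypothetical model of the two reference slicers acting on differential z[k]:
    p_i = [z > B] and n_i = [z > -B]."""
    return int(z > b), int(z > -b)

def error_sign_bit(x_i, p_i, n_i):
    """e_i[k] per the gate rules: if x_i = 1, e_i = p_i AND n_i; otherwise e_i = n_i."""
    return (p_i & n_i) if x_i else n_i

B = 0.25
rng = random.Random(8)
mismatches = 0
for _ in range(100_000):
    x = rng.choice((-1, 1))          # slicer decision x[k]
    z = rng.uniform(-0.6, 0.6)       # equalized differential sample z[k]
    p_i, n_i = slicer_bits(z, B)
    e_i = error_sign_bit(1 if x == 1 else 0, p_i, n_i)
    mismatches += (e_i != int(z - B * x > 0))  # reference: sign(e[k]) mapped to 1/0
print("mismatches:", mismatches)
```

The check should report zero mismatches. Only two comparators against fixed thresholds plus a multiplexer on x_i[k] are needed, rather than an analog multiply-and-compare at the symbol rate.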
Fig. 8. High-speed circuit implementation that compares $z[k]$ and $Bx[k]$ and generates the sign($e[k]$) term.
The digital error sign $e_i[k]$ and the detected digital data values $x_i[k-2]$, $x_i[k-1]$, and $x_i[k]$ are used to create “UP” or “DN” signals for charge pumps. These charge pumps integrate the terms $-\text{sign}(x[k])\text{sign}(e[k])$, $\text{sign}(x[k-1])\text{sign}(e[k])$, and $\text{sign}(x[k-2])\text{sign}(e[k])$ of (7), (8), and (9) with a speed of $\mu$. As in the adaptive XTC loop in Fig. 4, this speed is set by the integration current and capacitor of the charge pump. Table 2 shows the minimized combinational logic that drives the integrator for each adaptive coefficient.

The timing differences between $e_i[k]$ and $x_i[k]$, $x_i[k-1]$, or $x_i[k-2]$ can be compensated by adding flip-flops for alignment. Because these charge pumps are always in either an “UP” or a “DN” mode, they can be replaced by RC integrators to reduce power consumption. This is not possible in the adaptive XTC case (see Table 1), where the charge pumps must also support a “no update” state.
Fig. 9. Timing for forward and crosstalk signals; forward signal is dominant at integer UI intervals and crosstalk is dominant at half UI points.
Figure 10(a) shows the integrated architecture of adaptive XTC with ISI equalization. The data and edge (data transition) samplers are triggered by interleaved clocks. The XTC loop uses the digital outputs of both samplers. The DFE loop uses the digital output of the data sampler and, through an error detector block, the signal just before the sampler. The VGA gain ($A$) and the coefficients of the three taps ($c_1$, $c_2$, and $c_3$) are adjusted adaptively using the error signals. Three nodes are used for signal observation in the simulation, as marked at the top of Fig. 10. The received signal at point X is single-ended and includes crosstalk from an adjacent channel. A single-ended-to-differential converter (SDC) with unity gain converts it to a differential signal. The XTC adder block combines the forward signal path and the differentiation signal path with a ratio of $G(1-\alpha)$ to $\alpha$. The overall gain $G$ is set to four. After XTC, the signal at point Y is crosstalk-free but has a varying amplitude and ISI tails. The amplitude variation comes from four sources: the crosstalk strength, the adding ratio of the XTC adder ($\alpha$), the channel loss, and the MIMO signal strength. The AGC and DFE produce an NRZ signal with a constant signal level at point Z (see Fig. 10(a)) using the sign-sign LMS algorithm explained in Section IV.
Figure 10(b) presents the simulation results for a 12 Gb/s adaptive XTC and DFE system. The channel has an insertion loss of −15.7 dB, with three crosstalk strengths of 60 mVp-p, 120 mVp-p, and 180 mVp-p. The transmitted signal is an NRZ signal with 500 mVp-p amplitude. The first three columns show the eye diagrams at points X, Y, and Z, together with their overall signal amplitude and vertical eye opening at the sampling point. The fourth column shows the converged values of the AGC gain ($A$) and the XTC adding ratio ($\alpha$). The fifth column shows the DFE coefficients ($c_1$, $c_2$, $c_3$).
The graphs in Fig. 11 summarize the simulation results of Fig. 10(b), together with more extensive simulations over several channel losses (−15.7 dB, −17.7 dB, and −19.7 dB).
Figure 11(a) shows the converged XTC ratio ($\alpha$) versus crosstalk strength. For larger crosstalk, the XTC allots more gain to the XTC path ($\alpha$). For a larger insertion loss, $\alpha$ also increases: the forward-signal amplitude becomes smaller relative to the crosstalk strength, so a larger addition ratio ($\alpha$) is required.
Figure 11(b) shows the converged AGC gain ($A$) and first DFE tap coefficient ($c_1$) for various insertion losses. A larger insertion loss requires a higher AGC gain to meet the constant target amplitude, and larger DFE tap coefficients to cope with larger ISI tails. Interestingly, a larger MIMO signal helps decrease the required DFE tap coefficients, so the final value of $c_1$ is smaller for higher crosstalk.
Figure 11(c) shows the power saved in the AGC alone. Fig. 11(d) shows the total power saved in the DFE taps as a percentage of the total power consumed by the DFE taps. For higher crosstalk and insertion loss, the improvement from the MIMO signal is larger, so more power is saved. For a −19.7 dB insertion loss and 180 mVp-p crosstalk, the MIMO signal energy reaches up to 35% of the DFE energy consumed during equalization. When the MIMO signal is added with the proper polarity and timing, it saves DFE power, which corresponds to 4 mA of current consumption in our simulation.
Fig. 10. (a) Integrated adaptive XTC and DFE architecture for channels with both ISI and crosstalk and (b) eye diagrams at points X, Y, and Z at convergent state and adaptation of each coefficient ($A$, $\alpha$, $c_1$, $c_2$, and $c_3$) at insertion loss of −15.7 dB.
Fig. 11. Summary of adaptive XTC and DFE simulation results.
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2014R1A1A2056415), and was conducted under the Research Grant of Kwangwoon University in 2014. This work (Grant No. C0249540) was also supported by the Business for Cooperative R&D between Industry, Academy, and Research Institute, funded by the Korea Small and Medium Business Administration in 2014. It was partially supported by the Korean Semiconductor Intellectual Property EXchange (KIPEX) and the IC Design Education Center (IDEC).
Corresponding Author ohtaehyoun@kw.ac.kr
Taehyoun Oh received his BS and MS degrees in electrical engineering from Seoul National University, Rep. of Korea, in 2005 and 2007, respectively, and his PhD degree in electrical engineering from the University of Minnesota, Minneapolis, USA, in 2012. His doctoral research focused on high-speed I/O circuits and architectures. During the summer of 2010, he worked on I/O channel modeling at AMD Boston Design Center, MA, USA. In the fall of 2011, he worked on I/O architecture and jitter budgeting of serial links at Intel Corp., CA, USA. In the fall of 2012, he joined IBM System Technology Group, NY, USA, where he worked on the performance verification of high-speed decision feedback equalizers for server processors. Since the spring of 2013, he has been an assistant professor with the Department of Electronic Engineering, Kwangwoon University, Seoul, Rep. of Korea.
harjani@umn.edu
Ramesh Harjani is the Edgar F. Johnson Professor in the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, USA. He received his PhD degree from Carnegie Mellon University, PA, USA, in 1989; his MS degree from the Indian Institute of Technology, New Delhi, India, in 1984; and his BS degree from the Birla Institute of Technology and Science, Pilani, India, in 1982, all in electrical engineering. Prior to joining the University of Minnesota, he was with Mentor Graphics Corp., San Jose, CA, USA. He co-founded Bermai, Inc., a startup company developing CMOS chips for wireless multimedia applications, in 2001. He has been a visiting professor at both the Lucent Bell Labs, Allentown, PA, USA and the Army Research Labs, Adelphi, MD, USA. His research interests include analog/RF circuits for communications. He received the National Science Foundation Research Initiation Award in 1991 and Best Paper Awards at the 1987 IEEE/ACM Design Automation Conference, the 1989 International Conference on Computer-Aided Design, the 1998 GOMAC, and the 2007, 2010, and 2012 TECHCONs. His research group was the winner of the SRC Design Challenges in both 2000 and 2003. He is an author/editor of eight books. He was an associate editor for IEEE Transactions on Circuits and Systems Part II, 1995–1997; guest editor for both the International Journal of High-Speed Electronics and Systems and the Analog Integrated Circuits and Signal Processing, in 2004; and a guest editor for the IEEE Journal of Solid-State Circuits, 2009–2011. He was a senior editor for the IEEE Journal on Emerging & Selected Topics in Circuits & Systems (JETCAS), 2011–2013 and the technical program chair for the IEEE Custom Integrated Circuits Conference 2012–2013. He was the chair of the IEEE Circuits and Systems Society technical committee on analog signal processing from 1999 to 2000 and a distinguished lecturer of the IEEE Circuits and Systems Society between 2001 and 2002.

Keywords: Crosstalk cancellation; inter-symbol interference; bandwidth; pre-emphasis; equalization; parallel interfaces; single-ended I/Os; high-speed link; pulse response; jitter; vertical eye-opening; joint optimization; ISI; XTC

I. Introduction

High-speed performance in multiple lanes depends on the adaptive calibration algorithm used for crosstalk cancellation (XTC). In this paper, we describe an adaptive calibration algorithm that is currently being developed for high-speed XTC signaling techniques for single-ended I/Os.
The crosstalk between channels in multiple PCB lanes depends strongly on the channel spacing of the lanes. Both inter-symbol interference (ISI) and crosstalk vary across lanes due to manufacturing process variations and therefore require adaptation. To cope with these variations, we propose an adaptive XTC algorithm that operates in conjunction with the DFE block for channel-ISI mitigation.
A low-power XTC architecture can achieve high signal integrity in severe crosstalk environments [1]–[5]. We develop a joint adaptive solution for this power-efficient scheme that is itself low power. The adaptive XTC algorithm is verified through full-system simulations.
Adaptive algorithms have been actively investigated to find the optimal compensation for channel loss in a single data transmission channel [5]–[11]. However, because low-power XTC architectures have seen little development, adaptive XTC has remained largely unexplored. Furthermore, integrating adaptive XTC into existing decision feedback equalization (DFE) remains an unsolved problem. We propose a new adaptive XTC algorithm that finds the globally optimal eye opening, and thus the best bit error rate performance, for the combination of XTC and DFE.
The key to integrating XTC into existing ISI equalization techniques is that the maximum crosstalk amplitude occurs at a different time from the maximum forward signal (cursor timing), at which data is sampled. The received signal half a unit interval (UI) away from the cursor timing is used to adapt the XTC coefficients. Because the maximum crosstalk generally occurs halfway between the integer-UI sampling points of the forward signal, the XTC and adaptive DFE loops can operate independently.
The rest of the paper is organized as follows. Section II analyzes crosstalk behavior as the basis for our adaptive XTC algorithm. In Section III, we extend our basic XTC scheme with an adaptive algorithm, which is then validated via simulations. A MIMO signal exists alongside an XTC signal when one received signal is differentiated (see [1]), but we omit it from the analysis of our adaptive XTC algorithm for simplicity. Section IV presents the standard sign-sign LMS adaptive loop for the AGC and DFE, which can be adapted to include our XTC algorithm. Section V presents the integration of the adaptive XTC and DFE algorithms and validates the convergence of the XTC strength and DFE tap coefficients. Finally, Section VI summarizes the paper.
II. Understanding Crosstalk Behavior


III. Adaptive XTC

Figure 3
presents the proposed XTC algorithm, which uses the logical relations between types of data transitions and polarity of resultant crosstalk. A data slicer that is triggered by a recovered clock makes decisions on the equalized signal. In parallel, an edge slicer samples the data signal at the time of transition and detects whether any crosstalk is likely to produce a positive or negative impact. The detected digital signals are used to feed an adaptive XTC loop. CML-to-CMOS circuits convert the differential signals at the slicer outputs into digital ones. A combinational logic block generates the “UP” or “DN” pulses depending on the sign of the crosstalk. An integrator updates the XTC gain by integrating the “UP” or “DN” pulses. The “digital delay” and “combinational logic” blocks are similar to the phase detectors in the “clock recovery” block and can be shared to save power.

Table 1. Combinational logic required for updating XTC gain for channel 1.

| $x_1[t_0] \oplus x_1[t_1]$ | $x_2[t_0]$ | $x_2[t_1]$ | $x_1[t_{0.5}]$ | Diagnosis | UP | DN |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | Under compensation | 1 | 0 |
| 1 | 0 | 1 | 1 | Over compensation | 0 | 1 |
| 1 | 1 | 0 | 0 | Over compensation | 0 | 1 |
| 1 | 1 | 0 | 1 | Under compensation | 1 | 0 |
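Table 1 can be expressed as a small decision function, sketched below in Python. The function name is hypothetical and the inputs are logic bits. The "predicted" edge bit (0 for a rising adjacent-channel edge, 1 for a falling one) is read off the rows of Table 1 rather than stated separately in the text. Combinations without a transition in both channels return None, which corresponds to the "no update" state the XTC charge pump must support.

```python
def xtc_update(x1_t0, x1_t1, x2_t0, x2_t1, x1_t05):
    """Return 'UP', 'DN', or None (no update) for the channel-1 XTC gain.

    Inputs are logic bits (0/1). An update is made only when both channels
    transition between t0 and t1, as in Table 1.
    """
    if not ((x1_t0 ^ x1_t1) and (x2_t0 ^ x2_t1)):
        return None  # charge pump held in the "no update" state
    # Edge bit expected from residual (under-compensated) crosstalk, inferred
    # from Table 1: 0 for a rising aggressor edge (0 -> 1), 1 for a falling one.
    predicted = x2_t0
    # Detected equals predicted -> under-compensation -> UP; otherwise DN.
    return "UP" if x1_t05 == predicted else "DN"

# The four rows of Table 1 (channel 1 transitioning 0 -> 1):
rows = [((0, 1), 0, "UP"), ((0, 1), 1, "DN"), ((1, 0), 0, "DN"), ((1, 0), 1, "UP")]
for (x2_t0, x2_t1), x1_t05, expected in rows:
    assert xtc_update(0, 1, x2_t0, x2_t1, x1_t05) == expected
```

The same function serves channel 2 with the roles of the two channels swapped.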


IV. Automatic Gain Control and Adaptive DFE

A decision feedback equalizer is a discrete-time scheme.

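Table 2. Combinational logic table for LMS algorithm.

| Adaptive coefficient | Charge pump node | Logic expression |
|---|---|---|
| $A[k]$ | UP | $x_i[k] \oplus e_i[k]$ |
| | DN | $x_i[k] \oplus e_{ib}[k]$ |
| $c_1[k]$ | UP | $x_i[k-1] \oplus e_{ib}[k]$ |
| | DN | $x_i[k-1] \oplus e_i[k]$ |
| $c_2[k]$ | UP | $x_i[k-2] \oplus e_{ib}[k]$ |
| | DN | $x_i[k-2] \oplus e_i[k]$ |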
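The UP/DN logic can be checked exhaustively against the update directions of (7)–(9), as in the Python sketch below. It assumes, as in the XTC loop, that UP increases the corresponding coefficient, and that logic 1/0 maps to +1/−1. The function names are hypothetical. Because these pumps are never idle, DN is simply the complement of UP.

```python
from itertools import product

def sgn(bit):
    """Map a logic bit (1/0) to its signed value (+1/-1)."""
    return 1 if bit else -1

def charge_pump_controls(e_i, x_k, x_k1, x_k2):
    """Return {coefficient: (UP, DN)} for A, c1, c2 from logic bits."""
    up_a = x_k ^ e_i          # -sign(x[k])sign(e[k]) = +1 when the bits differ
    up_c1 = 1 - (x_k1 ^ e_i)  # +sign(x[k-1])sign(e[k]) = +1 when the bits match
    up_c2 = 1 - (x_k2 ^ e_i)  # +sign(x[k-2])sign(e[k]) = +1 when the bits match
    return {"A": (up_a, 1 - up_a), "c1": (up_c1, 1 - up_c1), "c2": (up_c2, 1 - up_c2)}

# Exhaustive check against the update directions of eqs. (7)-(9).
for e_i, x_k, x_k1, x_k2 in product((0, 1), repeat=4):
    ctl = charge_pump_controls(e_i, x_k, x_k1, x_k2)
    want = {
        "A": -sgn(x_k) * sgn(e_i),
        "c1": sgn(x_k1) * sgn(e_i),
        "c2": sgn(x_k2) * sgn(e_i),
    }
    for coef, direction in want.items():
        up, dn = ctl[coef]
        assert up + dn == 1 and (up == 1) == (direction > 0)
print("UP/DN logic consistent with eqs. (7)-(9)")
```

Each control reduces to a single XOR or XNOR of the error-sign bit with one delayed data bit, which is why the logic stays small at the symbol rate.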

V. Combining Adaptation of XTC and DFE Coefficients

In Sections III and IV, we proposed an adaptive XTC and described a fundamental sign-sign LMS algorithm to implement an adaptive DFE. In this section, we demonstrate independent adaptation of the two loops and validate it with behavioral simulations in Verilog-A.
Figure 9 shows the pulse responses of a forward signal and crosstalk (bottom left) and the resulting eye diagram (bottom right). Because the forward signal and crosstalk are independent of each other, the crosstalk acts as noise, with its largest impact at the data-transition timing of the forward signal (as explained with Fig. 2). As described in Sections II and III, the adaptive XTC loop samples midway between integer UIs, where crosstalk noise dominates. In contrast, most symbol-rate equalization schemes sample the adaptive equalization loop at an integer UI, where the vertical eye opening is largest (cursor timing). At this timing, channel ISI is the main contributor to voltage noise and crosstalk noise is negligible. Because the XTC and DFE adaptation loops use two distinct sampling times, they can operate independently, which makes their integration feasible.

VI. Conclusion

A new adaptive algorithm for XTC is proposed and shown in simulation to work in conjunction with adaptive DFE schemes. Transition-filtering detectors identify under- or over-compensation from the value sampled in the middle of the data transition. Through a low-speed control loop, the XTC gain adapts optimally to the strength of the crosstalk. The adaptive DFE stages handle the resulting range of signal amplitudes after XTC. LMS algorithms adapt the DFE and are integrated with our adaptive XTC algorithm. Because the two adaptive loops use different detection timings, the adaptive XTC and the DFE run independently, which makes their integration feasible. The benefit of reutilized crosstalk energy has been quantified through the final values of the DFE coefficients: the MIMO signal contributes to the cursor and DFE taps, allowing smaller adapted values.
References

[1] T. Oh and R. Harjani, “A 5 Gb/s 2 × 2 MIMO Crosstalk Cancellation Scheme for High-Speed I/Os,” IEEE Custom Integr. Circuits Conf., San Jose, CA, USA, Sept. 19–22, 2010, pp. 1–4.

[2] T. Oh and R. Harjani, “A 6-Gb/s MIMO Crosstalk Cancellation Scheme for High-Speed I/Os,” IEEE J. Solid-State Circuits, vol. 46, no. 8, 2011, pp. 1843–1856. DOI: 10.1109/JSSC.2011.2151410

[3] T. Oh and R. Harjani, “4×12 Gb/s 0.96 pJ/b/lane Analog-IIR Crosstalk Cancellation and Signal Reutilization Receiver for Single-Ended I/Os in 65 nm CMOS,” Symp. VLSI Circuits, Honolulu, HI, USA, June 13–15, 2012, pp. 140–141.

[4] T. Oh and R. Harjani, “A 12 Gb/s Multi-channel I/O Using MIMO Crosstalk Cancellation and Signal Reutilization in 65 nm CMOS,” IEEE J. Solid-State Circuits, vol. 48, no. 6, 2013, pp. 1383–1397. DOI: 10.1109/JSSC.2013.2252517

[5] S.-K. Lee, “A 5 Gb/s Single-Ended Parallel Receiver with Adaptive Crosstalk-Induced Jitter Cancellation,” IEEE J. Solid-State Circuits, vol. 48, no. 9, 2013, pp. 2118–2127. DOI: 10.1109/JSSC.2013.2264618

[6] Y. Hidaka, “A 4-Channel 10.3 Gb/s Transceiver with Adaptive Phase Equalizer for 4-to-41 dB Loss PCB Channel,” IEEE Int. Solid-State Circuits Conf., San Francisco, CA, USA, Feb. 20–24, 2011, pp. 346–348.

[7] S.C. Shahramian, “A Pattern-Guided Adaptive Equalizer in 65 nm CMOS,” IEEE Int. Solid-State Circuits Conf., San Francisco, CA, USA, Feb. 20–24, 2011, pp. 354–356.

[8] Y.-C. Huang and S.I. Liu, “A 6 Gb/s Receiver with 32.7 dB Adaptive DFE-IIR Equalization,” IEEE Int. Solid-State Circuits Conf., San Francisco, CA, USA, Feb. 20–24, 2011, pp. 356–358.

[9] W.-S. Kim, C.-K. Seong, and W.-Y. Choi, “A 5.4 Gb/s Adaptive Equalizer Using Asynchronous-Sampling Histograms,” IEEE Int. Solid-State Circuits Conf., San Francisco, CA, USA, Feb. 20–24, 2011, pp. 358–359.

[10] J.-S. Choi, D.-K. Jeong, and M.-S. Hwang, “A 0.18 μm CMOS 3.5-Gb/s Continuous-Time Adaptive Cable Equalizer Using Enhanced Low-Frequency Gain Control Method,” IEEE J. Solid-State Circuits, vol. 39, no. 3, 2004, pp. 419–425. DOI: 10.1109/JSSC.2003.822774

[11] B.-E. Jun, D.-J. Park, and Y.-W. Kim, “Convergence Analysis of Sign-Sign LMS Algorithm for Adaptive Filters with Correlated Gaussian Data,” Int. Conf. Acoust., Speech, Signal Process., Detroit, MI, USA, May 9–12, 1995, pp. 1380–1383.

Citing 'Adaptive Techniques for Joint Optimization of XTC and DFE Loop Gain in High-Speed I/O'

@article{HJTODO_2015_v37n5_906,
  title={Adaptive Techniques for Joint Optimization of XTC and DFE Loop Gain in High-Speed I/O},
  volume={37},
  number={5},
  url={http://dx.doi.org/10.4218/etrij.15.0114.0306},
  DOI={10.4218/etrij.15.0114.0306},
  journal={ETRI Journal},
  publisher={Electronics and Telecommunications Research Institute},
  author={Oh, Taehyoun and Harjani, Ramesh},
  year={2015},
  month={Oct}
}