With high bandwidth, low interference, and low power consumption, optical network-on-chip (ONoC) has emerged as a highly efficient interconnect for future generations of multicore systems-on-chip. In this paper, we propose a new path-setup method for ONoC that mitigates contention among packets by recycling the setup packet partway to the destination. A new, strictly non-blocking 6 × 6 optical router is designed to support the new method. The simulation results show that the new path-setup method increases the throughput by 52.03%, 41.94%, and 36.47% under uniform, hotspot-I, and hotspot-II traffic patterns, respectively. The end-to-end delay performance is also improved.
As more and more cores are integrated into a single chip, the global interconnect becomes a critical performance bottleneck. Traditional electronic interconnect suffers from low bandwidth, high power consumption, and long delay, which suggests that it is not suitable for future interconnection networks. In recent years, the development of optical devices, such as silicon waveguides, modulators, and resonators, has made optical interconnection a promising solution. Many different optical network-on-chip (ONoC) architectures have been proposed recently. Shacham and others present a hierarchical ONoC, which is composed of both an electronic and an optical network. The electronic network sets up optical routers before the data transmission and tears down the reserved path thereafter. The optical network is employed to transfer the payload packets. Since optical buffer technology is still in its infancy, circuit switching has become a practical way to support ONoC and has been applied to various topologies. Circuit-switched photonic network architectures take advantage of the optical spectrum by establishing a high-bandwidth light path dedicated to data transmission. Lei Zhang and others employ optical signals to control circuit switching, which reduces the power and latency of the circuit-setup phase.
However, as the offered load increases, a path-setup packet may hold some links and yet fail to reserve further links to the destination due to contention. The occupied links are then wasted, resulting in low link utilization and low network throughput. Shacham and others used a negative acknowledgement (NACK) method to solve this problem. When the setup packet is blocked, a NACK is sent back to the source node, and the source node retransmits the setup packet along an alternative path. However, the partially established path is not utilized. In this paper, we propose a hybrid time and hop recycle method, which we call HTHR, to release the occupied resources as soon as possible. The setup packet can be recycled partway along its path, thus releasing the resources it occupies. The established partial connection is fully utilized to increase the throughput.
The rest of this paper is organized as follows. In section II, the details of the path-setup method are illustrated. A new, strictly non-blocking 6 × 6 optical router is also designed to support HTHR in section III. Simulation results in section IV show HTHR improves throughput and end-to-end delay performance. Finally, section V concludes the paper.
II. New Path-Setup Method
As 2D mesh is commonly used in ONoC, we take it as an example in our design. However, our method can easily be applied to higher-dimension meshes or other topologies. The ONoC consists of two overlapped networks — electronic and optical. In the traditional path-setup method, the path-setup packet goes from the source to the destination and reserves the whole path. When the path-setup packet arrives at the destination node, an acknowledgement (ACK) will be returned. The ACK goes back to the source node and configures the optical routers along the way. Upon the receipt of an ACK, the source node begins to send optical data. Since optical routers are configured in advance, optical data can be transmitted without being blocked. Upon the arrival of the optical data, all reserved links will be released, and other path-setup packets can compete to use them.
If the port required by the path-setup packet is reserved, the path-setup packet will be blocked at this node and will have to wait until the desired port becomes available. During this period, the path-setup packet holds the established partial path from the source to the current node, thereby blocking more setup packets. To reduce network blocking, we propose HTHR to deflect the blocked path-setup packet to a particular port if certain conditions are satisfied. This particular port is referred to as the recycle port, and its corresponding node is referred to as the recycle node. There is a small recycle buffer in each node to store the temporarily received payload packet. The recycle node will send an ACK packet back to the source node if the recycle buffer is not yet full. After the ACK packet is received by the source, optical data will be sent to the recycle node. The links in the path will be released after the optical data is received by the recycle node. However, the recycle node is not the destination node for such data. When the optical data arrives at the recycle node, it is converted to an electrical signal by an optical-to-electrical (O/E) conversion and is stored in the recycle buffer. Before its retransmission to the destination node, it will undergo an electrical-to-optical (E/O) conversion. By applying the HTHR method, network blocking is greatly reduced; however, more O/E and E/O overhead (delay, energy consumption) is introduced in the recycle nodes. Hence, we employ the following rules to limit overly frequent recycling. If either Rule 1 or Rule 2 is satisfied, then the path-setup packet will be deflected.
Rule 1: If Distance is equal to MaxHop, the path-setup packet will be deflected. Distance is the distance between the current node and the previous recycle node (or the source node). MaxHop is a predefined threshold to limit the maximum hops that a path-setup packet can travel.

Rule 2: If the desired port is reserved and the recycle transfer time is less than $T_{\text{prd}}$, the path-setup packet will be deflected. The recycle transfer time is defined as the total time from the moment the recycle node sends the ACK to the moment it receives the optical data. It is calculated in the current node and is the sum of the ACK's traveling time and the optical data's transmission time. The ACK's traveling time is equal to the number of hops between the current node and the previous recycle node (or the source node) multiplied by the per-hop time of ACK processing. The optical data transmission time can be calculated from the optical bandwidth and the optical data size. The predicted time until a desired port will be unlocked is represented by $T_{\text{prd}}$, which can be calculated as follows:

$$T_{\text{prd}} = T_{\text{avg\_new}} - \left(T_{\text{cur}}^{p} - T_{\text{lck}}\right), \tag{1}$$

where $T_{\text{cur}}^{p}$ is the time when the path-setup packet arrives and $T_{\text{lck}}$ is the initial time at which the port is locked. The average time the port will be locked is represented by $T_{\text{avg\_new}}$, which can be calculated as follows:

$$T_{\text{avg\_new}} = \alpha\, T_{\text{avg\_old}} + (1-\alpha)\left(T_{\text{cur}}^{o} - T_{\text{lck}}\right), \tag{2}$$

where $T_{\text{avg\_old}}$ is the average time the port has been locked and $T_{\text{cur}}^{o}$ is the time when the optical data arrives. The weighted average of $T_{\text{avg\_old}}$ and the duration of time for which the port has been locked is represented by $T_{\text{avg\_new}}$. Initially, $T_{\text{avg\_old}}$ is set to zero. The importance of $(T_{\text{cur}}^{o} - T_{\text{lck}})$ is reflected in the weight coefficient $\alpha$. If $\alpha$ is larger than 0.5, more weight is given to $T_{\text{avg\_old}}$.
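The two deflection rules and equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `PortState`, the function names, and the variable `t_rcy` (standing for the ACK travel time plus the optical data transmission time) are hypothetical names chosen here, and time units are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class PortState:
    """Lock bookkeeping for one output port (field names are illustrative)."""
    locked: bool = False
    t_lck: float = 0.0  # time at which the port was locked
    t_avg: float = 0.0  # running average lock duration, initially zero


def update_average(t_avg_old, t_cur_o, t_lck, alpha=0.5):
    """Equation (2): blend the old average with the latest lock duration."""
    return alpha * t_avg_old + (1.0 - alpha) * (t_cur_o - t_lck)


def predicted_wait(t_avg_new, t_cur_p, t_lck):
    """Equation (1): predicted remaining time until the port is unlocked."""
    return t_avg_new - (t_cur_p - t_lck)


def recycle_time(hops, ack_hop_time, data_bits, bandwidth):
    """ACK travel time plus optical data transmission time."""
    return hops * ack_hop_time + data_bits / bandwidth


def should_deflect(distance, max_hop, port, t_cur_p, t_rcy):
    """Return True if Rule 1 or Rule 2 asks for deflection to the recycle port."""
    if distance == max_hop:  # Rule 1: hop budget since the last recycle node is used up
        return True
    if port.locked:  # Rule 2: recycling finishes sooner than the predicted unlock
        return t_rcy < predicted_wait(port.t_avg, t_cur_p, port.t_lck)
    return False
```

For instance, with a port locked at time 5 whose average lock duration is 30, a setup packet arriving at time 15 sees a predicted wait of 20; a recycle transfer that takes 10 time units would therefore trigger deflection under Rule 2.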
In HTHR, the recycle node address is added to the path-setup packet and the ACK. For the path-setup packet, the recycle node address is initially the same as the source node address. It changes when the path-setup packet is sent to the recycle port and is then set to the address of the current recycle node. For the ACK, the recycle node address is set to the address of the node where the ACK is generated. A detailed description of how HTHR works is given in the following pseudocode figure.
Pseudocode for HTHR.
To realize HTHR, each node maintains a table named Recycle[port] and a local time table named Time[port][hop]. When an ACK enters the router from port $p$, the ACK's recycle node address is recorded in Recycle[$p$]. Recycle[$p$] is cleared when the related optical data arrives. Time[port][hop] records the average lock time computed by (3). When a path-setup packet arrives, the desired output port is given by the routing algorithm. The distance $d$ between the current node and the destination can be known. With $d$, we can check Time[port][$d$] to obtain $T_{\text{avg\_new}}$. Knowing the length of time a port has been locked, we can calculate $T_{\text{prd}}$. Then, we can judge whether Rule 2 is satisfied or not.

The following details how the time table is updated. When a path-setup packet arrives and reserves a certain port $p$, the time is recorded in $T_{\text{lck}}[p]$. Once the port $p$ is released, using the address stored in Recycle[$p$], we can calculate the distance $h$ between the current node and the next recycle node. With $T_{\text{cur}}^{o}$ and $T_{\text{lck}}[p]$, Time[$p$][$h$] adaptively updates itself. The updating method is given by

$$T_{\text{new}}[\text{port}][\text{hop}] = \alpha\, T_{\text{old}}[\text{port}][\text{hop}] + (1-\alpha)\left(T_{\text{cur}}^{o} - T_{\text{lck}}[\text{port}]\right), \tag{3}$$

where $T_{\text{old}}$[port][hop] and $T_{\text{new}}$[port][hop] are the previously recorded time and the newly recorded time, respectively. The size of the time table is determined by MaxHop. Originally, it is O(#port) × O($D$), where #port is the number of ports and $D$ is the maximum distance between any two nodes in the network. Now, the size of the time table is reduced to O(#port) × O(MaxHop).
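A minimal sketch of the per-node bookkeeping may help make the table update concrete. It assumes a 2D mesh with (x, y) node addresses and Manhattan hop distance; the class `NodeTables` and its method names are hypothetical, and the sketch assumes the hop distance never exceeds MaxHop (as Rule 1 would enforce).

```python
class NodeTables:
    """Per-node Recycle[port] and Time[port][hop] tables (illustrative layout)."""

    def __init__(self, num_ports, max_hop, alpha=0.5):
        self.alpha = alpha
        self.recycle = [None] * num_ports  # Recycle[port]: recycle node address from the ACK
        self.t_lck = [0.0] * num_ports     # T_lck[port]: time the port was reserved
        # Time[port][hop]: hop runs from 0 to max_hop, so the size is #ports x (MaxHop + 1).
        self.time = [[0.0] * (max_hop + 1) for _ in range(num_ports)]

    def on_reserve(self, port, now):
        """A path-setup packet reserves the port: remember the lock time."""
        self.t_lck[port] = now

    def on_ack(self, port, recycle_addr):
        """An ACK enters through the port: remember its recycle node address."""
        self.recycle[port] = recycle_addr

    def on_release(self, port, now, my_addr):
        """Optical data has passed: apply update (3), then clear Recycle[port]."""
        addr = self.recycle[port]
        if addr is None:
            raise ValueError("no ACK recorded for this port")
        hop = abs(addr[0] - my_addr[0]) + abs(addr[1] - my_addr[1])
        old = self.time[port][hop]
        lock_duration = now - self.t_lck[port]
        self.time[port][hop] = self.alpha * old + (1.0 - self.alpha) * lock_duration
        self.recycle[port] = None
        return hop
```

Because the second index is bounded by MaxHop rather than by the network diameter, the table stays small even for large meshes.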
III. Recycle Optical Router (ROR) Architecture
We propose a new router to support HTHR using the XY routing algorithm, as shown in the figure of the 6 × 6 optical router. The new router has six bidirectional ports, including a recycle port to store the temporarily ejected data. The micro-ring resonators in the router are identical and have the same on-state resonance wavelength λ. The new router uses this wavelength to transfer the data. For example, if the optical signal is input from X– and output to port Y+, micro-ring resonator 7 will be powered on. Micro-ring resonators 5, 6, 4, and 2 will remain in the "off" state. Hence, the input signal will make a turn at micro-ring resonator 7 and output at port Y+.
To have a compact area and low insertion loss, ROR is optimized to reduce the number of waveguides, micro-ring resonators, and waveguide crossings. Assuming the radius of a micro-ring resonator to be 5 μm, with spacing from the waveguides of 200 nm, the footprint of ROR is about 15,376 μm². ROR applies the XY routing algorithm, which is commonly used and free of deadlock. In the XY routing algorithm, turns from the Y dimension to the X dimension are prohibited. There is no need for packets to switch between the local and recycle ports. Hence, the number of micro-ring resonators can be reduced to 20. There is no resonator in the "on" state when packets are transmitted in the same dimension. For example, if the packets enter through the X+ port and go to the X– port, the resonators along the path are all in the "off" state. This feature is helpful for saving power, because the micro-ring resonators consume no energy when they are in this state. The new router is strictly non-blocking because it guarantees an internal path from any input port to any output port, as long as no packets are destined for the same output. This can be proved by listing all the possible cases. This characteristic is important for improving network performance.
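As a brief illustration of the XY routing constraint mentioned above, the following sketch computes the output port at each hop on a 2D mesh. It is not the router's control logic; the coordinate convention (the "X+" port moves toward larger x, and so on) and the function names are assumptions made for this example.

```python
# Coordinate change produced by leaving through each port (assumed convention).
STEP = {"X+": (1, 0), "X-": (-1, 0), "Y+": (0, 1), "Y-": (0, -1)}


def xy_next_port(cur, dst):
    """Pick the output port: finish the X dimension first, then the Y dimension."""
    cx, cy = cur
    dx, dy = dst
    if dx != cx:
        return "X+" if dx > cx else "X-"
    if dy != cy:
        return "Y+" if dy > cy else "Y-"
    return "LOCAL"


def xy_path(src, dst):
    """List the output ports taken from src to dst under XY routing."""
    ports = []
    cur = src
    while True:
        port = xy_next_port(cur, dst)
        if port == "LOCAL":
            return ports
        ports.append(port)
        sx, sy = STEP[port]
        cur = (cur[0] + sx, cur[1] + sy)
```

Since every path exhausts its X hops before taking any Y hop, a Y-to-X turn never occurs, which is why the router does not need resonators for those turns.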
6×6 optical router for HTHR.
IV. Performance Analysis
We use the network simulator OPNET to evaluate HTHR performance in terms of end-to-end (ETE) delay, throughput, and energy dissipation. ETE delay is defined as the average total delay of a packet from its creation to its destruction at the destination, including path-setup time, O/E and E/O conversion time, and transmission time. Throughput measures the rate of payload packets received by each node under a given offered load.
The energy dissipation arises from two sources: electrical, which includes the electrical switching fabric (including buffers) and inter-switch wires, and optical, which comprises O/E and E/O conversions and optical routers.
Uniform and hotspot traffic patterns are applied. Under the uniform traffic pattern, each core is chosen as the destination with equal probability. In a hotspot traffic pattern, 10% of the total traffic is sent to the hot area, and the rest of the traffic is sent uniformly to other nodes in the network. Hotspot-I chooses the middle four nodes as hot areas, while hotspot-II chooses the corner four nodes as hot areas. The energy dissipations for a 5 × 5 crossbar and a link are 458.75 fJ/bit and 755.6 fJ/bit/m, respectively, which were obtained by Orion. According to prior work, the O/E and E/O conversions consume 21.52 fJ/bit and 60.87 fJ/bit, respectively. The static power of an optical switch is 400 μW. In (3), $\alpha$ is a weight coefficient and reflects the importance of $(T_{\text{cur}}^{o} - T_{\text{lck}}[\text{port}])$. In the simulation, we set $\alpha$ = 0.5 to maintain fairness. The size of the recycle buffer is set to four packets (1,024 bits), which is only about 15% of the total size of the general electrical router's buffers. We could also increase the size of an optical packet to, say, 512 bits, in which case the recycle buffer would accommodate two packets (1,024 bits). We obtained the energy consumption of a general router and of recycle buffer routers with different recycle buffer sizes using Orion. The recycle buffer routers with 1,024-bit and 2,048-bit recycle buffers consume 12.9% and 25.8% more energy than the general router without recycle buffers, respectively, as seen in the following table.
Energy consumption of different routers (mW).
| | General router | Recycle buffer router | Increase in energy consumption |
| --- | --- | --- | --- |
| Energy consumption | 58.3389 | 65.8849 (1,024 bits) | 12.9% |
| Energy consumption | 58.3389 | 73.431 (2,048 bits) | 25.8% |
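The uniform and hotspot traffic patterns used in the simulations can be sketched as a destination generator. This is a hedged illustration rather than the OPNET model: it assumes an n × n mesh with even n, assumes that non-hotspot traffic is spread uniformly over the non-hot nodes, and uses hypothetical function names.

```python
import random


def hot_nodes(n, pattern):
    """Return the hot nodes of an n x n mesh for the given traffic pattern."""
    if pattern == "hotspot-I":   # middle four nodes (n assumed even)
        m = n // 2
        return [(m - 1, m - 1), (m - 1, m), (m, m - 1), (m, m)]
    if pattern == "hotspot-II":  # corner four nodes
        return [(0, 0), (0, n - 1), (n - 1, 0), (n - 1, n - 1)]
    return []                    # uniform traffic has no hot area


def pick_destination(src, n, pattern, rng, hot_fraction=0.10):
    """Draw one destination for a packet created at src."""
    hot = hot_nodes(n, pattern)
    if hot and rng.random() < hot_fraction:
        # Hotspot share of the traffic: uniform over the hot nodes.
        candidates = [h for h in hot if h != src]
    else:
        # Remaining traffic: uniform over the other (non-hot) nodes.
        candidates = [(x, y) for x in range(n) for y in range(n)
                      if (x, y) != src and (x, y) not in hot]
    return rng.choice(candidates)
```

Calling `pick_destination((0, 1), 8, "hotspot-I", random.Random(1))` repeatedly yields a hot-node destination in roughly one draw out of ten, while the pattern name `"uniform"` falls back to equal probability over all other nodes.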
Throughput performance with optical packet 256 bits.
Throughput performance with optical packet 512 bits.
We also conduct simulations of the proposed approach and the traditional approach under different optical-packet sizes. From the two throughput figures above, we can see that the new path-setup method increases the throughput by 52% with 256-bit optical packets and by 43% with 512-bit optical packets. When the size of an optical packet is 512 bits and the recycle buffer accommodates fewer packets (two packets), the throughput is still better than that of the traditional method, although the percentage gain declines slightly. If the payload is bigger than the recycle buffer, it must be broken into multiple pieces for transmission, which produces only a small additional path-setup overhead.
The predefined threshold MaxHop can be obtained by initial runs and optimization. We conduct simulations to show that the recycle-hop threshold influences the network ETE delay and throughput performance. In this paper, we only show the results of an 8 × 8 network. We assumed the recycle buffer has an infinite size and ran the simulation without limiting the recycle time. The results are plotted in the following two figures, which show the ETE delay and throughput performance as the traffic load increases. The ETE delay and throughput performance are best when the recycle-hop threshold is equal to five. If the threshold value is set too small, the packet will be dropped many times, which fails to exploit the advantages of circuit switching. In the extreme case, when the threshold value is equal to one, the method is similar to packet switching, and the additional O/E and E/O conversions introduce more energy consumption. If the threshold value is set too large, the method is similar to traditional circuit switching. The established partial connection will hold the related resources, which leads to congestion and network performance degradation. Thus, the choice of MaxHop is a tradeoff. As a result, we use the recycle-hop threshold MaxHop = 5, which is about the average hop count for an 8 × 8 mesh.
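The link between MaxHop = 5 and the average hop count of an 8 × 8 mesh can be checked with a short calculation. The sketch below assumes that a hop count is the Manhattan distance between two distinct nodes and that all source-destination pairs are equally likely; the function name is hypothetical.

```python
from itertools import product


def mean_hops(n):
    """Average Manhattan distance over all ordered pairs of distinct nodes in an n x n mesh."""
    nodes = list(product(range(n), repeat=2))
    total = 0
    pairs = 0
    for (sx, sy), (dx, dy) in product(nodes, repeat=2):
        if (sx, sy) != (dx, dy):
            total += abs(sx - dx) + abs(sy - dy)
            pairs += 1
    return total / pairs
```

Under these assumptions, `mean_hops(8)` evaluates to 16/3, or about 5.33 hops, which is in line with the rounded threshold of five used in the simulations.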
ETE-delay performance under different recycle hops.
Throughput performance under different recycle hops.
Next, we simulate the proposed approach with four-packet (1,024-bit) recycle buffers and the traditional approach under three traffic patterns. The following two figures show ETE delay and throughput performance, respectively. From the ETE-delay figure, we can see that as the offered load increases, the increased contention causes latency to increase, because packets have to wait for ports and links. When the offered load is low, the traditional method and HTHR have similar performance. As the offered load increases, HTHR has better ETE-delay performance. HTHR can release optical links in a timely manner, thus improving optical link utilization.
ETE-delay performance under different traffics.
Throughput performance under different traffics.
Throughput depends on the injection rate of a network. Ideally, throughput should increase linearly with the offered load. However, due to the limitation of network resources, throughput will saturate at a certain offered load. This can be seen from the throughput figure above: when the offered load is low, the curves increase linearly with the offered load, but after saturation, they all remain steady. The throughput of HTHR surpasses that of the traditional method by 52.03%, 41.94%, and 36.47% under the uniform, hotspot-I, and hotspot-II traffic patterns, respectively.
In addition, we make a comparative study among the traditional approach, the NACK approach, and the proposed approach with four-packet (1,024-bit) recycle buffers, all under the uniform traffic pattern. The following two figures show the ETE delay and throughput performance, respectively. Because the path-setup packet is transmitted several times and the partially established connection is not fully utilized, the NACK approach performs worse than HTHR in both ETE delay and throughput, and performs almost the same as the traditional approach.
ETE-delay performance with different approaches.
Throughput performance with different approaches.
Energy dissipation under different traffic patterns (nJ/packet).
| | Uniform | Hotspot-I | Hotspot-II |
| --- | --- | --- | --- |
| Traditional method | 0.432 | 0.427 | 0.436 |
| HTHR | 0.478 | 0.471 | 0.51 |
Finally, we analyze the energy consumption of the proposed approach. The table above gives the energy dissipation under different traffic patterns. The static energy consumption for the packets waiting in the buffer is not calculated. With HTHR, a packet undergoes more O/E and E/O conversions than with the traditional method. Therefore, HTHR consumes a little more energy than the traditional method, as the table shows.
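To put "a little more energy" in numbers, the relative overhead can be computed directly from the per-packet energies in the energy dissipation table. The dictionary and function names below are hypothetical; the values are copied from the table.

```python
# Energy per packet in nJ: (traditional method, HTHR), taken from the table.
ENERGY_NJ_PER_PACKET = {
    "uniform": (0.432, 0.478),
    "hotspot-I": (0.427, 0.471),
    "hotspot-II": (0.436, 0.51),
}


def overhead_percent(traditional, hthr):
    """Relative extra energy of HTHR over the traditional method, in percent."""
    return 100.0 * (hthr - traditional) / traditional


overheads = {pattern: overhead_percent(trad, hthr)
             for pattern, (trad, hthr) in ENERGY_NJ_PER_PACKET.items()}
```

The resulting overheads are roughly 10.6%, 10.3%, and 17.0% for the uniform, hotspot-I, and hotspot-II patterns, respectively.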
To solve the problem of low utilization of traditional circuit switching for ONoC, this paper proposes a new path-setup method, HTHR. HTHR helps to reduce the blocking rate and fully utilize the partially established connection. To support this method, a strictly non-blocking 6 × 6 optical router is designed. Although we employ mesh to illustrate the design of HTHR, this new method can easily be applied to other topologies. The simulation results show that HTHR can improve the performance of ETE delay and throughput, under different traffic patterns. However, waiting time in the recycle buffer will introduce additional energy consumption.
This work was partly supported by the National Science Foundation of China under Grant (No. 61070046, No. 61334003), Shenzhen Research Funding (No. JCYJ20130401171935815), the Fundamental Research Funds for Central Universities under Grant (No. K5051301003), and the 111 Project under Grant (No. B08038).
Huaxi Gu received his PhD degree in telecommunication and information system from Xidian University in 2005. He is currently a professor in the state key lab of ISN, Xidian University, Xi’an, Shaanxi, China. His current research interests include interconnection networks, networks-on-chip and optical interconnect, and data center networks. He has more than 80 publications in many international journals and conferences.
Kai Gao received his BE degree in electronics and information engineering from Xidian University, Xi’an, Shaanxi, China, in 2011. Since 2011, he has been working toward his ME degree in electronics and communications engineering in the state key lab of ISN, Xidian University, Xi’an, Shaanxi, China. His main research interests include networks-on-chip and optical interconnected networks.
Zhengyu Wang received his ME degree in telecommunication and information systems from Xidian University, Xi’an, Shaanxi, China, in 2013. His main research interests include networks-on-chip and optical interconnected networks.
Yintang Yang received his PhD degree from the School of Technical Physics, Xidian University, Xi’an, Shaanxi, China. He is currently a professor at the Microelectronics Institute, Xidian University, Xi’an, Shaanxi, China. He won the title of national model teacher and the Chinese youth science & technology award. He was selected into the “Trans-century Outstanding Talents Program” of the ministry of education, the national “Key Talents Program” & “New Century Key Talents Program”.
Xiaoshan Yu received his ME degree in electronics and communications engineering from Xidian University, Xi’an, Shaanxi, China, in 2013. Since 2013, he has been working toward his PhD degree in telecommunication and information systems at the state key lab of ISN, Xidian University, Xi’an, Shaanxi, China. His current research interests include optical interconnected networks and data center networks.
“Predictions of CMOS Compatible on-Chip Optical Interconnect,” ACM/IEEE Int. Workshop Syst. Level Interconnect Prediction, San Francisco, USA.
“Challenges for on-Chip Optical Interconnects,” Optoelectronic Integr. Silicon II, Mar. 14, 2005.
“Benchmarking on-Chip Optical Against Electrical Interconnect for High-Performance Applications.”
“Device Modeling and System Simulation of Nanophotonic on-Chip Networks for Reliability, Power and Performance,” Proc. DAC, New York, USA.
“High-Performance Modulators and Switches for Silicon Photonic Networks-on-Chip,” IEEE J. Sel. Topics Quantum Electron. DOI: 10.1109/JSTQE.2009.2028437
“A Temperature-Insensitive Third-Order Coupled-Resonator Filter for on-Chip Terabit/s Optical Interconnects,” IEEE Photon. Technol. Lett. DOI: 10.1109/LPT.2010.2085426
“Ultra-Compact, Low RF Power, 10 Gb/s Silicon Mach-Zehnder Modulator.” DOI: 10.1364/OE.15.017106
“High-Frequency Modeling and Optimization of E/O Response and Reflection Characteristics of 40 Gb/s EML Module for Optical Transmitters.” DOI: 10.4218/etrij.12.0111.0516
“Fabrication of 40 Gb/s Front-End Optical Receivers Using Spot-Size Converter Integrated Waveguide Photodiodes.” DOI: 10.4218/etrij.05.0905.0023
“Photonic Networks-on-Chip for Future Generations of Chip Multiprocessors,” IEEE Trans. Comput. DOI: 10.1109/TC.2008.78
“A Hierarchical Hybrid Optical-Electronic Network-on-Chip,” July 5–7, 2010.
“A Low-Power Fat Tree-Based Optical Network-on-Chip for Multiprocessor System-on-Chip,” Apr. 20–24, 2009.
“3D Optical Networks-on-Chip (NoC) for Multiprocessor Systems-on-Chip (MPSoC),” IEEE Int. Conf. 3D Syst. Integr., San Francisco, CA, USA, Sept. 28–30, 2009.
“Circuit-Switched Memory Access in Photonic Interconnection Networks for High-Performance Embedded Computing,” Int. Conf. High Performance Comput., Netw., Storage Anal., New Orleans, LA, USA.
“Circuit-Switched on-Chip Photonic Interconnection Network,” IEEE Int. Conf. Group IV Photon., San Diego, CA, USA, Aug. 29–31, 2012.
“Optical 4×4 Hitless Silicon Router for Optical Networks-on-Chip (NoC).” DOI: 10.1364/OE.16.015915
“ORION 2.0: A Fast and Accurate NoC Power and Area Model for Early-Stage Design Space Exploration,” Apr. 20–24, 2009.
“DSENT-A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling,” IEEE/ACM Int. NoCS, May 9–11, 2012.