With the advancement of Internet technology, data centres are progressively evolving into "computing power centres." High-performance applications, such as artificial intelligence and machine learning, are experiencing rapid development, leading to the emergence of various business sectors, including autonomous vehicles, big data streaming, and interest-based e-commerce. As the supporting infrastructure adapts to applications like artificial intelligence and machine learning, GPU (Graphics Processing Unit) computing clusters impose more stringent requirements on network transmission than CPU (Central Processing Unit) general computing clusters. This phenomenon is commonly referred to as the evolutionary model of "business-driven network iteration." In this framework, network iteration commences with high-performance GPU scenarios to address premier business demands and subsequently broadens its support to encompass more general scenarios, thereby maximizing the benefits derived from technological advancements.
Figure 1: Schematic diagram of data centre network architecture
In addition to the direct demands from the business sector, enhancements to various facilities within the data centre to satisfy functional requirements are also indirectly influencing the advancement of network equipment. For instance, GPU servers utilizing the next-generation H100 necessitate a network access bandwidth of 400G; furthermore, the next-generation CX7 smart network card requires that the network access switch supports PAM4-112G SerDes (Serializer / Deserializer).
Driven by the dual imperatives of business needs and hardware innovation, the upgrading of the data centre network architecture is essential. To achieve this iteration, the technologies at three levels—switching chips, SerDes, and optical modules—must advance in a coordinated manner, as each component is integral to the overall process. It is evident that this trajectory of technological evolution will encounter numerous challenges, with power consumption being a particularly complex issue to address.
Figure 2: Factors Driving the Iterative Upgrading of Data Centre Networks and Power Consumption Challenges
Starting with the switch chips that determine switch performance, advances in chip technology have reduced power consumption per bit. However, as switching bandwidth grows, the aggregate power consumption of the switch chips used in data centres continues to rise every year. In addition to switch chips, serializer/deserializer (SerDes) circuits and optical modules contribute significantly to the escalating power consumption. Data analysis shows that the overall power consumption of a single switch in 2022 was 22 times that of a single switch in 2010. Additionally, the power consumption of SerDes increased by a factor of 25, and that of optical modules by a factor of 26.
In examining the evolution of optical modules, it is noteworthy that in 2007, the power consumption of a 10G optical module was less than 1 watt. However, as advancements have been made from 40G and 100G to the contemporary 400G and 800G optical modules, and with the anticipated introduction of 1.6T optical modules, power consumption has escalated dramatically, nearly reaching 30 watts. In scenarios where a switch is fully equipped with 1.6T optical modules, the power consumption becomes profoundly significant.
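The trend described above can be made concrete with a little arithmetic. The sketch below is illustrative only: it uses the upper bounds quoted in the text (under 1 W for a 10G module, nearly 30 W for a 1.6T module), and it assumes a 51.2T switch populated entirely with 1.6T modules. The function names are hypothetical.

```python
def watts_per_gbps(module_watts: float, module_gbps: float) -> float:
    """Power of one module expressed per Gbit/s of capacity."""
    return module_watts / module_gbps


def fully_loaded_optics_watts(switch_gbps: int, module_gbps: int,
                              module_watts: float) -> float:
    """Total optics power when the whole switch capacity is served by such modules."""
    if switch_gbps % module_gbps:
        raise ValueError("switch capacity must be a whole number of modules")
    return (switch_gbps // module_gbps) * module_watts


# Upper bounds taken from the text: <1 W at 10G, nearly 30 W at 1.6T.
legacy = watts_per_gbps(1.0, 10)
modern = watts_per_gbps(30.0, 1_600)
# Assumed example: a 51.2T switch populated with 1.6T modules.
total = fully_loaded_optics_watts(51_200, 1_600, 30.0)

print(f"10G module:  {legacy:.4f} W per Gbit/s")
print(f"1.6T module: {modern:.4f} W per Gbit/s")
print(f"Optics alone on a fully loaded 51.2T switch: {total:.0f} W")
```

Under these assumptions the power per bit falls by roughly a factor of five, yet the optics alone draw about 960 W on a single switch. This is why per-bit efficiency gains do not offset the growth in total power.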
The technological advancements of traditional pluggable optical modules are insufficient to support the sustainable development of data centres, which can be primarily observed in four critical areas:
Figure 3: The bottleneck in the development of traditional pluggable optical module technology
The realization of signal integrity (SI) encounters significant material bottlenecks. In the context of high-speed telecommunications signal transmission via printed circuit boards (PCBs), the utilization of traditional pluggable optical modules presents challenges related to signal transmission distance and loss. The extended transmission distance contributes to notable signal degradation, posing significant obstacles to achieving robust signal integrity. Additionally, the development of lower-loss, mass-producible PCB materials faces numerous technical challenges that impede progress.
Moreover, power consumption presents another critical issue. A fully loaded 1.6T module device exhibits substantial power demands, which complicates heat dissipation design, including associated requirements for cabinet power supply. The escalation of power consumption leads to a corresponding increase in total equipment costs, including supplementary expenses related to utilities such as electricity and cooling. This dynamic consequently elevates the initial investment required for network infrastructure development.
Furthermore, product design challenges arise. Systems that employ traditional pluggable optical modules necessitate intricate system designs to accommodate 128 ports, while also addressing technical concerns related to the thermal management of high-power optical modules. This complexity contributes to elevated system costs.
In summary, Ruijie Networks aims to address the power consumption challenges within the iterative data centre network architecture, which encompasses switching chips, SerDes technology, and optical module innovations. The objective is to establish the next generation of green, energy-efficient, and sustainable data centres. By leveraging customer business scenarios and product practices, Ruijie Networks presents innovative solutions and technical recommendations for sustainable data centre networks, organized into three distinct layers.
The foundational layer focuses on architecture upgrades grounded in next-generation chips, SerDes, and optical module technologies to facilitate iterative enhancements of the network architecture, thereby accommodating the ever-growing bandwidth requirements of applications such as artificial intelligence and machine learning. Building upon these architectural upgrades, the initiative begins with network devices to address the existing power consumption issues associated with SerDes and optical modules. It is essential to recognize that these challenges are not exclusively confined to the current generation; every future generation of network architecture will inevitably confront similar issues. Consequently, it is imperative to envision a sustainable development trajectory for data centre networks that prioritizes cost-efficiency and low power consumption.
Figure 4: Construction goals for the next generation of green, energy-efficient, and sustainable data centres.
The advancement of sustainable development in technology can be realized through a two-stage progression in the development of switch silicon photonics technology. The first stage is referred to as the NPO (Near Packaged Optics) technology stage, which provides significant advantages in terms of cost-effectiveness and reduced power consumption, facilitating rapid deployment prior to the establishment of the CPO (Co-Packaged Optics) ecosystem. The second stage encompasses the CPO technology stage, representing the pinnacle of switch silicon photonics technology, with the capability to substantially lower both network costs and power consumption.
Figure 5: Recommendations for a Sustainable Development Technology Roadmap for Data Centre Networks
The optical engine performs the optoelectronic conversion within the switching network, and its most prevalent form is the pluggable module. As technology has advanced, new product forms have emerged. The Co-Packaged Optics (CPO) form integrates the switching chip and the optical engine into a single socketed package, so that chip and module are co-packaged. In contrast, the Near Packaged Optics (NPO) form decouples the optical engine from the switching chip and mounts both components on the same system motherboard. Both forms contain optoelectronic modules, but the different package positions result in different wiring distances and, consequently, different power consumption.
Figure 6: Overview of Silicon Photonics Technology
The CPO architecture achieves a high level of integration by employing silicon photonics technology, which optimizes both cost and power consumption. The fundamental approach to reducing power consumption is to shorten the wiring distance between the switch chip and the optical engine to approximately 50-70 millimetres through a co-packaged design. The shorter trace lowers the SerDes drive power, facilitates the deployment of higher-density high-speed ports, enhances overall bandwidth density, and markedly reduces power consumption. In the long term, the integration of co-packaged chips and silicon photonic components will continue to increase. However, the silicon photonics ecosystem is currently immature, which makes openness a crucial goal from a commercialization standpoint.
Figure 7: CPO Architecture Schematic Diagram
Figure 8: CPO Schematic
The NPO architecture is an alternative implementation for switch systems that also leverages silicon photonics technology. Its high level of integration and open ecosystem offer significant advantages in both cost and power consumption. The fundamental principle of the NPO architecture is to decouple the optical engine from the chip within a standardized framework, by mounting both components on a common motherboard via a standardized optical engine interface. This arrangement permits flexible selection of the switching chip and the NPO module. While the NPO architecture may not realize the same degree of power and cost savings as the CPO architecture, it offers greater openness. As the NPO industry chain evolves, the introduction of commercial CPO modules is anticipated by 2024. Ruijie Networks, a participant in the Optical Internetworking Forum (OIF), is actively engaged in exploring and developing NPO switch technology.
Figure 9: Schematic Diagram of NPO Architecture
In November 2021, Ruijie Networks was honoured to be invited to participate in the global Open Compute Project (OCP) summit. During this prestigious event, Ruijie Networks officially unveiled its 25.6T silicon photonic NPO cold plate liquid cooling switch, designed to meet the stringent reliability standards of data centres and carrier networks.
Figure 10: Ruijie Networks 25.6T Silicon Photonics NPO Cold Plate Liquid-Cooled Switch
Ruijie Networks has developed a 25.6T silicon photonic NPO cold plate liquid cooling switch, leveraging the latest 112G SerDes switching chip technology. This switch features a high-density port configuration that accommodates 64 ports operating at 400G, all within a 1RU form factor, facilitated by its 64 connectors. The system is structured with sixteen 1.6T (4×400G DR4) NPO modules, supporting eight ELS/RLS (external laser source modules). A notable advancement includes the reduction of wiring distance from the ASIC to the optical module on the printed circuit board (PCB) by 60% to 70%, thereby significantly enhancing high-speed signal integrity.
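The capacity figures quoted for the 25.6T switch are mutually consistent, which a short check makes explicit. This is a minimal sketch; `NpoSwitchLayout` and its fields are hypothetical names introduced for illustration, and only the numbers (16 modules, 1.6T per module, 400G per port) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NpoSwitchLayout:
    """Hypothetical helper for checking an NPO switch's capacity figures."""
    npo_modules: int   # number of NPO modules on the board
    module_gbps: int   # capacity of one NPO module in Gbit/s
    port_gbps: int     # speed of one front-panel port in Gbit/s

    @property
    def total_gbps(self) -> int:
        """Aggregate switching capacity served by the NPO modules."""
        return self.npo_modules * self.module_gbps

    @property
    def front_panel_ports(self) -> int:
        """Number of front-panel ports the aggregate capacity supports."""
        return self.total_gbps // self.port_gbps

    @property
    def ports_per_module(self) -> int:
        """Front-panel ports fed by a single NPO module."""
        return self.module_gbps // self.port_gbps


# Figures from the text: sixteen 1.6T (4x400G DR4) modules, 400G ports.
layout = NpoSwitchLayout(npo_modules=16, module_gbps=1_600, port_gbps=400)

print(layout.total_gbps / 1_000, "Tbit/s total")
print(layout.front_panel_ports, "ports at 400G")
print(layout.ports_per_module, "ports per NPO module")
```

Sixteen 1.6T modules give 25.6T in total, which corresponds to 64 ports at 400G, with each module feeding four ports (the 4×400G DR4 configuration).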
The overall architecture of the device incorporates an x86 CPU along with a 3+1 fan module and a 1+1 power module for redundancy. Furthermore, the core zone utilizes cold plate cooling technology with non-conductive liquid, which mitigates the risks of leakage and short circuits. This innovation provides robust support for the sustainable development of data centre networks.
Figure 11: Ruijie Networks 25.6T Silicon Photonic NPO Cold Plate Liquid Cooling Switch
In 2022, Ruijie Networks introduced its latest 51.2T silicon photonic NPO cold plate liquid cooling switch at OFC 2022. This prototype is based on an 800G NPO architecture built around a 51.2T switch chip. While keeping the 1RU height, the 51.2T switch raises the capacity of each NPO module from 1.6T to 3.2T. The front panel accommodates 64 800G connectors, each of which can be split into two 400G ports to ensure backward compatibility. The number of external light source modules has increased to 16. A blind-mate design mitigates the risk of eye damage from high-power lasers, significantly enhancing the safety of maintenance personnel. Both the switch chip and the NPO modules support cold plate cooling, which provides effective heat dissipation and addresses the problems associated with high thermal density. Compared with traditional pluggable optical modules and air cooling solutions, this design significantly reduces power consumption while maintaining equivalent performance.
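The split of each 800G connector into two 400G ports can be sketched as a simple port enumeration. The `connector/lane` naming scheme and the function name below are assumptions made for illustration, not the product's actual port naming. The module count is derived from the figures in the text rather than quoted from it.

```python
def breakout_ports(connectors: int, lanes_per_connector: int = 2) -> list[str]:
    """Return hypothetical logical port names such as '1/1' and '1/2'."""
    return [
        f"{connector}/{lane}"
        for connector in range(1, connectors + 1)
        for lane in range(1, lanes_per_connector + 1)
    ]


CONNECTORS = 64        # 800G front-panel connectors (from the text)
CONNECTOR_GBPS = 800
MODULE_GBPS = 3_200    # 3.2T per NPO module (from the text)

total_gbps = CONNECTORS * CONNECTOR_GBPS
# Derived, not stated in the text: modules needed to serve the full capacity.
npo_modules = total_gbps // MODULE_GBPS
ports = breakout_ports(CONNECTORS)

print(total_gbps / 1_000, "Tbit/s total")
print(npo_modules, "NPO modules at 3.2T")
print(len(ports), "logical 400G ports, from", ports[0], "to", ports[-1])
```

With every connector split, the 64 connectors expose 128 logical 400G ports, and the 51.2T total corresponds to sixteen 3.2T modules.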
Figure 12: Ruijie Networks 51.2T Silicon Photonics NPO Cold Plate Liquid-Cooled Switch
The application scenarios for NPO switches are highly diverse. The 51.2T NPO switch developed by Ruijie Networks is engineered for deployment in next-generation ultra-large-scale 400G networks, where it can serve as both a Leaf and a Spine device to provide high-speed backbone interconnectivity. Ruijie Networks released it commercially at the end of 2023, enabling customers to benefit promptly from reduced power consumption and lower operational costs.
Figure 13: Design of the Next-Generation Network Architecture Based on the NPO Switch
Ruijie Networks has expanded its operations internationally, and its members in the Optical Internetworking Forum (OIF) and Consortium for On-Board Optics (COBO) working groups regularly participate in global meetings focused on silicon photonics. The company is dedicated to contributing to advancements in global technology. Moving forward, Ruijie Networks will remain committed to a sustainable development trajectory in silicon photonics, with the objective of developing additional products that help customers achieve energy efficiency in line with green initiatives.
Figure 14: OIF Working Group Global Meeting Venue