Already solidly in the driver’s seat of the generative AI accelerator market, NVIDIA has long made it clear that the company isn’t about to slow down and check out the view. Instead, NVIDIA intends to continue iterating along its multi-generational product roadmap for GPUs and accelerators, leveraging its early advantage to stay ahead of its ever-growing coterie of competitors in the accelerator market. So while NVIDIA’s ridiculously popular H100/H200/GH200 series of accelerators remains the hottest ticket in Silicon Valley, it’s already time to talk about the next-generation accelerator architecture to feed NVIDIA’s AI ambitions: Blackwell.
AMD Announces FSR 3.1: Seriously Improved Upscaling Quality

AMD's FidelityFX Super Resolution 3 technology package introduced a plethora of enhancements to the FSR technology on Radeon RX 6000 and 7000-series graphics cards last September. But perfection has no limits, so this week the company is rolling out its FSR 3.1 technology, which improves upscaling quality, decouples frame generation from AMD's upscaler, and makes it easier for developers to work with FSR.

Arguably, FSR 3.1's primary enhancement is its improved temporal upscaling image quality: compared to FSR 2.2, the image flickers less at rest and no longer ghosts in motion. This is a significant improvement, as flickering and ghosting artifacts are particularly annoying. FSR 3.1 has to be implemented by game developers themselves, and the first title slated to support the new technology later this year is Ratchet & Clank: Rift Apart.

[Image comparisons: Temporal Stability and Ghosting Reduction, AMD FSR 2.2 vs. AMD FSR 3.1]

Another significant development brought by FSR 3.1 is the decoupling of the Frame Generation feature introduced with FSR 3. This capability relies on a form of AMD's Fluid Motion Frames (AFMF) optical flow interpolation: it uses temporal game data, such as motion vectors, to insert an additional frame between existing ones, which can boost performance by up to two times in compatible games. Frame Generation was initially tied to FSR 3 upscaling, which was a limitation; starting with FSR 3.1 it will work with other upscaling methods, though AMD refrains from saying which methods and on which hardware for now. The company also does not disclose when game developers are expected to implement it.

In addition, AMD is bringing FSR 3 support to Vulkan and the Xbox Game Development Kit, enabling game developers on those platforms to use it. It is also adding FSR 3.1 to the FidelityFX API, which simplifies debugging and enables forward compatibility with updated versions of FSR.

Upon its release in September 2023, AMD FSR 3 was initially supported by two titles, Forspoken and Immortals of Aveum, with ten more games poised to join them at the time. Six months later, the lineup has expanded to an impressive roster of 40 games that either currently support FSR 3 or are set to incorporate it shortly. As of March 2024, FSR 3 is supported by games such as Avatar: Frontiers of Pandora, Starfield, and The Last of Us Part I, with Cyberpunk 2077, Dying Light 2 Stay Human, Frostpunk 2, and Ratchet & Clank: Rift Apart set to follow shortly.

Source: AMD
Qualcomm Intros Snapdragon 7+ Gen 3: Pushing GenAI Into Premium Smartphones

Proving the adage “ask, and you shall receive”, Qualcomm is back this week with a second Snapdragon SoC announcement for mobile phones. This time, the company is announcing the Snapdragon 7+ Gen 3, the latest-generation member of its relatively new Snapdragon 7+ lineup of SoCs. Like its predecessor, the Snapdragon 7+ Gen 2, the Gen 3 is aimed at the premium segment of smartphones, offering high-end features with more modest performance and costs – but still a feature set and level of performance ahead of “mid-tier” smartphone SoCs. And, with Monday’s launch of the Snapdragon 8s Gen 3, this is a segment that Qualcomm has now bifurcated into two lines of SKUs.
G.Skill on Tuesday introduced its ultra-low-latency DDR5-6400 memory modules that feature a CAS latency of 30 clocks, which appears to be the industry's most aggressive timings yet for DDR5-6400 sticks. The modules will be available for both AMD and Intel CPU-based systems.
With every new generation of DDR memory comes an increase in data transfer rates as well as higher relative latencies (as measured in clock cycles). For the vast majority of applications, the increased bandwidth offsets the performance impact of the higher timings, but there are applications that favor low latencies. However, shrinking latencies is generally harder than increasing data transfer rates, which is why low-latency modules are rare.
Nonetheless, G.Skill has apparently managed to cherry-pick enough DDR5 memory chips and build appropriate printed circuit boards to produce DDR5-6400 modules with CL30 timings, which are substantially lower than the CL46 timings recommended by JEDEC for this speed bin. This means that while JEDEC-standard modules have an absolute latency of 14.375 ns, G.Skill's modules can boast a latency of just 9.375 ns – an approximately 35% decrease.
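For readers who want to check the math, absolute CAS latency is simply the CL figure divided by the memory clock, which for DDR memory is half the transfer rate. A minimal sketch of that arithmetic (the function name is ours, used purely for illustration):

```python
def cas_latency_ns(cl: int, transfer_rate_mts: int) -> float:
    """Absolute CAS latency in nanoseconds.

    DDR memory clocks at half its transfer rate, so a DDR5-6400 module
    runs a 3200 MHz memory clock, and absolute latency is CL clock periods.
    """
    memory_clock_mhz = transfer_rate_mts / 2      # e.g. 6400 MT/s -> 3200 MHz
    return cl * 1000 / memory_clock_mhz           # ns per clock * CL

# JEDEC DDR5-6400 CL46 vs. G.Skill's DDR5-6400 CL30
jedec = cas_latency_ns(46, 6400)    # 14.375 ns
gskill = cas_latency_ns(30, 6400)   #  9.375 ns
print(f"JEDEC: {jedec:.3f} ns, G.Skill: {gskill:.3f} ns, "
      f"reduction: {100 * (1 - gskill / jedec):.1f}%")   # ~34.8%
```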
G.Skill's DDR5-6400 CL30-39-39-102 modules have a capacity of 16 GB and will be available in 32 GB dual-channel kits, though the company does not disclose voltages, which are likely considerably higher than those standardized by JEDEC.
The company plans to make its DDR5-6400 modules available both for AMD systems with EXPO profiles (Trident Z5 Neo RGB and Trident Z5 Royal Neo) and for Intel-powered PCs with XMP 3.0 profiles (Trident Z5 RGB and Trident Z5 Royal). The new modules should be particularly attractive for AMD's Ryzen 7000 and Ryzen 9000-series processors: AM5 systems have a practical DDR5 limit of 6000 MT/s – 6400 MT/s, as this is roughly as fast as AMD's Infinity Fabric can operate at a 1:1 ratio, so a low-latency DDR5-6400 kit sits right at that ceiling.
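As a rough illustration of why that ceiling matters, the sketch below maps a DDR5 transfer rate to its memory clock and flags when a 1:1 ratio is unlikely to hold. The 3200 MHz cutoff is an assumed round number based on the article's 6000–6400 MT/s figure, not an AMD specification:

```python
def am5_memory_plan(transfer_rate_mts: int, max_1to1_mclk_mhz: int = 3200) -> str:
    """Illustrative only: whether a DDR5 speed can plausibly keep a 1:1 clock ratio.

    DDR5 runs its memory clock at half the transfer rate, and (per the article)
    AM5 holds a 1:1 ratio only up to roughly DDR5-6000/6400. The cutoff here is
    an assumption for illustration, not an AMD-published limit.
    """
    mclk = transfer_rate_mts / 2
    if mclk <= max_1to1_mclk_mhz:
        return f"DDR5-{transfer_rate_mts}: MCLK {mclk:.0f} MHz, 1:1 ratio OK"
    return f"DDR5-{transfer_rate_mts}: MCLK {mclk:.0f} MHz, likely forces a divider"

for rate in (6000, 6400, 7200):
    print(am5_memory_plan(rate))
```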
G.Skill notes that since its modules are non-standard, they will not work with all systems but will operate on high-end motherboards with properly cooled CPUs.
The new ultra-low-latency memory kits will be available worldwide from G.Skill's partners starting in late August 2024. The company did not disclose the pricing of these modules, but since we are talking about premium products that boast unique specifications, they are likely to be priced accordingly.
The CXL consortium has had a regular presence at FMS (which rechristened itself from 'Flash Memory Summit' to the 'Future of Memory and Storage' this year). Back at FMS 2022, the consortium had announced v3.0 of the CXL specifications. This was followed by CXL 3.1's introduction at Supercomputing 2023. Having started off as a host-to-device interconnect standard, CXL slowly subsumed other competing standards such as OpenCAPI and Gen-Z. As a result, the specifications have grown to encompass a wide variety of use-cases by building a protocol on top of the ubiquitous PCIe expansion bus. The CXL consortium comprises heavyweights such as AMD and Intel, as well as a large number of startups attempting to play in different segments on the device side. At FMS 2024, CXL had a prime position in the booth demos of many vendors.
The migration of server platforms from DDR4 to DDR5, along with the rise of workloads demanding large RAM capacity (but not particularly sensitive to memory bandwidth or latency), has made memory expansion modules one of the first classes of widely available CXL devices. Over the last couple of years, we have had product announcements from Samsung and Micron in this area.
At FMS 2024, SK hynix was showing off their DDR5-based CMM-DDR5 CXL memory module with a 128 GB capacity. The company was also detailing their associated Heterogeneous Memory Software Development Kit (HMSDK) - a set of libraries and tools at both the kernel and user levels aimed at increasing the ease of use of CXL memory. This is achieved in part by considering the memory pyramid / hierarchy and relocating the data between the server's main memory (DRAM) and the CXL device based on usage frequency.
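The exact heuristics inside HMSDK are not detailed here, but the general idea of usage-frequency placement across a DRAM tier and a CXL tier can be sketched as below. This is an illustrative toy model, not HMSDK code; the class name, tier labels, and threshold are all assumptions:

```python
from collections import Counter

# Toy model of frequency-based tiering between local DRAM and a CXL expander.
# Not HMSDK code; names and the hot_threshold value are invented for illustration.
DRAM_TIER, CXL_TIER = "dram", "cxl"

class TieredPlacement:
    def __init__(self, hot_threshold: int = 8):
        self.access_counts = Counter()   # page id -> accesses in this interval
        self.placement = {}              # page id -> current tier
        self.hot_threshold = hot_threshold

    def record_access(self, page: int) -> None:
        self.access_counts[page] += 1

    def rebalance(self) -> None:
        """Promote frequently touched pages to DRAM, demote cold ones to CXL."""
        for page, count in self.access_counts.items():
            self.placement[page] = DRAM_TIER if count >= self.hot_threshold else CXL_TIER
        self.access_counts.clear()       # start a fresh sampling interval

# Usage: count accesses over an interval, then rebalance placements.
tiers = TieredPlacement()
for page in [1, 1, 1, 1, 1, 1, 1, 1, 2]:
    tiers.record_access(page)
tiers.rebalance()
print(tiers.placement)   # {1: 'dram', 2: 'cxl'}
```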
The CMM-DDR5 CXL memory module comes in the SDFF form factor (E3.S 2T) with a PCIe 5.0 x8 host interface. The internal memory is based on 1α technology DRAM, and the device promises DDR5-class bandwidth and latency within a single NUMA hop. As these memory modules are meant to be used in datacenters and enterprises, the firmware includes features for RAS (reliability, availability, and serviceability) along with secure boot and other management features.
SK hynix was also demonstrating Niagara 2.0 – a hardware solution (currently based on FPGAs) to enable memory pooling and sharing, i.e., connecting multiple CXL memories so that different hosts (CPUs and GPUs) can optimally share their capacity. The previous version allowed only capacity sharing, but the latest version also enables sharing of data. SK hynix had presented these solutions at CXL DevCon 2024 earlier this year, but some progress appears to have been made in finalizing the specifications of the CMM-DDR5 by FMS 2024.
Micron had unveiled the CZ120 CXL Memory Expansion Module last year based on the Microchip SMC 2000 series CXL memory controller. At FMS 2024, Micron and Microchip had a demonstration of the module on a Granite Rapids server.
Additional insights into the SMC 2000 controller were also provided.
The CXL memory controller also incorporates DRAM die failure handling, and Microchip provides diagnostics and debug tools to analyze failed modules. The controller also supports ECC, which forms part of the enterprise...