At FMS 2024, the technological requirements from the storage and memory subsystem took center stage. Both SSD and controller vendors had various demonstrations touting their suitability for different stages of the AI data pipeline - ingestion, preparation, training, checkpointing, and inference. Vendors like Solidigm have different types of SSDs optimized for different stages of the pipeline. At the same time, controller vendors have taken advantage of one of the features introduced recently in the NVM Express standard - Flexible Data Placement (FDP).
FDP involves the host providing information / hints about the areas where the controller could place the incoming write data in order to reduce the write amplification. These hints are generated based on specific block sizes advertised by the device. The feature is completely backwards-compatible, with non-FDP hosts working just as before with FDP-enabled SSDs, and vice-versa.
Silicon Motion's MonTitan Gen 5 Enterprise SSD Platform was announced back in 2022. Since then, Silicon Motion has been touting the flexibility of the platform, allowing its customers to incorporate their own features as part of the customization process. This approach is common in the enterprise space, as we have seen with Marvell's Bravera SC5 SSD controller in the DapuStor SSDs and Microchip's Flashtec controllers in the Longsys FORESEE enterprise SSDs.
At FMS 2024, the company was demonstrating the advantages of flexible data placement by allowing a single QLC SSD based on their MonTitan platform to take part in different stages of the AI data pipeline while maintaining the required quality of service (minimum bandwidth) for each process. The company even has a trademarked name (PerformaShape) for the firmware feature in the controller that allows the isolation of different concurrent SSD accesses (from different stages in the AI data pipeline) to guarantee this QoS. Silicon Motion claims that this scheme will enable its customers to get the maximum write performance possible from QLC SSDs without negatively impacting the performance of other types of accesses.
Silicon Motion and Phison have market leadership in the client SSD controller market with similar approaches. However, their enterprise SSD controller marketing couldn't be more different. While Phison has gone in for a turnkey solution with their Gen 5 SSD platform (to the extent of not adopting the white label route for this generation, and instead opting to get the SSDs qualified with different cloud service providers themselves), Silicon Motion is opting for a different approach. The flexibility and customization possibilities can make platforms like the MonTitan appeal to flash array vendors.
StoragePCI-SIG Demonstrates PCIe 6.0 Interoperability at FMS 2024 As the deployment of PCIe 5.0 picks up steam in both datacenter and consumer markets, PCI-SIG is not sitting idle, and is already working on getting the ecosystem ready for the updats to the PCIe specifications. At FMS 2024, some vendors were even talking about PCIe 7.0 with its 128 GT/s capabilities despite PCIe 6.0 not even starting to ship yet. We caught up with PCI-SIG to get some updates on its activities and have a discussion on the current state of the PCIe ecosystem. PCI-SIG has already made the PCIe 7.0 specifications (v 0.5) available to its members, and expects full specifications to be officially released sometime in 2025. The goal is to deliver a 128 GT/s data rate with up to 512 GBps of bidirectional traffic using x16 links. Similar to PCIe 6.0, this specification will also utilize PAM4 signaling and maintain backwards compatibility. Power efficiency as well as silicon die area are also being kept in mind as part of the drafting process. The move to PAM4 signaling brings higher bit-error rates compared to the previous NRZ scheme. This made it necessary to adopt a different error correction scheme in PCIe 6.0 - instead of operating on variable length packets, PCIe 6.0's Flow Control Unit (FLIT) encoding operates on fixed size packets to aid in forward error correction. PCIe 7.0 retains these aspects. The integrators list for the PCIe 6.0 compliance program is also expected to come out in 2025, though initial testing is already in progress. This was evident by the FMS 2024 demo involving Cadence's 3nm test chip for its PCIe 6.0 IP offering along with Teledyne Lecroy's PCIe 6.0 analyzer. These timelines track well with the specification completion dates and compliance program availability for previous PCIe generations. We also received an update on the optical workgroup - while being optical-technology agnostic, the WG also intends to develop technology-specific form-factors including pluggable optical transceivers, on-board optics, co-packaged optics, and optical I/O. The logical and electrical layers of the PCIe 6.0 specifications are being enhanced to accommodate the new optical PCIe standardization and this process will also be done with PCIe 7.0 to coincide with that standard's release next year. The PCI-SIG also has ongoing cabling initiatives. On the consumer side, we have seen significant traction for Thunderbolt and external GPU enclosures. However, even datacenters and enterprise systems are moving towards cabling solutions as it becomes evident that disaggregation of components such as storage from the CPU and GPU are better for thermal design. Additionally maintaining signal integrity over longer distances becomes difficult for on-board signal traces. Cabling internal to the computing systems can help here. OCuLink emerged as a good candidate and was adopted fairly widely as an internal link in server systems. It has even made an appearance in mini-PCs from some Chinese manufacturers in its external avatar for the consumer market, albeit with limited traction. As speeds increase, a widely-adopted standard for external PCIe peripherals (or even connecting components within a system) will become imperative. Storage
Kioxia Demonstrates Optical Interface SSDs for Data Centers A few years back, the Japanese government's New Energy and Industrial Technology Development Organization (NEDO ) allocated funding for the development of green datacenter technologies. With the aim to obtain up to 40% savings in overall power consumption, several Japanese companies have been developing an optical interface for their enterprise SSDs. And at this year's FMS, Kioxia had their optical interface on display. For this demonstration, Kioxia took its existing CM7 enterprise SSD and created an optical interface for it. A PCIe card with on-board optics developed by Kyocera is installed in the server slot. An optical interface allows data transfer over long distances (it was 40m in the demo, but Kioxia promises lengths of up to 100m for the cable in the future). This allows the storage to be kept in a separate room with minimal cooling requirements compared to the rack with the CPUs and GPUs. Disaggregation of different server components will become an option as very high throughput interfaces such as PCIe 7.0 (with 128 GT/s rates) become available. The demonstration of the optical SSD showed a slight loss in IOPS performance, but a significant advantage in the latency metric over the shipping enterprise SSD behind a copper network link. Obviously, there are advantages in wiring requirements and signal integrity maintenance with optical links. Being a proof-of-concept demonstration, we do see the requirement for an industry-standard approach if this were to gain adoption among different datacenter vendors. The PCI-SIG optical workgroup will need to get its act together soon to create a standards-based approach to this problem. Storage



At FMS 2024, the technological requirements from the storage and memory subsystem took center stage. Both SSD and controller vendors had various demonstrations touting their suitability for different stages of the AI data pipeline - ingestion, preparation, training, checkpointing, and inference. Vendors like Solidigm have different types of SSDs optimized for different stages of the pipeline. At the same time, controller vendors have taken advantage of one of the features introduced recently in the NVM Express standard - Flexible Data Placement (FDP).
FDP involves the host providing information / hints about the areas where the controller could place the incoming write data in order to reduce the write amplification. These hints are generated based on specific block sizes advertised by the device. The feature is completely backwards-compatible, with non-FDP hosts working just as before with FDP-enabled SSDs, and vice-versa.
Silicon Motion's MonTitan Gen 5 Enterprise SSD Platform was announced back in 2022. Since then, Silicon Motion has been touting the flexibility of the platform, allowing its customers to incorporate their own features as part of the customization process. This approach is common in the enterprise space, as we have seen with Marvell's Bravera SC5 SSD controller in the DapuStor SSDs and Microchip's Flashtec controllers in the Longsys FORESEE enterprise SSDs.
At FMS 2024, the company was demonstrating the advantages of flexible data placement by allowing a single QLC SSD based on their MonTitan platform to take part in different stages of the AI data pipeline while maintaining the required quality of service (minimum bandwidth) for each process. The company even has a trademarked name (PerformaShape) for the firmware feature in the controller that allows the isolation of different concurrent SSD accesses (from different stages in the AI data pipeline) to guarantee this QoS. Silicon Motion claims that this scheme will enable its customers to get the maximum write performance possible from QLC SSDs without negatively impacting the performance of other types of accesses.
Silicon Motion and Phison have market leadership in the client SSD controller market with similar approaches. However, their enterprise SSD controller marketing couldn't be more different. While Phison has gone in for a turnkey solution with their Gen 5 SSD platform (to the extent of not adopting the white label route for this generation, and instead opting to get the SSDs qualified with different cloud service providers themselves), Silicon Motion is opting for a different approach. The flexibility and customization possibilities can make platforms like the MonTitan appeal to flash array vendors.
StorageKioxia's booth at FMS 2024 was a busy one with multiple technology demonstrations keeping visitors occupied. A walk-through of the BiCS 8 manufacturing process was the first to grab my attention. Kioxia and Western Digital announced the sampling of BiCS 8 in March 2023. We had touched briefly upon its CMOS Bonded Array (CBA) scheme in our coverage of Kioxial's 2Tb QLC NAND device and coverage of Western Digital's 128 TB QLC enterprise SSD proof-of-concept demonstration. At Kioxia's booth, we got more insights.
Traditionally, fabrication of flash chips involved placement of the associate logic circuitry (CMOS process) around the periphery of the flash array. The process then moved on to putting the CMOS under the cell array, but the wafer development process was serialized with the CMOS logic getting fabricated first followed by the cell array on top. However, this has some challenges because the cell array requires a high-temperature processing step to ensure higher reliability that can be detrimental to the health of the CMOS logic. Thanks to recent advancements in wafer bonding techniques, the new CBA process allows the CMOS wafer and cell array wafer to be processed independently in parallel and then pieced together, as shown in the models above.
The BiCS 8 3D NAND incorporates 218 layers, compared to 112 layers in BiCS 5 and 162 layers in BiCS 6. The company decided to skip over BiCS 7 (or, rather, it was probably a short-lived generation meant as an internal test vehicle). The generation retains the four-plane charge trap structure of BiCS 6. In its TLC avatar, it is available as a 1 Tbit device. The QLC version is available in two capacities - 1 Tbit and 2 Tbit.
Kioxia also noted that while the number of layers (218) doesn't compare favorably with the latest layer counts from the competition, its lateral scaling / cell shrinkage has enabled it to be competitive in terms of bit density as well as operating speeds (3200 MT/s). For reference, the latest shipping NAND from Micron - the G9 - has 276 layers with a bit density in TLC mode of 21 Gbit/mm2, and operates at up to 3600 MT/s. However, its 232L NAND operates only up to 2400 MT/s and has a bit density of 14.6 Gbit/mm2.
It must be noted that the CBA hybrid bonding process has advantages over the current processes used by other vendors - including Micron's CMOS under array (CuA) and SK hynix's 4D PUC (periphery-under-chip) developed in the late 2010s. It is expected that other NAND vendors will also move eventually to some variant of the hybrid bonding scheme used by Kioxia.
StorageWhen Western Digital introduced its Ultrastar DC SN861 SSDs earlier this year, the company did not disclose which controller it used for these drives, which made many observers presume that WD was using an in-house controller. But a recent teardown of the drive shows that is not the case; instead, the company is using a controller from Fadu, a South Korean company founded in 2015 that specializes on enterprise-grade turnkey SSD solutions.
The Western Digital Ultrastar DC SN861 SSD is aimed at performance-hungry hyperscale datacenters and enterprise customers which are adopting PCIe Gen5 storage devices these days. And, as uncovered in photos from a recent Storage Review article, the drive is based on Fadu's FC5161 NVMe 2.0-compliant controller. The FC5161 utilizes 16 NAND channels supporting an ONFi 5.0 2400 MT/s interface, and features a combination of enterprise-grade capabilities (OCP Cloud Spec 2.0, SR-IOV, up to 512 name spaces for ZNS support, flexible data placement, NVMe-MI 1.2, advanced security, telemetry, power loss protection) not available on other off-the-shelf controllers – or on any previous Western Digital controllers.
The Ultrastar DC SN861 SSD offers sequential read speeds up to 13.7 GB/s as well as sequential write speeds up to 7.5 GB/s. As for random performance, it boasts with an up to 3.3 million random 4K read IOPS and up to 0.8 million random 4K write IOPS. The drives are available in capacities between 1.6 TB and 7.68 TB with one or three drive writes per day (DWPD) over five years rating as well as in U.2 and E1.S form-factors.
While the two form factors of the SN861 share a similar technical design, Western Digital has tailored each version for distinct workloads: the E1.S supports FDP and performance enhancements specifically for cloud environments. By contrast, the U.2 model is geared towards high-performance enterprise tasks and emerging applications like AI.
Without any doubts, Western Digital's Ultrastar DC SN861 is a feature-rich high-performance enterprise-grade SSD. It has another distinctive feature: a 5W idle power consumption, which is rather low by the standards of enterprise-grade drives (e.g., it is 1W lower compared to the SN840). While the difference with predecessors may be just 1W, hyperscalers deploy thousands of drives and for their TCO every watt counts.
Western Digital's Ultrastar DC SN861 SSDs are now available for purchase to select customers (such as Meta) and to interested parties. Prices are unknown, but they will depend on such factors as volumes.
Sources: Fadu, Storage Review
StorageWhile neuromorphic computing remains under research for the time being, efforts into the field have continued to grow over the years, as have the capabilities of the specialty chips that have been developed for this research. Following those lines, this morning Intel and Sandia National Laboratories are celebrating the deployment of the Hala Point neuromorphic system, which the two believe is the highest capacity system in the world. With 1.15 billion neurons overall, Hala Point is the largest deployment yet for Intel’s Loihi 2 neuromorphic chip, which was first announced at the tail-end of 2021.
The Hala Point system incorporates 1152 Loihi 2 processors, each of which is capable of simulating a million neurons. As noted back at the time of Loihi 2’s launch, these chips are actually rather small – just 31 mm2 per chip with 2.3 billion transistors each, as they’re built on the Intel 4 process (one of the only other Intel chips to do so, besides Meteor Lake). As a result, the complete system is similarly petite, taking up just 6 rack units of space (or as Sandia likes to compare it to, about the size of a microwave), with a power consumption of 2.6 kW. Now that it’s online, Hala Point has dethroned the SpiNNaker system as the largest disclosed neuromorphic system, offering admittedly just a slightly larger number of neurons at less than 3% of the power consumption of the 100 kW British system.

A Single Loihi 2 Chip (31 mm2)
Hala Point will be replacing an older Intel neuromorphic system at Sandia, Pohoiki Springs, which is based on Intel’s first-generation Loihi chips. By comparison, Hala Point offers ten-times as many neurons, and upwards of 12x the performance overall,
Both neuromorphic systems have been procured by Sandia in order to advance the national lab’s research into neuromorphic computing, a computing paradigm that behaves like a brain. The central thought (if you’ll excuse the pun) is that by mimicking the wetware writing this article, neuromorphic chips can be used to solve problems that conventional processors cannot solve today, and that they can do so more efficiently as well.
Sandia, for its part, has said that it will be using the system to look at large-scale neuromorphic computing, with work operating on a scale well beyond Pohoiki Springs. With Hala Point offering a simulated neuron count very roughly on the level of complexity of an owl brain, the lab believes that a larger-scale system will finally enable them to properly exploit the properties of neuromorphic computing to solve real problems in fields such as device physics, computer architecture, computer science and informatics, moving well beyond the simple demonstrations initially achieved at a smaller scale.
One new focus from the lab, which in turn has caught Intel’s attention, is the applicability of neuromorphic computing towards AI inference. Because the neural networks themselves behind the current wave of AI systems are attempting to emulate the human brain, in a sense, there is an obvious degree of synergy with the brain-mimicking neuromorphic chips, even if the algorithms differ in some key respects. Still, with energy efficiency being one of the major benefits of neuromorphic computing, it’s pushed Intel to look into the matter further – and even build a second, Hala Point-sized system of their own.
According to Intel, in their research on Hala Point, the system has reached efficiencies as high as 15 TOPS-per-Watt at 8-bit precision, albeit while using 10:1 sparsity, making it more than competitive with current-generation commercial chips. As an added bonus to that efficiency, the neuromorphic systems don’t require extensive data processing and batching in advance, which is normally necessary to make efficient use of the high density ALU arrays in GPUs and GPU-like processors.
Perhaps the most interesting use case of all, however, is the potent... CPUs
At FMS 2024, the technological requirements from the storage and memory subsystem took center stage. Both SSD and controller vendors had various demonstrations touting their suitability for different stages of the AI data pipeline - ingestion, preparation, training, checkpointing, and inference. Vendors like Solidigm have different types of SSDs optimized for different stages of the pipeline. At the same time, controller vendors have taken advantage of one of the features introduced recently in the NVM Express standard - Flexible Data Placement (FDP).
FDP involves the host providing information / hints about the areas where the controller could place the incoming write data in order to reduce the write amplification. These hints are generated based on specific block sizes advertised by the device. The feature is completely backwards-compatible, with non-FDP hosts working just as before with FDP-enabled SSDs, and vice-versa.
Silicon Motion's MonTitan Gen 5 Enterprise SSD Platform was announced back in 2022. Since then, Silicon Motion has been touting the flexibility of the platform, allowing its customers to incorporate their own features as part of the customization process. This approach is common in the enterprise space, as we have seen with Marvell's Bravera SC5 SSD controller in the DapuStor SSDs and Microchip's Flashtec controllers in the Longsys FORESEE enterprise SSDs.
At FMS 2024, the company was demonstrating the advantages of flexible data placement by allowing a single QLC SSD based on their MonTitan platform to take part in different stages of the AI data pipeline while maintaining the required quality of service (minimum bandwidth) for each process. The company even has a trademarked name (PerformaShape) for the firmware feature in the controller that allows the isolation of different concurrent SSD accesses (from different stages in the AI data pipeline) to guarantee this QoS. Silicon Motion claims that this scheme will enable its customers to get the maximum write performance possible from QLC SSDs without negatively impacting the performance of other types of accesses.
Silicon Motion and Phison have market leadership in the client SSD controller market with similar approaches. However, their enterprise SSD controller marketing couldn't be more different. While Phison has gone in for a turnkey solution with their Gen 5 SSD platform (to the extent of not adopting the white label route for this generation, and instead opting to get the SSDs qualified with different cloud service providers themselves), Silicon Motion is opting for a different approach. The flexibility and customization possibilities can make platforms like the MonTitan appeal to flash array vendors.
StorageKioxia's booth at FMS 2024 was a busy one with multiple technology demonstrations keeping visitors occupied. A walk-through of the BiCS 8 manufacturing process was the first to grab my attention. Kioxia and Western Digital announced the sampling of BiCS 8 in March 2023. We had touched briefly upon its CMOS Bonded Array (CBA) scheme in our coverage of Kioxial's 2Tb QLC NAND device and coverage of Western Digital's 128 TB QLC enterprise SSD proof-of-concept demonstration. At Kioxia's booth, we got more insights.
Traditionally, fabrication of flash chips involved placement of the associate logic circuitry (CMOS process) around the periphery of the flash array. The process then moved on to putting the CMOS under the cell array, but the wafer development process was serialized with the CMOS logic getting fabricated first followed by the cell array on top. However, this has some challenges because the cell array requires a high-temperature processing step to ensure higher reliability that can be detrimental to the health of the CMOS logic. Thanks to recent advancements in wafer bonding techniques, the new CBA process allows the CMOS wafer and cell array wafer to be processed independently in parallel and then pieced together, as shown in the models above.
The BiCS 8 3D NAND incorporates 218 layers, compared to 112 layers in BiCS 5 and 162 layers in BiCS 6. The company decided to skip over BiCS 7 (or, rather, it was probably a short-lived generation meant as an internal test vehicle). The generation retains the four-plane charge trap structure of BiCS 6. In its TLC avatar, it is available as a 1 Tbit device. The QLC version is available in two capacities - 1 Tbit and 2 Tbit.
Kioxia also noted that while the number of layers (218) doesn't compare favorably with the latest layer counts from the competition, its lateral scaling / cell shrinkage has enabled it to be competitive in terms of bit density as well as operating speeds (3200 MT/s). For reference, the latest shipping NAND from Micron - the G9 - has 276 layers with a bit density in TLC mode of 21 Gbit/mm2, and operates at up to 3600 MT/s. However, its 232L NAND operates only up to 2400 MT/s and has a bit density of 14.6 Gbit/mm2.
It must be noted that the CBA hybrid bonding process has advantages over the current processes used by other vendors - including Micron's CMOS under array (CuA) and SK hynix's 4D PUC (periphery-under-chip) developed in the late 2010s. It is expected that other NAND vendors will also move eventually to some variant of the hybrid bonding scheme used by Kioxia.
StorageWhen Western Digital introduced its Ultrastar DC SN861 SSDs earlier this year, the company did not disclose which controller it used for these drives, which made many observers presume that WD was using an in-house controller. But a recent teardown of the drive shows that is not the case; instead, the company is using a controller from Fadu, a South Korean company founded in 2015 that specializes on enterprise-grade turnkey SSD solutions.
The Western Digital Ultrastar DC SN861 SSD is aimed at performance-hungry hyperscale datacenters and enterprise customers which are adopting PCIe Gen5 storage devices these days. And, as uncovered in photos from a recent Storage Review article, the drive is based on Fadu's FC5161 NVMe 2.0-compliant controller. The FC5161 utilizes 16 NAND channels supporting an ONFi 5.0 2400 MT/s interface, and features a combination of enterprise-grade capabilities (OCP Cloud Spec 2.0, SR-IOV, up to 512 name spaces for ZNS support, flexible data placement, NVMe-MI 1.2, advanced security, telemetry, power loss protection) not available on other off-the-shelf controllers – or on any previous Western Digital controllers.
The Ultrastar DC SN861 SSD offers sequential read speeds up to 13.7 GB/s as well as sequential write speeds up to 7.5 GB/s. As for random performance, it boasts with an up to 3.3 million random 4K read IOPS and up to 0.8 million random 4K write IOPS. The drives are available in capacities between 1.6 TB and 7.68 TB with one or three drive writes per day (DWPD) over five years rating as well as in U.2 and E1.S form-factors.
While the two form factors of the SN861 share a similar technical design, Western Digital has tailored each version for distinct workloads: the E1.S supports FDP and performance enhancements specifically for cloud environments. By contrast, the U.2 model is geared towards high-performance enterprise tasks and emerging applications like AI.
Without any doubts, Western Digital's Ultrastar DC SN861 is a feature-rich high-performance enterprise-grade SSD. It has another distinctive feature: a 5W idle power consumption, which is rather low by the standards of enterprise-grade drives (e.g., it is 1W lower compared to the SN840). While the difference with predecessors may be just 1W, hyperscalers deploy thousands of drives and for their TCO every watt counts.
Western Digital's Ultrastar DC SN861 SSDs are now available for purchase to select customers (such as Meta) and to interested parties. Prices are unknown, but they will depend on such factors as volumes.
Sources: Fadu, Storage Review
StorageWhile neuromorphic computing remains under research for the time being, efforts into the field have continued to grow over the years, as have the capabilities of the specialty chips that have been developed for this research. Following those lines, this morning Intel and Sandia National Laboratories are celebrating the deployment of the Hala Point neuromorphic system, which the two believe is the highest capacity system in the world. With 1.15 billion neurons overall, Hala Point is the largest deployment yet for Intel’s Loihi 2 neuromorphic chip, which was first announced at the tail-end of 2021.
The Hala Point system incorporates 1152 Loihi 2 processors, each of which is capable of simulating a million neurons. As noted back at the time of Loihi 2’s launch, these chips are actually rather small – just 31 mm2 per chip with 2.3 billion transistors each, as they’re built on the Intel 4 process (one of the only other Intel chips to do so, besides Meteor Lake). As a result, the complete system is similarly petite, taking up just 6 rack units of space (or as Sandia likes to compare it to, about the size of a microwave), with a power consumption of 2.6 kW. Now that it’s online, Hala Point has dethroned the SpiNNaker system as the largest disclosed neuromorphic system, offering admittedly just a slightly larger number of neurons at less than 3% of the power consumption of the 100 kW British system.

A Single Loihi 2 Chip (31 mm2)
Hala Point will be replacing an older Intel neuromorphic system at Sandia, Pohoiki Springs, which is based on Intel’s first-generation Loihi chips. By comparison, Hala Point offers ten-times as many neurons, and upwards of 12x the performance overall,
Both neuromorphic systems have been procured by Sandia in order to advance the national lab’s research into neuromorphic computing, a computing paradigm that behaves like a brain. The central thought (if you’ll excuse the pun) is that by mimicking the wetware writing this article, neuromorphic chips can be used to solve problems that conventional processors cannot solve today, and that they can do so more efficiently as well.
Sandia, for its part, has said that it will be using the system to look at large-scale neuromorphic computing, with work operating on a scale well beyond Pohoiki Springs. With Hala Point offering a simulated neuron count very roughly on the level of complexity of an owl brain, the lab believes that a larger-scale system will finally enable them to properly exploit the properties of neuromorphic computing to solve real problems in fields such as device physics, computer architecture, computer science and informatics, moving well beyond the simple demonstrations initially achieved at a smaller scale.
One new focus from the lab, which in turn has caught Intel’s attention, is the applicability of neuromorphic computing towards AI inference. Because the neural networks themselves behind the current wave of AI systems are attempting to emulate the human brain, in a sense, there is an obvious degree of synergy with the brain-mimicking neuromorphic chips, even if the algorithms differ in some key respects. Still, with energy efficiency being one of the major benefits of neuromorphic computing, it’s pushed Intel to look into the matter further – and even build a second, Hala Point-sized system of their own.
According to Intel, in their research on Hala Point, the system has reached efficiencies as high as 15 TOPS-per-Watt at 8-bit precision, albeit while using 10:1 sparsity, making it more than competitive with current-generation commercial chips. As an added bonus to that efficiency, the neuromorphic systems don’t require extensive data processing and batching in advance, which is normally necessary to make efficient use of the high density ALU arrays in GPUs and GPU-like processors.
Perhaps the most interesting use case of all, however, is the potent... CPUs
With the arrival of spring comes showers, flowers, and in the technology industry, TSMC's annual technology symposium series. With customers spread all around the world, the Taiwanese pure play foundry has adopted an interesting strategy for updating its customers on its fab plans, holding a series of symposiums from Silicon Valley to Shanghai. Kicking off the series every year – and giving us our first real look at TSMC's updated foundry plans for the coming years – is the Santa Clara stop, where yesterday the company has detailed several new technologies, ranging from more advanced lithography processes to massive, wafer-scale chip packing options.
Today we're publishing several stories based on TSMC's different offerings, starting with TSMC's marquee announcement: their A16 process node. Meanwhile, for the rest of our symposium stories, please be sure to check out the related reading below, and check back for additional stories.
Headlining its Silicon Valley stop, TSMC announced its first 'angstrom-class' process technology: A16. Following a production schedule shift that has seen backside power delivery network technology (BSPDN) removed from TSMC's N2P node, the new 1.6nm-class production node will now be the first process to introduce BSPDN to TSMC's chipmaking repertoire. With the addition of backside power capabilities and other improvements, TSMC expects A16 to offer significantly improved performance and energy efficiency compared to TSMC's N2P fabrication process. It will be available to TSMC's clients starting H2 2026.
At a high level, TSMC's A16 process technology will rely on gate-all-around (GAAFET) nanosheet transistors and will feature a backside power rail, which will both improve power delivery and moderately increase transistor density. Compared to TSMC's N2P fabrication process, A16 is expected to offer a performance improvement of 8% to 10% at the same voltage and complexity, or a 15% to 20% reduction in power consumption at the same frequency and transistor count. TSMC is not listing detailed density parameters this far out, but the company says that chip density will increase by 1.07x to 1.10x – keeping in mind that transistor density heavily depends on the type and libraries of transistors used.
The key innovation of TSMC's A16 node, is its Super Power Rail (SPR) backside power delivery network, a first for TSMC. The contract chipmaker claims that A16's SPR is specifically tailored for high-performance computing products that feature both complex signal routes and dense power circuitry.
As noted earlier, with this week's announcement, A16 has now become the launch vehicle for backside power delivery at TSMC. The company was initially slated to offer BSPDN technology with N2P in 2026, but for reasons that aren't entirely clear, the tech has been punted from N2P and moved to A16. TSMC's official timing for N2P in 2023 was always a bit loose, so it's hard to say if this represents much of a practical delay for BSPDN at TSMC. But at the same time, it's important to underscore that A16 isn't just N2P renamed, but rather it will be a distinct technology from N2P.
TSMC is not the only fab pursuing backside power delivery, and accordingly, we're seeing multiple variations on the technique crop up at different fabs. The... Semiconductors
UPDATE 6/12: Micron notified us that it expects its HBM market share to rise to mid-20% in the middle of calendar 2025, not in the middle of fiscal 2025.
For Computex week, Micron was at the show in force in order to talk about its latest products across the memory spectrum. The biggest news for the memory company was that it has kicked-off sampling of it's next-gen GDDR7 memory, which is expected to start showing up in finished products later this year and was being demoed on the show floor. Meanwhile, the company is also eyeing taking a much larger piece of the other pillar of the high-performance memory market – High Bandwidth Memory – with aims of capturing around 25% of the premium HBM market.
Micron's first GDDR7 chip is a 16 Gb memory device with a 32 GT/sec (32Gbps/pin) transfer rate, which is significantly faster than contemporary GDDR6/GDDR6X. As outlined with JEDEC's announcement of GDDR7 earlier this year, the latest iteration of the high-performance memory technology is slated to improve on both memory bandwidth and capacity, with bandwidths starting at 32 GT/sec and potentially climbing another 50% higher to 48 GT/sec by the time the technology reaches its apex. And while the first chips are starting off at the same 2GByte (16Gbit) capacity as today's GDDR6(X) chips, the standard itself defines capacities as high as 64Gbit.
Of particular note, GDDR7 brings with it the switch to PAM3 (3-state) signal encoding, moving from the industry's long-held NRZ (2-state) signaling. As Micron was responsible for the bespoke GDDR6X technology, which was the first major DRAM spec to use PAM signaling (in its case, 4-state PAM4), Micron reckons they have a leg-up with GDDR7 development, as they're already familiar with working with PAM.
The GDDR7 transition also brings with it a change in how chips are organized, with the standard 32-bit wide chip now split up into four 8-bit sub-channels. And, like most other contemporary memory standards, GDDR7 is adding on-die ECC support to hold the line on chip reliability (though as always, we should note that on-die ECC isn't meant to be a replacement for full, multi-chip ECC). The standard also implements some other RAS features such as error checking and scrubbing, which although are not germane to gaming, will be a big deal for compute/AI use cases.
The added complexity of GDDR7 means that the pin count is once again increasing as well, with the new standard adding a further 86 pins to accommodate the data transfer and power delivery changes, bringing it to a total of 266 pins. With that said, the actual package size is remaining unchanged from GDDR5/GDDR6, maintaining that familiar 14mm x 12mm package. Memory manufacturers are instead using smaller diameter balls, as well as decreasing the pitch between the individual solder balls – going from GDDR6's 0.75mm x 0.75mm pitch to a slightly shorter 0.75mm x 0.73mm pitch. This allows the same package to fit in another 5 rows of contacts.
As for Micron's own production plans, the company is using its latest 1-beta (1β) fabrication process. While the major memory manufacturers don't readily publish the physical parameters of their processes these days, Micron believes that they have the edge on density with 1β, and consequently will be producing the densest GDDR7 at launch. And, while more nebulous, the company company believes that 1β will give them an edge in power efficiency as well.
Micron says that the first devices incorporating GDDR7 will be available this year. And while video card vendors remain a major consumer of GDDR memory, in 2024 the AI accelerator market should not be overlooked. With AI accelerators still bottlenecked by memory capacity and bandwidth, GDDR7 is expected to pair very well with inference accelerators, which need a more cost-effective option than HBM.
On Tuesday, Noctua introduced its second-generation NH-D15 cooler, which offers refined performance and formally supports Intel's next-generation Arrow Lake-S processors in LGA1851 packaging. Alongside its NH-D15 G2 CPU cooler, Noctua also introduced its NF-A14x25r G2 140mm fans.
The Noctua NH-D15 G2 is an enhanced version of the popular NH-D15 cooler with eight heat pipes, two asymmetrical fin-stack and two speed-offset 140-mm PWM fans (to avoid acoustic interaction phenomena such as periodic humming or intermittent vibrations). According to the manufacturer, these key components are tailored to work efficiently together to deliver superior quiet cooling performance, rivalling many all-in-one water cooling systems and pushing the boundaries of air cooling efficiency.
Noctua offers the NH-D15 G2 in three versions to address the specific requirements of modern CPUs. The regular version is versatile and can be used for AMD's AM5 processors and Intel's LGA1700 CPUs with included mounting accessories. The HBC (High Base Convexity) variant is tailored for LGA1700 processors, especially those subjected to full ILM pressure or those that have deformed over time, ensuring excellent contact quality despite the concave shape of the CPU. Finally, the LBC (Low Base Convexity) version is tailored for flat rectangular CPUs, providing optimal contact on AMD's AM5 and other similar processors.
While there are three versions of NH-D15 G2 aimed at different processors, they are all said to be compatible with a wide range of motherboards and other hardware. The new coolers' offset construction ensures clearance for the top PCIe x16 slot on most current motherboards. Additionally, they feature the upgraded Torx-based SecuFirm2+ multi-socket mounting system and come with Noctua's NT-H2 thermal compound.
For those looking to upgrade existing coolers like the NH-D15, NH-D15S, or NH-U14S series, Noctua is also releasing the NF-A14x25r G2 fans separately. These round-frame fans are fine-tuned in single and dual fan packages to minimize noise levels while offering decent cooling performance.
Finally, Noctua is also prepping a square-frame version of the NF-A14x25 G2 fan for release in September. This variant targets water-cooling radiators and case-cooling applications and promises to extend the versatility of Noctua's cooling solutions further.
All versions of Noctua's NH-D15 G2 coolers cost $149.90/€149.90. One NF-A14x25 G2 fan costs $39.90/€39.90, whereas a package of two fans costs $79.80/€79.80. The cooler is backed with a six-year warranty.
Cases/Cooling/PSUs' class='post-thumb' src='https://lh3.googleusercontent.com/blogger_img_proxy/AEn0k_s4gsefwJB7i5zGFveagJi1DUGCjufuE9AY6SgZrIRUOXFvwG8wl1P_adwH8SJzE-VnghvDvaNIzoJKsFY6kIapigOr5ewi7l29_UzRt8ohrFWSKQGaXNG8ue00dU3vudSjntFDBQ=w72-h72-p-k-no-nu'/>On Tuesday, Noctua introduced its second-generation NH-D15 cooler, which offers refined performance and formally supports Intel's next-generation Arrow Lake-S processors in LGA1851 packaging. Alongside its NH-D15 G2 CPU cooler, Noctua also introduced its NF-A14x25r G2 140mm fans.
The Noctua NH-D15 G2 is an enhanced version of the popular NH-D15 cooler with eight heat pipes, two asymmetrical fin-stack and two speed-offset 140-mm PWM fans (to avoid acoustic interaction phenomena such as periodic humming or intermittent vibrations). According to the manufacturer, these key components are tailored to work efficiently together to deliver superior quiet cooling performance, rivalling many all-in-one water cooling systems and pushing the boundaries of air cooling efficiency.
Noctua offers the NH-D15 G2 in three versions to address the specific requirements of modern CPUs. The regular version is versatile and can be used for AMD's AM5 processors and Intel's LGA1700 CPUs with included mounting accessories. The HBC (High Base Convexity) variant is tailored for LGA1700 processors, especially those subjected to full ILM pressure or those that have deformed over time, ensuring excellent contact quality despite the concave shape of the CPU. Finally, the LBC (Low Base Convexity) version is tailored for flat rectangular CPUs, providing optimal contact on AMD's AM5 and other similar processors.
While there are three versions of NH-D15 G2 aimed at different processors, they are all said to be compatible with a wide range of motherboards and other hardware. The new coolers' offset construction ensures clearance for the top PCIe x16 slot on most current motherboards. Additionally, they feature the upgraded Torx-based SecuFirm2+ multi-socket mounting system and come with Noctua's NT-H2 thermal compound.
For those looking to upgrade existing coolers like the NH-D15, NH-D15S, or NH-U14S series, Noctua is also releasing the NF-A14x25r G2 fans separately. These round-frame fans are fine-tuned in single and dual fan packages to minimize noise levels while offering decent cooling performance.
Finally, Noctua is also prepping a square-frame version of the NF-A14x25 G2 fan for release in September. This variant targets water-cooling radiators and case-cooling applications and promises to extend the versatility of Noctua's cooling solutions further.
All versions of Noctua's NH-D15 G2 coolers cost $149.90/€149.90. One NF-A14x25 G2 fan costs $39.90/€39.90, whereas a package of two fans costs $79.80/€79.80. The cooler is backed with a six-year warranty.
Cases/Cooling/PSUsAt FMS 2024, Kioxia had a proof-of-concept demonstration of their proposed a new RAID offload methodology for enterprise SSDs. The impetus for this is quite clear: as SSDs get faster in each generation, RAID arrays have a major problem of maintaining (and scaling up) performance. Even in cases where the RAID operations are handled by a dedicated RAID card, a simple write request in, say, a RAID 5 array would involve two reads and two writes to different drives. In cases where there is no hardware acceleration, the data from the reads needs to travel all the way back to the CPU and main memory for further processing before the writes can be done.
Kioxia has proposed the use of the PCIe direct memory access feature along with the SSD controller's controller memory buffer (CMB) to avoid the movement of data up to the CPU and back. The required parity computation is done by an accelerator block resident within the SSD controller.
In Kioxia's PoC implementation, the DMA engine can access the entire host address space (including the peer SSD's BAR-mapped CMB), allowing it to receive and transfer data as required from neighboring SSDs on the bus. Kioxia noted that their offload PoC saw close to 50% reduction in CPU utilization and upwards of 90% reduction in system DRAM utilization compared to software RAID done on the CPU. The proposed offload scheme can also handle scrubbing operations without taking up the host CPU cycles for the parity computation task.
Kioxia has already taken steps to contribute these features to the NVM Express working group. If accepted, the proposed offload scheme will be part of a standard that could become widely available across multiple SSD vendors.
StorageAMD sends word this afternoon that the company is delaying the launch of their Ryzen 9000 series desktop processors. The first Zen 5 architecture-based desktop chips were slated to launch next week, on July 31st. But citing quality issues that are significant enough that AMD is even pulling back stock already sent to distributors, AMD is delaying the launch by one to two weeks. The Ryzen 9000 launch will now be a staggered launch, with the Ryzen 5 9600X and Ryzen 7 9700X launching on August 8th, while the Ryzen 9 9900X and flagship Ryzen 9 9950X will launch a week after that, on August 15th.
The exceptional announcement, officially coming from AMD’s SVP and GM of Computing and Graphics, Jack Huynh, is short and to the point. Ahead of the launch, AMD found that “the initial production units that were shipped to our channel partners did not meet our full quality expectations.” And, as a result, the company has needed to delay the launch in order to rectify the issue.
Meanwhile, because AMD had already distributed chips to their channel partners – distributors who then filter down to retailers and system builders – this is technically a recall as well, as AMD needs to pull back the first batch of chips and replace them with known good units. That AMD has to essentially take a do-over on initial chip distribution is ultimately what’s driving this delay; it takes the better part of a month to properly seed retailers for a desktop CPU launch with even modest chip volumes, so AMD has to push the launch out to give their supply chain time to catch up.
For the moment, there are no further details on what the quality issue with the first batch of chips is, how many are affected, or what any kind of fix may entail. Whatever the issue is, AMD is simply taking back all stock and replacing it with what they’re calling “fresh units.”
| AMD Ryzen 9000 Series Processors Zen 5 Microarchitecture (Granite Ridge) |
||||||||
| AnandTech | Cores / Threads |
Base Freq |
Turbo Freq |
L2 Cache |
L3 Cache |
Memory Support | TDP | Launch Date |
| Ryzen 9 9950X | 16C/32T | 4.3GHz | 5.7GHz | 16 MB | 64 MB | DDR5-5600 | 170W | 08/15 |
| Ryzen 9 9900X | 12C/24T | 4.4GHz | 5.6GHz | 12 MB | 64 MB | 120W | ||
| Ryzen 7 9700X | 8C/16T | 3.8GHz | 5.5GHz | 8 MB | 32 MB | 65W | 08/08 | |
| Ryzen 5 9600X | 6C/12T | 3.9GHz | 5.4GHz | 6 MB | 32 MB | 65W | ||
Importantly, however, this announcement is only for the Ryzen 9000 desktop processors, and not the Ryzen AI 300 mobile processors (Strix Point), which are still slated to launch next week. A mobile chip recall would be a much bigger issue (they’re in finished devices that would need significant labor to rework), but also, both the new desktop and mobile Ryzen processors are being made on the same TSMC N4 process node, and have significant overlap due to their shared use of the Zen 5 architecture. To be sure, mobile and desktop are very different dies, but it does strongly imply that whatever the issue is, it’s not a design flaw or a fabrication flaw in the silicon itself.
That AMD is able to re-stage the launch of the desktop Ryzen 9000 chips so quickly – on the order of a few weeks – further points to an issue much farther down the line. If indeed the issue isn’t at the silicon level, then that leaves packaging and testing as the next most likely culprit. Whether that means AMD’s packaging partners had some kind of issue assembling the multi-die chips, or if AMD found some other i... CPUs
At FMS 2024, the technological requirements from the storage and memory subsystem took center stage. Both SSD and controller vendors had various demonstrations touting their suitability for different stages of the AI data pipeline - ingestion, preparation, training, checkpointing, and inference. Vendors like Solidigm have different types of SSDs optimized for different stages of the pipeline. At the same time, controller vendors have taken advantage of one of the features introduced recently in the NVM Express standard - Flexible Data Placement (FDP).
FDP involves the host providing information / hints about the areas where the controller could place the incoming write data in order to reduce the write amplification. These hints are generated based on specific block sizes advertised by the device. The feature is completely backwards-compatible, with non-FDP hosts working just as before with FDP-enabled SSDs, and vice-versa.
Silicon Motion's MonTitan Gen 5 Enterprise SSD Platform was announced back in 2022. Since then, Silicon Motion has been touting the flexibility of the platform, allowing its customers to incorporate their own features as part of the customization process. This approach is common in the enterprise space, as we have seen with Marvell's Bravera SC5 SSD controller in the DapuStor SSDs and Microchip's Flashtec controllers in the Longsys FORESEE enterprise SSDs.
At FMS 2024, the company was demonstrating the advantages of flexible data placement by allowing a single QLC SSD based on their MonTitan platform to take part in different stages of the AI data pipeline while maintaining the required quality of service (minimum bandwidth) for each process. The company even has a trademarked name (PerformaShape) for the firmware feature in the controller that allows the isolation of different concurrent SSD accesses (from different stages in the AI data pipeline) to guarantee this QoS. Silicon Motion claims that this scheme will enable its customers to get the maximum write performance possible from QLC SSDs without negatively impacting the performance of other types of accesses.
Silicon Motion and Phison have market leadership in the client SSD controller market with similar approaches. However, their enterprise SSD controller marketing couldn't be more different. While Phison has gone in for a turnkey solution with their Gen 5 SSD platform (to the extent of not adopting the white label route for this generation, and instead opting to get the SSDs qualified with different cloud service providers themselves), Silicon Motion is opting for a different approach. The flexibility and customization possibilities can make platforms like the MonTitan appeal to flash array vendors.
StorageKioxia's booth at FMS 2024 was a busy one with multiple technology demonstrations keeping visitors occupied. A walk-through of the BiCS 8 manufacturing process was the first to grab my attention. Kioxia and Western Digital announced the sampling of BiCS 8 in March 2023. We had touched briefly upon its CMOS Bonded Array (CBA) scheme in our coverage of Kioxial's 2Tb QLC NAND device and coverage of Western Digital's 128 TB QLC enterprise SSD proof-of-concept demonstration. At Kioxia's booth, we got more insights.
Traditionally, fabrication of flash chips involved placement of the associate logic circuitry (CMOS process) around the periphery of the flash array. The process then moved on to putting the CMOS under the cell array, but the wafer development process was serialized with the CMOS logic getting fabricated first followed by the cell array on top. However, this has some challenges because the cell array requires a high-temperature processing step to ensure higher reliability that can be detrimental to the health of the CMOS logic. Thanks to recent advancements in wafer bonding techniques, the new CBA process allows the CMOS wafer and cell array wafer to be processed independently in parallel and then pieced together, as shown in the models above.
The BiCS 8 3D NAND incorporates 218 layers, compared to 112 layers in BiCS 5 and 162 layers in BiCS 6. The company decided to skip over BiCS 7 (or, rather, it was probably a short-lived generation meant as an internal test vehicle). The generation retains the four-plane charge trap structure of BiCS 6. In its TLC avatar, it is available as a 1 Tbit device. The QLC version is available in two capacities - 1 Tbit and 2 Tbit.
Kioxia also noted that while the number of layers (218) doesn't compare favorably with the latest layer counts from the competition, its lateral scaling / cell shrinkage has enabled it to be competitive in terms of bit density as well as operating speeds (3200 MT/s). For reference, the latest shipping NAND from Micron - the G9 - has 276 layers with a bit density in TLC mode of 21 Gbit/mm2, and operates at up to 3600 MT/s. However, its 232L NAND operates only up to 2400 MT/s and has a bit density of 14.6 Gbit/mm2.
It must be noted that the CBA hybrid bonding process has advantages over the current processes used by other vendors - including Micron's CMOS under array (CuA) and SK hynix's 4D PUC (periphery-under-chip) developed in the late 2010s. It is expected that other NAND vendors will also move eventually to some variant of the hybrid bonding scheme used by Kioxia.
StorageWhen Western Digital introduced its Ultrastar DC SN861 SSDs earlier this year, the company did not disclose which controller it used for these drives, which made many observers presume that WD was using an in-house controller. But a recent teardown of the drive shows that is not the case; instead, the company is using a controller from Fadu, a South Korean company founded in 2015 that specializes on enterprise-grade turnkey SSD solutions.
The Western Digital Ultrastar DC SN861 SSD is aimed at performance-hungry hyperscale datacenters and enterprise customers which are adopting PCIe Gen5 storage devices these days. And, as uncovered in photos from a recent Storage Review article, the drive is based on Fadu's FC5161 NVMe 2.0-compliant controller. The FC5161 utilizes 16 NAND channels supporting an ONFi 5.0 2400 MT/s interface, and features a combination of enterprise-grade capabilities (OCP Cloud Spec 2.0, SR-IOV, up to 512 name spaces for ZNS support, flexible data placement, NVMe-MI 1.2, advanced security, telemetry, power loss protection) not available on other off-the-shelf controllers – or on any previous Western Digital controllers.
The Ultrastar DC SN861 SSD offers sequential read speeds up to 13.7 GB/s as well as sequential write speeds up to 7.5 GB/s. As for random performance, it boasts with an up to 3.3 million random 4K read IOPS and up to 0.8 million random 4K write IOPS. The drives are available in capacities between 1.6 TB and 7.68 TB with one or three drive writes per day (DWPD) over five years rating as well as in U.2 and E1.S form-factors.
While the two form factors of the SN861 share a similar technical design, Western Digital has tailored each version for distinct workloads: the E1.S supports FDP and performance enhancements specifically for cloud environments. By contrast, the U.2 model is geared towards high-performance enterprise tasks and emerging applications like AI.
Without any doubts, Western Digital's Ultrastar DC SN861 is a feature-rich high-performance enterprise-grade SSD. It has another distinctive feature: a 5W idle power consumption, which is rather low by the standards of enterprise-grade drives (e.g., it is 1W lower compared to the SN840). While the difference with predecessors may be just 1W, hyperscalers deploy thousands of drives and for their TCO every watt counts.
Western Digital's Ultrastar DC SN861 SSDs are now available for purchase to select customers (such as Meta) and to interested parties. Prices are unknown, but they will depend on such factors as volumes.
Sources: Fadu, Storage Review
Storage
0 Comments