Samsung has successfully validated its new LPDDR5X-10700 memory with MediaTek's upcoming Dimensity platform. At present, 10.7 GT/s is the highest performing speed grade of LPDDR5X DRAM slated to be released this year, so the upcoming Dimensity 9400 system-on-chip will get the highest memory bandwidth available for a mobile application processor.
The verification process involved Samsung's 16 GB LPDDR5X package and MediaTek's soon-to-be-announced Dimensity 9400 SoC for high-end 5G smartphones. Usage of LPDDR5X-10700 provides a memory bandwidth of 85.6 GB/second over a 64-bit interface, which will be available for bandwidth-hungry applications like graphics and generative AI.
"Working together with Samsung Electronics has made it possible for MediaTek's next-generation Dimensity chipset to become the world's first to be validated at LPDDR5X operating speeds up to 10.7Gbps, enabling upcoming devices to deliver AI functionality and mobile performance at a level we have never seen before," said JC Hsu, Corporate Senior Vice President at MediaTek. "This updated architecture will make it easier for developers and users to leverage more AI capabilities and take advantage of more features with less impact on battery life."
Samsung's LPDDR5X 10.7 GT/s memory in made on the company's 12nm-class DRAM process technology and is said to provide a more than 25% improvement in power efficiency over previous-generation LPDDR5X, in addition to extra performance. This will positively affect improved user experience, including enhanced on-device AI capabilities, such as faster voice-to-text conversion, and better quality graphics.
Overall, the two companies completed this process in just three months. Though it remains to be seen when smartphones based on the Dimensity 9400 application processor and LPDDR5X memory are set to be available on the market, as MediaTek has not yet even formally announced the SoC itself.
"Through our strategic cooperation with MediaTek, Samsung has verified the industry's fastest LPDDR5X DRAM that is poised to lead the AI smartphone market," said YongCheol Bae, Executive Vice President of Memory Product Planning at Samsung Electronics. "Samsung will continue to innovate through active collaboration with customers and provide optimum solutions for the on-device AI era."
MemoryKioxia's High-Performance 3D QLC NAND Enables High-End High-Capacity SSDs This week, Kioxia introduced its new 3D QLC NAND devices aimed at high-performance, high-capacity drives that could redefine what we typically expect from QLC-based SSDs. The components are 1 Tb and 2 Tb 3D QLC NAND ICs with a 3600 MT/s interface speed that could enable M.2-2230 SSDs with a 4 TB capacity and decent performance. Kioxia's 1 Tb (128 MB) and 2 Tb (256 TB) 3D QLC NAND devices are made on the company's BICS 8 process technology and feature 238 active layers as well as CMOS directly Bonded to Array (CBA) design, which implies that CMOS (including interface and buffers circuitry) is built on a specialized node and bonded to the memory array. Such a manufacturing process enabled Kioxia (and its manufacturing partner Western Digital) to achieve a particularly high interface speed of 3600 MT/s. In addition to being one of the industry's first 2 Tb QLC NAND devices, the component features a 70% higher write power efficiency compared to Kioxia's BICS 5 3D QLC NAND devices, which is a bit vague statement as the new ICs have higher capacity and performance in general. This feature will be valuable for data centre applications, though I do not expect someone to use 3D QLC memory for write-intensive applications in general. Yet, these devices will be just what the doctor ordered for AI: read-intensive, content distribution, and backup storage. It is interesting to note that Kioxia's 1 Tb 3D QLC NAND, optimized for performance, has a 30% faster sequential write performance and a 15% lower read latency than the 2 Tb 3D QLC component. These qualities (alongside a 3600 MT/s interface) promise to make Kioxia's 1 Tb 3D QLC competitive even for higher-end PCIe Gen5 x4 SSDs, which currently exclusively use 3D TLC memory. The remarkable storage density of Kioxia's 2Tb 3D QLC NAND devices will allow customers to create high-capacity SSDs in compact form factors. For instance, a 16-Hi stacked package (measuring 11.5 mm × 13.5 mm × 1.5 mm) can be used to build a 4TB M.2-2230 drive or a 16TB M.2-2280 drive. Even a single 16-Hi package could be enough to build a particularly fast client SSD. Kioxia is now sampling its 2 Tb 3D QLC NAND BiCS 8 memory with customers, such as Pure Storage. "We have a long-standing relationship with Kioxia and are delighted to incorporate their eighth-generation BiCS Flash 2Tb QLC flash memory products to enhance the performance and efficiency of our all-flash storage solutions," said Charles Giancarlo, CEO of Pure Storage. "Pure's unified all-flash data storage platform is able to meet the demanding needs of artificial intelligence as well as the aggressive costs of backup storage. Backed by Kioxia technology, Pure Storage will continue to offer unmatched performance, power efficiency, and reliability, delivering exceptional value to our customers." "We are pleased to be shipping samples of our new 2Tb QLC with the new eighth-generation BiCS flash technology," said Hideshi Miyajima, CTO of Kioxia. "With its industry-leading high bit density, high speed data transfer, and superior power efficiency, the 2Tb QLC product will offer new value for rapidly emerging AI applications and large storage applications demanding power and space savings." There is no word on when the 1 Tb 3D QLC BiCS 8 memory will be sampled or released to the market. SSDs
Tenstorrent Launches Wormhole AI Processors: 466 FP8 TFLOPS at 300W Tenstorrent has unveiled its next-generation Wormhole processor for AI workloads that promises to offer decent performance at a low price. The company currently offers two add-on PCIe cards carrying one or two Wormhole processors as well as TT-LoudBox, and TT-QuietBox workstations aimed at software developers. The whole of today's release is aimed at developers rather than those who will deploy the Wormhole boards for their commercial workloads. “It is always rewarding to get more of our products into developer hands. Releasing development systems with our Wormhole™ card helps developers scale up and work on multi-chip AI software.” said Jim Keller, CEO of Tenstorrent. “In addition to this launch, we are excited that the tape-out and power-on for our second generation, Blackhole, is going very well.” Each Wormhole processor packs 72 Tensix cores (featuring five RISC-V cores supporting various data formats) with 108 MB of SRAM to deliver 262 FP8 TFLOPS at 1 GHz at 160W thermal design power. A single-chip Wormhole n150 card carries 12 GB of GDDR6 memory featuring a 288 GB/s bandwidth. Wormhole processors offer flexible scalability to meet the varying needs of workloads. In a standard workstation setup with four Wormhole n300 cards, the processors can merge to function as a single unit, appearing as a unified, extensive network of Tensix cores to the software. This configuration allows the accelerators to either work on the same workload, be divided among four developers or run up to eight distinct AI models simultaneously. A crucial feature of this scalability is that it operates natively without the need for virtualization. In data center environments, Wormhole processors will scale both inside one machine using PCIe or outside of a single machine using Ethernet. From performance standpoint, Tenstorrent's single-chip Wormhole n150 card (72 Tensix cores at 1 GHz, 108 MB SRAM, 12 GB GDDR6 at 288 GB/s) is capable of 262 FP8 TFLOPS at 160W, whereas the dual-chip Wormhole n300 board (128 Tensix cores at 1 GHz, 192 MB SRAM, aggregated 24 GB GDDR6 at 576 GB/s) can offer up to 466 FP8 TFLOPS at 300W (according to Tom's Hardware). To put that 466 FP8 TFLOPS at 300W number into context, let's compare it to what AI market leader Nvidia has to offer at this thermal design power. Nvidia's A100 does not support FP8, but it does support INT8 and its peak performance is 624 TOPS (1,248 TOPS with sparsity). By contrast, Nvidia's H100 supports FP8 and its peak performance is massive 1,670 TFLOPS (3,341 TFLOPS with sparsity) at 300W, which is a big difference from Tenstorrent's Wormhole n300. There is a big catch though. Tenstorrent's Wormhole n150 is offered for $999, whereas n300 is available for $1,399. By contrast, one Nvidia H100 card can retail for $30,000, depending on quantities. Of course, we do not know whether four or eight Wormhole processors can indeed deliver the performance of a single H300, though they will do so at 600W or 1200W TDP, respectively. In addition to cards, Tenstorrent offers developers pre-built workstations with four n300 cards inside the less expensive Xeon-based TT-LoudBox with active cooling and a premium EPYC-powered TT-QuietBox with liquid cooling. Sources: Tenstorrent, Tom's Hardware AI
G.Skill on Tuesday introduced its ultra-low-latency DDR5-6400 memory modules that feature a CAS latency of 30 clocks, which appears to be the industry's most aggressive timings yet for DDR5-6400 sticks. The modules will be available for both AMD and Intel CPU-based systems.
With every new generation of DDR memory comes an increase in data transfer rates and an extension of relative latencies. While for the vast majority of applications, the increased bandwidth offsets the performance impact of higher timings, there are applications that favor low latencies. However, shrinking latencies is sometimes harder than increasing data transfer rates, which is why low-latency modules are rare.
Nonetheless, G.Skill has apparently managed to cherry-pick enough DDR5 memory chips and build appropriate printed circuit boards to produce DDR5-6400 modules with CL30 timings, which are substantially lower than the CL46 timings recommended by JEDEC for this speed bin. This means that while JEDEC-standard modules have an absolute latency of 14.375 ns, G.Skill's modules can boast a latency of just 9.375 ns – an approximately 35% decrease.
G.Skill's DDR5-6400 CL30 39-39-102 modules have a capacity of 16 GB and will be available in 32 GB dual-channel kits, though the company does not disclose voltages, which are likely considerably higher than those standardized by JEDEC.
The company plans to make its DDR5-6400 modules available both for AMD systems with EXPO profiles (Trident Z5 Neo RGB and Trident Z5 Royal Neo) and for Intel-powered PCs with XMP 3.0 profiles (Trident Z5 RGB and Trident Z5 Royal). For AMD AM5 systems that have a practical limitation of 6000 MT/s – 6400 MT/s for DDR5 memory (as this is roughly as fast as AMD's Infinity Fabric can operate at with a 1:1 ratio), the new modules will be particularly beneficial for AMD's Ryzen 7000 and Ryzen 9000-series processors.
G.Skill notes that since its modules are non-standard, they will not work with all systems but will operate on high-end motherboards with properly cooled CPUs.
The new ultra-low-latency memory kits will be available worldwide from G.Skill's partners starting in late August 2024. The company did not disclose the pricing of these modules, but since we are talking about premium products that boast unique specifications, they are likely to be priced accordingly.
MemoryWhen Western Digital introduced its Ultrastar DC SN861 SSDs earlier this year, the company did not disclose which controller it used for these drives, which made many observers presume that WD was using an in-house controller. But a recent teardown of the drive shows that is not the case; instead, the company is using a controller from Fadu, a South Korean company founded in 2015 that specializes on enterprise-grade turnkey SSD solutions.
The Western Digital Ultrastar DC SN861 SSD is aimed at performance-hungry hyperscale datacenters and enterprise customers which are adopting PCIe Gen5 storage devices these days. And, as uncovered in photos from a recent Storage Review article, the drive is based on Fadu's FC5161 NVMe 2.0-compliant controller. The FC5161 utilizes 16 NAND channels supporting an ONFi 5.0 2400 MT/s interface, and features a combination of enterprise-grade capabilities (OCP Cloud Spec 2.0, SR-IOV, up to 512 name spaces for ZNS support, flexible data placement, NVMe-MI 1.2, advanced security, telemetry, power loss protection) not available on other off-the-shelf controllers – or on any previous Western Digital controllers.
The Ultrastar DC SN861 SSD offers sequential read speeds up to 13.7 GB/s as well as sequential write speeds up to 7.5 GB/s. As for random performance, it boasts with an up to 3.3 million random 4K read IOPS and up to 0.8 million random 4K write IOPS. The drives are available in capacities between 1.6 TB and 7.68 TB with one or three drive writes per day (DWPD) over five years rating as well as in U.2 and E1.S form-factors.
While the two form factors of the SN861 share a similar technical design, Western Digital has tailored each version for distinct workloads: the E1.S supports FDP and performance enhancements specifically for cloud environments. By contrast, the U.2 model is geared towards high-performance enterprise tasks and emerging applications like AI.
Without any doubts, Western Digital's Ultrastar DC SN861 is a feature-rich high-performance enterprise-grade SSD. It has another distinctive feature: a 5W idle power consumption, which is rather low by the standards of enterprise-grade drives (e.g., it is 1W lower compared to the SN840). While the difference with predecessors may be just 1W, hyperscalers deploy thousands of drives and for their TCO every watt counts.
Western Digital's Ultrastar DC SN861 SSDs are now available for purchase to select customers (such as Meta) and to interested parties. Prices are unknown, but they will depend on such factors as volumes.
Sources: Fadu, Storage Review
Storage
0 Comments