This week, Kioxia introduced its new 3D QLC NAND devices aimed at high-performance, high-capacity drives that could redefine what we typically expect from QLC-based SSDs. The components are 1 Tb and 2 Tb 3D QLC NAND ICs with a 3600 MT/s interface speed that could enable M.2-2230 SSDs with a 4 TB capacity and decent performance.
Kioxia's 1 Tb (128 GB) and 2 Tb (256 GB) 3D QLC NAND devices are made on the company's BiCS 8 process technology and feature 218 active layers as well as a CMOS directly Bonded to Array (CBA) design, which means that the CMOS logic (including interface and buffer circuitry) is built on a specialized node and bonded to the memory array. This manufacturing process enabled Kioxia (and its manufacturing partner Western Digital) to achieve a particularly high interface speed of 3600 MT/s.
In addition to being one of the industry's first 2 Tb QLC NAND devices, the component features 70% higher write power efficiency compared to Kioxia's BiCS 5 3D QLC NAND devices, which is a somewhat vague claim given that the new ICs have higher capacity and performance in general. This feature will be valuable for data center applications, though I do not expect anyone to use 3D QLC memory for write-intensive applications in general. Yet, these devices will be just what the doctor ordered for AI, read-intensive content distribution, and backup storage.
It is interesting to note that Kioxia's 1 Tb 3D QLC NAND, which is optimized for performance, offers 30% faster sequential write performance and 15% lower read latency than the 2 Tb 3D QLC component. These qualities (alongside a 3600 MT/s interface) promise to make Kioxia's 1 Tb 3D QLC competitive even for higher-end PCIe Gen5 x4 SSDs, which currently use 3D TLC memory exclusively.
The remarkable storage density of Kioxia's 2 Tb 3D QLC NAND devices will allow customers to create high-capacity SSDs in compact form factors. For instance, a 16-Hi stacked package (measuring 11.5 mm × 13.5 mm × 1.5 mm) holds 4 TB, which is enough to build a 4 TB M.2-2230 drive with one package or a 16 TB M.2-2280 drive with four. Even a single 16-Hi package could be enough to build a particularly fast client SSD.
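The capacity figures above follow from simple arithmetic, sketched below. The function names are illustrative, and the calculation covers raw capacity only: it assumes identical dies in every package and ignores over-provisioning and spare area.

```python
import math

BITS_PER_BYTE = 8


def package_capacity_tb(die_capacity_tbit: float, dies_per_package: int) -> float:
    """Raw capacity in terabytes of a package that stacks identical NAND dies."""
    return die_capacity_tbit * dies_per_package / BITS_PER_BYTE


def packages_needed(drive_capacity_tb: float, die_capacity_tbit: float,
                    dies_per_package: int) -> int:
    """Number of packages required to reach a target raw drive capacity."""
    per_package_tb = package_capacity_tb(die_capacity_tbit, dies_per_package)
    return math.ceil(drive_capacity_tb / per_package_tb)


if __name__ == "__main__":
    # Figures from the article: 2 Tb dies, 16 dies per package.
    print(package_capacity_tb(2, 16))   # raw TB per 16-Hi package
    print(packages_needed(4, 2, 16))    # packages for a 4 TB M.2-2230 drive
    print(packages_needed(16, 2, 16))   # packages for a 16 TB M.2-2280 drive
```

Under these assumptions, a 16-Hi stack of 2 Tb dies works out to 4 TB, so a 4 TB drive needs one package and a 16 TB drive needs four.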
Kioxia is now sampling its 2 Tb 3D QLC NAND BiCS 8 memory with customers, such as Pure Storage.
"We have a long-standing relationship with Kioxia and are delighted to incorporate their eighth-generation BiCS Flash 2Tb QLC flash memory products to enhance the performance and efficiency of our all-flash storage solutions," said Charles Giancarlo, CEO of Pure Storage. "Pure's unified all-flash data storage platform is able to meet the demanding needs of artificial intelligence as well as the aggressive costs of backup storage. Backed by Kioxia technology, Pure Storage will continue to offer unmatched performance, power efficiency, and reliability, delivering exceptional value to our customers."
"We are pleased to be shipping samples of our new 2Tb QLC with the new eighth-generation BiCS flash technology," said Hideshi Miyajima, CTO of Kioxia. "With its industry-leading high bit density, high speed data transfer, and superior power efficiency, the 2Tb QLC product will offer new value for rapidly emerging AI applications and large storage applications demanding power and space savings."
There is no word on when the 1 Tb 3D QLC BiCS 8 memory will be sampled or released to the market.
Tenstorrent Launches Wormhole AI Processors: 466 FP8 TFLOPS at 300W
Tenstorrent has unveiled its next-generation Wormhole processor for AI workloads, which promises to offer decent performance at a low price. The company currently offers two add-on PCIe cards carrying one or two Wormhole processors, as well as the TT-LoudBox and TT-QuietBox workstations aimed at software developers. The whole of today's release is aimed at developers rather than those who will deploy the Wormhole boards for their commercial workloads.
“It is always rewarding to get more of our products into developer hands. Releasing development systems with our Wormhole™ card helps developers scale up and work on multi-chip AI software,” said Jim Keller, CEO of Tenstorrent. “In addition to this launch, we are excited that the tape-out and power-on for our second generation, Blackhole, is going very well.”
Each Wormhole processor packs 72 Tensix cores (each featuring five RISC-V cores supporting various data formats) with 108 MB of SRAM to deliver 262 FP8 TFLOPS at 1 GHz within a 160W thermal design power. A single-chip Wormhole n150 card carries 12 GB of GDDR6 memory with 288 GB/s of bandwidth.
Wormhole processors offer flexible scalability to meet the varying needs of workloads. In a standard workstation setup with four Wormhole n300 cards, the processors can merge to function as a single unit, appearing to the software as a unified, extensive network of Tensix cores. This configuration allows the accelerators to work on the same workload, be divided among four developers, or run up to eight distinct AI models simultaneously. A crucial feature of this scalability is that it operates natively without the need for virtualization. In data center environments, Wormhole processors will scale both inside one machine using PCIe and across machines using Ethernet.
From a performance standpoint, Tenstorrent's single-chip Wormhole n150 card (72 Tensix cores at 1 GHz, 108 MB SRAM, 12 GB GDDR6 at 288 GB/s) is capable of 262 FP8 TFLOPS at 160W, whereas the dual-chip Wormhole n300 board (128 Tensix cores at 1 GHz, 192 MB SRAM, aggregated 24 GB GDDR6 at 576 GB/s) can offer up to 466 FP8 TFLOPS at 300W (according to Tom's Hardware).
To put that 466 FP8 TFLOPS at 300W number into context, let's compare it to what AI market leader Nvidia has to offer at this thermal design power. Nvidia's A100 does not support FP8, but it does support INT8, and its peak performance is 624 TOPS (1,248 TOPS with sparsity). By contrast, Nvidia's H100 supports FP8, and its peak performance is a massive 1,670 TFLOPS (3,341 TFLOPS with sparsity) at 300W, which is a big difference from Tenstorrent's Wormhole n300.
There is a big catch, though. Tenstorrent's Wormhole n150 is offered for $999, whereas the n300 is available for $1,399. By contrast, one Nvidia H100 card can retail for $30,000, depending on quantities. Of course, we do not know whether four or eight Wormhole processors can indeed deliver the performance of a single H100, and if they can, they will do so at 600W or 1200W TDP, respectively.
In addition to cards, Tenstorrent offers developers pre-built workstations with four n300 cards inside: the less expensive Xeon-based TT-LoudBox with active cooling and the premium EPYC-powered TT-QuietBox with liquid cooling.
Sources: Tenstorrent, Tom's Hardware
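The price and power comparison in the Wormhole article can be made concrete with a short calculation. This is a rough sketch using only the peak dense FP8 figures, TDPs, and prices quoted above. The `Accelerator` class and `cards_to_match` helper are illustrative names, and peak TFLOPS says nothing about delivered performance on real workloads.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    fp8_tflops: float   # peak dense FP8 throughput quoted in the article
    tdp_watts: float
    price_usd: float

    @property
    def tflops_per_watt(self) -> float:
        return self.fp8_tflops / self.tdp_watts

    @property
    def tflops_per_kusd(self) -> float:
        """Peak TFLOPS per 1,000 US dollars."""
        return self.fp8_tflops / self.price_usd * 1000


def cards_to_match(target: Accelerator, card: Accelerator) -> int:
    """Cards needed to reach the target's peak throughput, assuming ideal scaling."""
    return math.ceil(target.fp8_tflops / card.fp8_tflops)


# Figures as quoted in the article; the H100 price is the article's rough retail figure.
N300 = Accelerator("Wormhole n300", 466, 300, 1_399)
H100 = Accelerator("Nvidia H100", 1_670, 300, 30_000)

if __name__ == "__main__":
    for acc in (N300, H100):
        print(f"{acc.name}: {acc.tflops_per_watt:.2f} TFLOPS/W, "
              f"{acc.tflops_per_kusd:.1f} TFLOPS per $1,000")
    n = cards_to_match(H100, N300)
    print(f"{n} n300 cards: {n * N300.tdp_watts:.0f} W, ${n * N300.price_usd:,.0f}")
```

On these numbers, the H100 is well ahead per watt, while the n300 offers roughly six times the peak FP8 throughput per dollar. The card count assumes perfect multi-card scaling, which the article notes is unproven.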
Frore Unveils Waterproof AirJet Mini Sport for Smartphones
Over the past couple of years, Frore Systems has demonstrated several ways that its AirJet solid-state active cooling systems can be used to improve cooling in fanless devices like laptops, tablets, SSDs, and edge computing devices. But there is a subset of those applications that need their cooling options to also be waterproof, and Frore is looking to address those as well. To that end, this week Frore introduced its AirJet Mini Sport, a waterproof, IP68-rated solid-state cooling device that is aimed at use in smartphones and action cameras.
Introduced at MWC Shanghai to attract the attention of China-based makers of handsets, edge and industrial computing devices, and action cameras, the AirJet Mini Sport is an enhanced version of Frore's AirJet Mini Slim. This version has been fully waterproofed, offering IP68-level protection that allows it to work while being submerged in over 1.5 meters of water for up to 30 minutes.
Internally, the AirJet Mini Sport can effectively dissipate 5.25 Watts of heat by generating 1750 Pascals of back pressure, while consuming 1 Watt of energy itself. Elsewhere, Frore claims that the AirJet Mini Sport can be used to provide 2.5 Watts of cooling capacity to smartphones. Although that is not enough to cover the complete power consumption and heat dissipation of a high-end SoC, it would have a significant impact on both burst and steady-state performance by allowing those chips to run at peak clocks and power for longer periods of time.
To ensure consistent performance in diverse environments, the AirJet Mini Sport includes features such as dust resistance and self-cleaning. In addition, just like the AirJet Mini Slim, the Sport-badged version has its own thermal sensor to control its operation and maintain optimal performance. As a result, Frore claims that smartphones and action cameras with the AirJet Mini Sport can achieve up to 80% better performance.
"We are excited to announce the waterproof AirJet Mini Sport," said Dr. Seshu Madhavapeddy, founder and CEO of Frore Systems. "Consumers demand increased performance in compact devices they can use anywhere, on land or in water. AirJet unleashes device performance, now enabling users to do more with their IP68 dustproof and waterproof devices."
G.Skill on Tuesday introduced its ultra-low-latency DDR5-6400 memory modules that feature a CAS latency of 30 clocks, which appear to be the industry's most aggressive timings yet for DDR5-6400 sticks. The modules will be available for both AMD and Intel CPU-based systems.
With every new generation of DDR memory comes an increase in data transfer rates and an increase in relative latencies (measured in clock cycles). While for the vast majority of applications the increased bandwidth offsets the performance impact of higher timings, there are applications that favor low latencies. However, shrinking latencies is sometimes harder than increasing data transfer rates, which is why low-latency modules are rare.
Nonetheless, G.Skill has apparently managed to cherry-pick enough DDR5 memory chips and build appropriate printed circuit boards to produce DDR5-6400 modules with CL30 timings, which are substantially lower than the CL46 timings recommended by JEDEC for this speed bin. This means that while JEDEC-standard modules have an absolute latency of 14.375 ns, G.Skill's modules can boast a latency of just 9.375 ns – an approximately 35% decrease.
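The absolute latency figures above can be reproduced with a short calculation. This is a minimal sketch with illustrative function names: since DDR memory transfers data twice per clock, the memory clock in MHz is half the data rate in MT/s, and CAS latency in nanoseconds is the cycle count divided by that clock.

```python
def cas_latency_ns(data_rate_mts: int, cas_cycles: int) -> float:
    """Absolute CAS latency in nanoseconds for a DDR module.

    DDR transfers twice per clock, so clock (MHz) = data rate (MT/s) / 2,
    and latency (ns) = cycles / clock (MHz) * 1000 = cycles * 2000 / data rate.
    """
    return cas_cycles * 2000 / data_rate_mts


def latency_reduction(baseline_ns: float, improved_ns: float) -> float:
    """Fractional reduction of the improved latency relative to the baseline."""
    return (baseline_ns - improved_ns) / baseline_ns


if __name__ == "__main__":
    jedec = cas_latency_ns(6400, 46)    # JEDEC CL46 at DDR5-6400
    gskill = cas_latency_ns(6400, 30)   # G.Skill CL30 at DDR5-6400
    print(f"JEDEC CL46:  {jedec} ns")
    print(f"G.Skill CL30: {gskill} ns")
    print(f"Reduction:   {latency_reduction(jedec, gskill):.1%}")
```

The same function can be used to compare other speed bins, for example to check whether a faster module with looser timings actually ends up with lower absolute latency.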
G.Skill's DDR5-6400 CL30 39-39-102 modules have a capacity of 16 GB and will be available in 32 GB dual-channel kits, though the company does not disclose voltages, which are likely considerably higher than those standardized by JEDEC.
The company plans to make its DDR5-6400 modules available both for AMD systems with EXPO profiles (Trident Z5 Neo RGB and Trident Z5 Royal Neo) and for Intel-powered PCs with XMP 3.0 profiles (Trident Z5 RGB and Trident Z5 Royal). The new modules will be particularly beneficial for AMD's Ryzen 7000 and Ryzen 9000-series processors, as AM5 systems have a practical DDR5 limit of 6000 MT/s to 6400 MT/s (roughly as fast as AMD's Infinity Fabric can operate with a 1:1 ratio).
G.Skill notes that since its modules are non-standard, they will not work with all systems but will operate on high-end motherboards with properly cooled CPUs.
The new ultra-low-latency memory kits will be available worldwide from G.Skill's partners starting in late August 2024. The company did not disclose the pricing of these modules, but since we are talking about premium products that boast unique specifications, they are likely to be priced accordingly.
NVIDIA on Tuesday said that future monitor scalers from MediaTek will support its G-Sync technologies. NVIDIA is partnering with MediaTek to integrate its full range of G-Sync technologies into future monitors without requiring a standalone G-Sync module, which makes advanced gaming features more accessible across a broader range of displays.
Traditionally, G-Sync technology relied on a dedicated G-Sync module (based on an Altera FPGA) to handle syncing display refresh rates with the GPU in order to reduce screen tearing, stutter, and input lag. As a more basic solution, in 2019 NVIDIA introduced G-Sync Compatible certification and branding, which leveraged the industry-standard VESA AdaptiveSync technology to handle variable refresh rates. In lieu of using a dedicated module, leveraging AdaptiveSync allowed for cheaper monitors, with NVIDIA's program serving as a stamp of approval that the monitor worked with NVIDIA GPUs and met NVIDIA's performance requirements. However, G-Sync Compatible monitors lack some features that, to date, require the dedicated G-Sync module.
Through this new partnership, MediaTek will bring support for all of NVIDIA's G-Sync technologies, including the latest G-Sync Pulsar, directly into its scalers. G-Sync Pulsar enhances motion clarity and reduces ghosting, providing a smoother gaming experience. In addition to variable refresh rates and Pulsar, MediaTek-based G-Sync displays will support such features as variable overdrive, 12-bit color, Ultra Low Motion Blur, low latency HDR, and Reflex Analyzer. This integration will allow more monitors to support a full range of G-Sync features without having to incorporate an expensive FPGA.
The first monitors to feature full G-Sync support without needing an NVIDIA module include the AOC Agon Pro AG276QSG2, Acer Predator XB273U F5, and ASUS ROG Swift 360Hz PG27AQNR. These monitors offer 360Hz refresh rates, 1440p resolution, and HDR support.
What remains to be seen is which specific MediaTek scalers will support NVIDIA's G-Sync technology, or whether the company is going to implement support into all of its scalers going forward. It also remains to be seen whether monitors with NVIDIA's dedicated G-Sync modules retain any advantages over displays with MediaTek's scalers.