Already solidly in the driver’s seat of the generative AI accelerator market, NVIDIA has long made it clear that the company isn’t about to slow down and check out the view. Instead, NVIDIA intends to continue iterating along its multi-generational product roadmap for GPUs and accelerators, leveraging its early advantage to stay ahead of its ever-growing coterie of competitors in the accelerator market. So while NVIDIA’s ridiculously popular H100/H200/GH200 series of accelerators remains the hottest ticket in Silicon Valley, it’s already time to talk about the next-generation accelerator architecture to feed NVIDIA’s AI ambitions: Blackwell.
AMD Announces FSR 3.1: Seriously Improved Upscaling Quality

AMD's FidelityFX Super Resolution 3 technology package introduced a plethora of enhancements to the FSR technology on Radeon RX 6000 and 7000-series graphics cards last September. But perfection has no limits, so this week the company is rolling out its FSR 3.1 technology, which improves upscaling quality, decouples frame generation from AMD's upscaler, and makes it easier for developers to work with FSR.

Arguably, FSR 3.1's primary enhancement is its improved temporal upscaling image quality: compared to FSR 2.2, the image flickers less at rest and no longer ghosts in motion. This is a significant improvement, as flickering and ghosting artifacts are particularly annoying. FSR 3.1 has to be implemented by game developers themselves, and the first title slated to support the new technology later this year is Ratchet & Clank: Rift Apart.

[Image comparisons: Temporal Stability and Ghosting Reduction, AMD FSR 2.2 vs. AMD FSR 3.1]

Another significant development brought by FSR 3.1 is the decoupling of the Frame Generation feature introduced with FSR 3. This capability relies on a form of AMD's Fluid Motion Frames (AFMF) optical flow interpolation: it uses temporal game data, such as motion vectors, to insert an additional frame between existing ones, which can boost performance by up to two times in compatible games. Frame Generation was initially tied to FSR 3 upscaling, which was a limitation; starting with FSR 3.1 it will work with other upscaling methods, though AMD refrains from saying which methods and on which hardware for now. The company also does not disclose when game developers are expected to implement it.

In addition, AMD is bringing FSR 3 support to Vulkan and the Xbox Game Development Kit, enabling game developers on those platforms to use it. It is also adding FSR 3.1 to the FidelityFX API, which simplifies debugging and enables forward compatibility with updated versions of FSR.

Upon its release in September 2023, AMD FSR 3 was initially supported by two titles, Forspoken and Immortals of Aveum, with ten more games poised to join them at the time. Six months later, the lineup has expanded to an impressive roster of 40 games that either currently support FSR 3 or are set to incorporate it shortly. As of March 2024, FSR 3 is supported by games such as Avatar: Frontiers of Pandora, Starfield, and The Last of Us Part I, with Cyberpunk 2077, Dying Light 2 Stay Human, Frostpunk 2, and Ratchet & Clank: Rift Apart set to follow shortly.

Source: AMD
Qualcomm Intros Snapdragon 7+ Gen 3: Pushing GenAI Into Premium Smartphones

Proving the adage “ask, and you shall receive”, Qualcomm is back this week with a second Snapdragon SoC announcement for mobile phones. This time, the company is announcing the Snapdragon 7+ Gen 3, the latest-generation member of its relatively new Snapdragon 7+ lineup of SoCs. Like its predecessor, the Snapdragon 7+ Gen 2, the Gen 3 is aimed at the premium segment of smartphones, offering high-end features with more modest performance and costs – but still a feature set and level of performance ahead of “mid-tier” smartphone SoCs. And, with Monday’s launch of the Snapdragon 8s Gen 3, this is a segment that Qualcomm has now bifurcated into two lines of SKUs.
G.Skill on Tuesday introduced its ultra-low-latency DDR5-6400 memory modules that feature a CAS latency of 30 clocks, which appears to be the industry's most aggressive timings yet for DDR5-6400 sticks. The modules will be available for both AMD and Intel CPU-based systems.
With every new generation of DDR memory comes an increase in data transfer rates as well as higher relative latencies (as measured in clock cycles). For the vast majority of applications, the increased bandwidth offsets the performance impact of the higher timings, but there are applications that favor low latencies. However, shrinking latencies is generally harder than increasing data transfer rates, which is why low-latency modules are rare.
Nonetheless, G.Skill has apparently managed to cherry-pick enough DDR5 memory chips and build appropriate printed circuit boards to produce DDR5-6400 modules with CL30 timings, which are substantially lower than the CL46 timings recommended by JEDEC for this speed bin. This means that while JEDEC-standard modules have an absolute latency of 14.375 ns, G.Skill's modules can boast a latency of just 9.375 ns – an approximately 35% decrease.
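For readers who want to check the math, absolute CAS latency is simply the CL figure divided by the memory clock, which for DDR memory is half the transfer rate. A minimal sketch of that arithmetic (the function name is ours, used purely for illustration):

```python
def cas_latency_ns(cl: int, transfer_rate_mts: int) -> float:
    """Absolute CAS latency in nanoseconds.

    DDR memory clocks at half its transfer rate, so a DDR5-6400 module
    runs a 3200 MHz memory clock, and absolute latency is CL clock periods.
    """
    memory_clock_mhz = transfer_rate_mts / 2      # e.g. 6400 MT/s -> 3200 MHz
    return cl * 1000 / memory_clock_mhz           # ns per clock * CL

# JEDEC DDR5-6400 CL46 vs. G.Skill's DDR5-6400 CL30
jedec = cas_latency_ns(46, 6400)    # 14.375 ns
gskill = cas_latency_ns(30, 6400)   #  9.375 ns
print(f"JEDEC: {jedec:.3f} ns, G.Skill: {gskill:.3f} ns, "
      f"reduction: {100 * (1 - gskill / jedec):.1f}%")   # ~34.8%
```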
G.Skill's DDR5-6400 CL30-39-39-102 modules have a capacity of 16 GB and will be available in 32 GB dual-channel kits, though the company does not disclose voltages, which are likely considerably higher than those standardized by JEDEC.
The company plans to make its DDR5-6400 modules available both for AMD systems with EXPO profiles (Trident Z5 Neo RGB and Trident Z5 Royal Neo) and for Intel-powered PCs with XMP 3.0 profiles (Trident Z5 RGB and Trident Z5 Royal). The new modules should be particularly attractive for AMD's Ryzen 7000 and Ryzen 9000-series processors: AM5 systems have a practical DDR5 limit of 6000 MT/s – 6400 MT/s, as this is roughly as fast as AMD's Infinity Fabric can operate at a 1:1 ratio, so a low-latency DDR5-6400 kit sits right at that ceiling.
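As a rough illustration of why that ceiling matters, the sketch below maps a DDR5 transfer rate to its memory clock and flags when a 1:1 ratio is unlikely to hold. The 3200 MHz cutoff is an assumed round number based on the article's 6000–6400 MT/s figure, not an AMD specification:

```python
def am5_memory_plan(transfer_rate_mts: int, max_1to1_mclk_mhz: int = 3200) -> str:
    """Illustrative only: whether a DDR5 speed can plausibly keep a 1:1 clock ratio.

    DDR5 runs its memory clock at half the transfer rate, and (per the article)
    AM5 holds a 1:1 ratio only up to roughly DDR5-6000/6400. The cutoff here is
    an assumption for illustration, not an AMD-published limit.
    """
    mclk = transfer_rate_mts / 2
    if mclk <= max_1to1_mclk_mhz:
        return f"DDR5-{transfer_rate_mts}: MCLK {mclk:.0f} MHz, 1:1 ratio OK"
    return f"DDR5-{transfer_rate_mts}: MCLK {mclk:.0f} MHz, likely forces a divider"

for rate in (6000, 6400, 7200):
    print(am5_memory_plan(rate))
```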
G.Skill notes that since its modules are non-standard, they will not work with all systems but will operate on high-end motherboards with properly cooled CPUs.
The new ultra-low-latency memory kits will be available worldwide from G.Skill's partners starting in late August 2024. The company did not disclose the pricing of these modules, but since we are talking about premium products that boast unique specifications, they are likely to be priced accordingly.
The CXL consortium has had a regular presence at FMS (which rechristened itself from 'Flash Memory Summit' to the 'Future of Memory and Storage' this year). Back at FMS 2022, the consortium had announced v3.0 of the CXL specifications. This was followed by CXL 3.1's introduction at Supercomputing 2023. Having started off as a host-to-device interconnect standard, CXL slowly subsumed other competing standards such as OpenCAPI and Gen-Z. As a result, the specifications have grown to encompass a wide variety of use-cases by building a protocol on top of the ubiquitous PCIe expansion bus. The CXL consortium comprises heavyweights such as AMD and Intel, as well as a large number of startups attempting to play in different segments on the device side. At FMS 2024, CXL had a prime position in the booth demos of many vendors.
The migration of server platforms from DDR4 to DDR5, along with the rise of workloads demanding large RAM capacity (but not particularly sensitive to memory bandwidth or latency), has made memory expansion modules one of the first classes of widely available CXL devices. Over the last couple of years, we have had product announcements from Samsung and Micron in this area.
At FMS 2024, SK hynix was showing off their DDR5-based CMM-DDR5 CXL memory module with a 128 GB capacity. The company was also detailing their associated Heterogeneous Memory Software Development Kit (HMSDK) - a set of libraries and tools at both the kernel and user levels aimed at increasing the ease of use of CXL memory. This is achieved in part by considering the memory pyramid / hierarchy and relocating the data between the server's main memory (DRAM) and the CXL device based on usage frequency.
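The exact heuristics inside HMSDK are not detailed here, but the general idea of usage-frequency placement across a DRAM tier and a CXL tier can be sketched as below. This is an illustrative toy model, not HMSDK code; the class name, tier labels, and threshold are all assumptions:

```python
from collections import Counter

# Toy model of frequency-based tiering between local DRAM and a CXL expander.
# Not HMSDK code; names and the hot_threshold value are invented for illustration.
DRAM_TIER, CXL_TIER = "dram", "cxl"

class TieredPlacement:
    def __init__(self, hot_threshold: int = 8):
        self.access_counts = Counter()   # page id -> accesses in this interval
        self.placement = {}              # page id -> current tier
        self.hot_threshold = hot_threshold

    def record_access(self, page: int) -> None:
        self.access_counts[page] += 1

    def rebalance(self) -> None:
        """Promote frequently touched pages to DRAM, demote cold ones to CXL."""
        for page, count in self.access_counts.items():
            self.placement[page] = DRAM_TIER if count >= self.hot_threshold else CXL_TIER
        self.access_counts.clear()       # start a fresh sampling interval

# Usage: count accesses over an interval, then rebalance placements.
tiers = TieredPlacement()
for page in [1, 1, 1, 1, 1, 1, 1, 1, 2]:
    tiers.record_access(page)
tiers.rebalance()
print(tiers.placement)   # {1: 'dram', 2: 'cxl'}
```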
The CMM-DDR5 CXL memory module comes in the SDFF form factor (E3.S 2T) with a PCIe 5.0 x8 host interface. The internal memory is based on 1α technology DRAM, and the device promises DDR5-class bandwidth and latency within a single NUMA hop. As these memory modules are meant to be used in datacenters and enterprises, the firmware includes features for RAS (reliability, availability, and serviceability) along with secure boot and other management features.
SK hynix was also demonstrating Niagara 2.0 – a hardware solution (currently based on FPGAs) to enable memory pooling and sharing, i.e., connecting multiple CXL memories so that different hosts (CPUs and GPUs) can optimally share their capacity. The previous version allowed only capacity sharing, but the latest version also enables sharing of data. SK hynix had presented these solutions at CXL DevCon 2024 earlier this year, but some progress appears to have been made in finalizing the specifications of the CMM-DDR5 by FMS 2024.
Micron had unveiled the CZ120 CXL Memory Expansion Module last year based on the Microchip SMC 2000 series CXL memory controller. At FMS 2024, Micron and Microchip had a demonstration of the module on a Granite Rapids server.
Additional insights into the SMC 2000 controller were also provided.
The CXL memory controller also incorporates DRAM die failure handling, and Microchip provides diagnostics and debug tools to analyze failed modules. The controller also supports ECC, which forms part of the enterprise...