GPU and CPU news and discussions

Tadasuke
Posts: 549
Joined: Tue Aug 17, 2021 3:15 pm
Location: Europe

Intel Sapphire Rapids vs Emerald Rapids vs AMD Genoa and Bergamo CPUs

Post by Tadasuke »

According to Intel, Emerald Rapids offers the following improvements over the January 2023 Sapphire Rapids:
• 21% faster general-purpose performance
• 42% faster AI inference performance
• 36% better energy efficiency (perf per watt)
• up to 3x more cache (only in the higher models)
• up to 16.7% higher memory bandwidth (depending on the model, from DDR5-4800 to DDR5-5600; quick check below the list)
• support for new AMX instructions (intended for AI workloads)
• support for CXL Type 3 memory devices
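
As a quick sanity check on that bandwidth figure (a minimal sketch; the DDR5-4800 baseline is Sapphire Rapids' top official speed, the one consistent with Intel's 16.7% claim):

```python
# Peak DRAM bandwidth scales linearly with data rate at a fixed channel count,
# so the uplift is just the ratio of the two transfer rates.
old_mts = 4800  # Sapphire Rapids top official DDR5 speed (MT/s)
new_mts = 5600  # Emerald Rapids top official DDR5 speed (MT/s)

uplift = (new_mts / old_mts - 1) * 100
print(f"{uplift:.1f}% higher memory bandwidth")  # -> 16.7%
```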

Here are three example performance comparisons made by Tom's Hardware's Paul Alcorn:

Image

The EPYC 9654 takes the lead over Intel in these supercomputer test suites, though the core-dense Zen 4c Bergamo 9754 tops all the charts. The 8592+ shows some improvements over the previous-generation 8490H, particularly in the Graph 500 test, but on a core-for-core basis, AMD still leads Intel.

Image

The Weather Research and Forecasting model (WRF) is memory bandwidth sensitive, so it's no surprise to see the Genoa 9554 with its 12-channel memory controller outstripping the eight-channel Emerald Rapids 8592+. Again, the 9654's lackluster performance here is probably at least partially due to lower per-core memory throughput. There is a similar pattern with the NWChem computational chemistry workload, though the 9654 is more impressive in this benchmark.

Image

The 8592+ is impressive in the SciMark benchmarks but falls behind in the Rodinia LavaMD test (the Rodinia suite is focused on accelerating compute-intensive applications with accelerators). It makes up for that shortcoming in the Rodinia HotSpot3D and CFD Solver benchmarks, with the latter showing a large delta between the 8592+ and competing processors. The 64-core Genoa 9554 fires back in the OpenFOAM CFD (free, open-source computational fluid dynamics software) workloads, though.

Source: Tom's Hardware's early review of the new Intel Xeon Emerald Rapids

I hope Intel's upcoming many-core CPUs (Granite Rapids?) will be at least twice as fast in general-purpose work and at least twice as energy efficient as Emerald Rapids, without raising prices by even a dollar. Perhaps they could even manage to improve AI performance by another 10x. It would be really nice for that to happen soon.
Global economy doubles in product every 15-20 years. Computer performance at a constant price doubles nowadays every 4 years on average. Livestock-as-food will globally stop being a thing by ~2050 (precision fermentation and more). Human stupidity, pride and depravity are the biggest problems of our world.
Tadasuke
Posts: 549
Joined: Tue Aug 17, 2021 3:15 pm
Location: Europe

Moore's Law in Intel server CPUs since January 2010 + performance

Post by Tadasuke »

Back in January 2010, the Intel Xeon X7560 came out. It featured 8 cores, 16 threads, 2.4 GHz all-core frequency, 512 KB of L1 cache, 2 MB of L2 cache, 24 MB of L3 cache, 34.1 GB/s of DRAM bandwidth and up to 256 GB of DDR3 RAM per CPU (up to 8 CPUs on one LGA1567 motherboard). Transistor count: 2.3 billion. MSRP: 3,692 USD (about 5,200 USD today).

Now in December 2023, the Intel Xeon Platinum 8592+ is out. It features 64 cores, 128 threads, 2.9 GHz all-core frequency, 5 MB of L1 cache, 128 MB of L2 cache, 320 MB of L3 cache, 307.2 GB/s of DRAM bandwidth and up to 6 TB of DDR5 RAM per CPU (up to 8 CPUs on one LGA4677 motherboard). Transistor count: 61 billion (two tiles of 30.5 billion each, each with 33 cores of which one is disabled). MSRP: 11,600 USD.

This means that in 14 years, Intel has managed to increase its maximum CPU transistor count by 26.5x, core count by 8x, clock frequency by 21%, total cache by 17.1x, memory bandwidth by 9x, maximum RAM by 24x and official price by 123% (after adjusting for inflation).
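
A quick check of those multipliers from the two spec lists above (a small sketch; "total cache" here sums L1+L2+L3):

```python
# (X7560, Platinum 8592+) pairs taken from the two posts above
specs = {
    "transistors (B)": (2.3, 61),
    "cores": (8, 64),
    "total cache (MB)": (0.5 + 2 + 24, 5 + 128 + 320),
    "DRAM bandwidth (GB/s)": (34.1, 307.2),
    "max RAM (GB)": (256, 6144),
    "inflation-adj. price (USD)": (5200, 11600),
}

for name, (old, new) in specs.items():
    print(f"{name}: {new / old:.1f}x")
# transistors 26.5x, cores 8.0x, cache 17.1x, bandwidth 9.0x,
# RAM 24.0x, price 2.2x (i.e. +123%)
```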

If Moore's Law states that "transistor counts ought to double every 24 months", then in 14 years CPUs "ought to" have 128x more transistors. A 26.5x increase means the doubling time is more like 3 years than 2 (26.5x over 14 years works out to roughly one doubling every 3 years, i.e. about 32x in 15 years).
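
A minimal sketch of that doubling-time arithmetic (only numbers already quoted in this post):

```python
import math

growth = 61 / 2.3  # transistor count ratio, X7560 -> Platinum 8592+ (~26.5x)
years = 14         # January 2010 -> December 2023

doublings = math.log2(growth)      # ~4.7 doublings
doubling_time = years / doublings  # ~3.0 years per doubling
print(f"{growth:.1f}x in {years} years -> one doubling every {doubling_time:.1f} years")

# For comparison, a strict 2-year doubling over the same span:
print(f"Moore's Law pace would have given {2 ** (years / 2):.0f}x")  # 128x
```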

Is Moore's Law dead, then? Hard to say; it depends on how strict you get with the definition. My personal take is that it has slowed down rather than died completely.

However, I want to add that after seeing dozens and dozens of performance comparisons of server and workstation CPUs, my conclusion is that beyond 32 cores / 64 threads there is a very visible and important problem with scaling performance to higher core counts.

64 cores isn't 2x faster than 32 cores, and 96 cores isn't 1.5x faster than 64 cores. This is evident in almost every benchmark and workload, e.g. PassMark, where 96 Zen 4 cores aren't even 2x faster than 32 Zen 4 cores. But Zen 4 is still visibly faster than Zen 2 or Ice Lake at the same core count, so there's still some progress, including new instructions and accelerators for AI.

I wonder if more than 32 cores is going to be particularly useful in more than a few use cases. Cache and memory bandwidth certainly play a significant role in all of this: perhaps there is simply not enough cache and memory bandwidth to feed more than 32 cores at the moment. Or latencies are too high (both Intel and AMD currently use tiles or chiplets). A simple Amdahl's-law model, sketched below, produces the same flattening shape.
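
A minimal sketch of that flattening, assuming a plain Amdahl's-law model (the 98% parallel fraction is an illustrative assumption, not a measured number, and the model ignores the cache, bandwidth and latency effects mentioned above):

```python
def amdahl_speedup(cores: int, parallel_fraction: float = 0.98) -> float:
    """Ideal speedup vs. 1 core when a fixed fraction of the work stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

for cores in (32, 64, 96):
    print(f"{cores} cores: {amdahl_speedup(cores):.1f}x vs 1 core")

# Even with 98% parallel work, 64 cores comes out only ~1.4x faster than 32,
# and 96 cores only ~1.2x faster than 64 -- the same shape the benchmarks show.
```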

As of now, the average person can still only dream of having 32 current-gen cores. So going from 4, 6 or 8 cores to 32 would still be an improvement, even if going beyond 32 wouldn't really be (though energy requirements would go up). Perhaps the future lies in accelerators for workloads such as AI.
Global economy doubles in product every 15-20 years. Computer performance at a constant price doubles nowadays every 4 years on average. Livestock-as-food will globally stop being a thing by ~2050 (precision fermentation and more). Human stupidity, pride and depravity are the biggest problems of our world.
Tadasuke
Posts: 549
Joined: Tue Aug 17, 2021 3:15 pm
Location: Europe

Zen+ vs Zen 2 vs Zen 3 vs Zen 4 32-cores HEDT CPUs

Post by Tadasuke »

A performance comparison of AMD's 32-core Zen+, Zen 2, Zen 3 and Zen 4 HEDT CPUs (2018, 2019, 2022 and 2023) in PassMark's multi-threaded test. Scaling up to 32 cores / 64 threads is pretty good in professional and server workloads, so in my opinion this is a fairly realistic and useful representation (but take it with a grain of salt). Beyond 32 cores, problems with performance scaling become very obvious. However, it may be that most of these problems will be solved in the future, at least up to, say, 128 cores / 256 threads.

Image
Global economy doubles in product every 15-20 years. Computer performance at a constant price doubles nowadays every 4 years on average. Livestock-as-food will globally stop being a thing by ~2050 (precision fermentation and more). Human stupidity, pride and depravity are the biggest problems of our world.
Tadasuke
Posts: 549
Joined: Tue Aug 17, 2021 3:15 pm
Location: Europe

2018 Intel laptop i9 vs newest Intel laptop i9 overall comparison

Post by Tadasuke »

Q2 2018 Intel top laptop i9 vs Q4 2023 Intel top laptop i9 comparison in PassMark:

Image

62.7% higher single-thread rating and a 214% higher CPU Mark after ~5.5 years.

The new Meteor Lake-H Core Ultra 9 185H has a higher maximum TDP (or power draw) than the old Coffee Lake-H i9-8950HK.

The new one has 10.44x faster integrated graphics when measured in standard FP32 gigaflops (4x more execution units, 4x more TMUs and 4x more ROPs). It also has a 10 TOPS AI-specific processor, the first in an Intel laptop chip (although AMD's Phoenix APUs, released earlier in 2023, also have 10 TOPS, and Hawk Point arrives in January with 16 TOPS, so 60% faster). On top of that, it has 2.86x more memory bandwidth and 3.14x more total cache (L1+L2+L3).
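
That 10.44x figure can be reproduced from the usual peak-FP32 formula (a rough sketch; the lane counts and boost clocks are the commonly listed figures for these chips, not something PassMark reports):

```python
def peak_fp32_gflops(fp32_lanes: int, clock_ghz: float) -> float:
    """Peak FP32 throughput: each lane can retire one FMA (2 FLOPs) per cycle."""
    return fp32_lanes * 2 * clock_ghz

uhd_630 = peak_fp32_gflops(192, 1.20)   # i9-8950HK iGPU: 24 EUs x 8 lanes
arc_mtl = peak_fp32_gflops(1024, 2.35)  # Ultra 9 185H iGPU: 8 Xe cores x 128 lanes

print(f"UHD 630:  {uhd_630:.0f} GFLOPS")      # ~461
print(f"Arc iGPU: {arc_mtl:.0f} GFLOPS")      # ~4813
print(f"ratio:    {arc_mtl / uhd_630:.2f}x")  # 10.44x
```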

By the way, AVX-512 instructions can make some AI calculations twice as fast, and AMD's Zen 4 CPUs support AVX-512.

One more thing: PassMark very often shows incorrect cache sizes, and the cache sizes given here are unfortunately also incorrect (the i9-8950HK has 14.2 MB of total cache, while the Ultra 9 185H has 44.64 MB of total cache).
Global economy doubles in product every 15-20 years. Computer performance at a constant price doubles nowadays every 4 years on average. Livestock-as-food will globally stop being a thing by ~2050 (precision fermentation and more). Human stupidity, pride and depravity are the biggest problems of our world.
Tadasuke
Posts: 549
Joined: Tue Aug 17, 2021 3:15 pm
Location: Europe

Qualcomm XR2+ Gen 2 and AMD APUs

Post by Tadasuke »

The Qualcomm Snapdragon XR2+ Gen 2 is allegedly 700% faster in AI and 150% faster in real-time graphics than the XR2 Gen 1 used in Quest 2 in 2020. The CPU is probably about 100% faster. The XR2 Gen 2 in Quest 3 supports 10 concurrent cameras, two fewer than the XR2+ Gen 2.
Image

As for AMD APUs (Phoenix Point, Hawk Point, Strix Point and Strix Halo):
Image
Image
Global economy doubles in product every 15-20 years. Computer performance at a constant price doubles nowadays every 4 years on average. Livestock-as-food will globally stop being a thing by ~2050 (precision fermentation and more). Human stupidity, pride and depravity are the biggest problems of our world.
wjfox
Site Admin
Posts: 8938
Joined: Sat May 15, 2021 6:09 pm
Location: London, UK

Re: GPU and CPU news and discussions

Post by wjfox »

Nvidia teases RTX 40 Super series launch at CES — here's all of the leaked specs

By Aaron Klotz

After months of rumors and leaks, Nvidia confirmed that it will unveil a new graphics card lineup soon. The post shows a video of a GeForce RTX Founders Edition graphics card hovering over the earth's horizon, depicting the date January 8th, 2024, at 8 AM Pacific time. The post effectively confirms that Nvidia will announce a new graphics card lineup at CES 2024. Lucky for us, many of the key details have already been leaked from multiple sources.

These new GPUs will inevitably be the RTX 40 series Super refresh that has been leaking and circulating in rumors for months. Nvidia's new refresh is expected to arrive with at least three brand-new Ada Lovelace RTX 40 series GPUs: the RTX 4070 Super, RTX 4070 Ti Super, and the RTX 4080 Super.

The RTX 4070 Super and RTX 4080 Super are expected to come with a noteworthy spec bump in CUDA cores over their vanilla counterparts, while the RTX 4070 Ti Super is expected to arrive with a core count jump and a big memory upgrade compared to the outgoing 4070 Ti.

The RTX 4070 Super will reportedly come with 7,168 CUDA cores, and the RTX 4080 Super will come with 10,240 CUDA cores with 24 GT/s GDDR6X memory ICs.

The RTX 4070 Ti Super is reported to feature 8,448 CUDA cores, 16GB of memory capacity, a 256-bit bus, and 22.4 GT/s GDDR6X memory chips.
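
For context, peak memory bandwidth follows directly from those numbers (a quick sketch; the 256-bit bus for the RTX 4080 Super is my assumption, carried over from the vanilla RTX 4080, since the article only gives its data rate):

```python
def mem_bandwidth_gb_s(bus_bits: int, gt_per_s: float) -> float:
    """Peak bandwidth in GB/s: bus width in bytes times transfers per second."""
    return bus_bits / 8 * gt_per_s

print(mem_bandwidth_gb_s(256, 22.4))  # RTX 4070 Ti Super -> 716.8 GB/s
print(mem_bandwidth_gb_s(256, 24.0))  # RTX 4080 Super (assumed 256-bit) -> 768.0 GB/s
```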

https://www.tomshardware.com/pc-compone ... aked-specs


Image
Tadasuke
Posts: 549
Joined: Tue Aug 17, 2021 3:15 pm
Location: Europe

AMD APUs AI performance clarification

Post by Tadasuke »

According to AMD, 16 TOPS compared to 10 TOPS translates to 40% higher AI performance.

Might 48 TOPS then translate to 100% higher AI performance compared to 16 TOPS? A rough extrapolation is sketched below.
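
One illustrative way to extrapolate (entirely my assumption: fit a power law to AMD's single 10-to-16 TOPS data point and extend it, ignoring memory limits):

```python
import math

# AMD's data point: 1.6x the TOPS (10 -> 16) reportedly gives 1.4x the AI performance.
alpha = math.log(1.4) / math.log(1.6)  # ~0.72 scaling exponent

tops_ratio = 48 / 16              # 3x the raw TOPS
perf_ratio = tops_ratio ** alpha  # ~2.2x
print(f"48 vs 16 TOPS -> ~{perf_ratio:.1f}x AI performance")
```

That lands at roughly 2.2x, in the same ballpark as the 100%-higher guess above.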

For sure, the amount of memory and its bandwidth matter a lot.

Is Strix Point supposed to have a 2x wider memory bus than Hawk Point? That would help.
Global economy doubles in product every 15-20 years. Computer performance at a constant price doubles nowadays every 4 years on average. Livestock-as-food will globally stop being a thing by ~2050 (precision fermentation and more). Human stupidity, pride and depravity are the biggest problems of our world.
wjfox
Site Admin
Posts: 8938
Joined: Sat May 15, 2021 6:09 pm
Location: London, UK

Re: GPU and CPU news and discussions

Post by wjfox »

Nvidia's H100 GPUs will consume more power than some countries — each GPU consumes 700W of power, 3.5 million are expected to be sold in the coming year

December 26, 2023

Nvidia literally sells tons of its H100 AI GPUs, and each consumes up to 700W of power, more than the average American household occupant uses. Now that Nvidia is selling its new GPUs in high volumes for AI workloads, the aggregated power consumption of these GPUs is predicted to be as high as that of a major American city. Some quick back-of-the-envelope math also reveals that the GPUs will consume more power than some small European countries do.

The total power consumption of the data centers used for AI applications today is comparable to that of the nation of Cyprus, French firm Schneider Electric estimated back in October. But what about the power consumption of the most popular AI processors — Nvidia's H100 and A100?

Paul Churnock, the Principal Electrical Engineer of Datacenter Technical Governance and Strategy at Microsoft, believes that Nvidia's H100 GPUs will consume more power than all of the households in Phoenix, Arizona, by the end of 2024, when millions of these GPUs are deployed. However, the total power consumption will be less than that of larger cities, like Houston, Texas.

"This is Nvidia's H100 GPU; it has a peak power consumption of 700W," Churnock wrote in a LinkedIn post. "At a 61% annual utilization, it is equivalent to the power consumption of the average American household occupant (based on 2.51 people/household). Nvidia's estimated sales of H100 GPUs is 1.5 – 2 million H100 GPUs in 2024. Compared to residential power consumption by city, Nvidia's H100 chips would rank as the 5th largest, just behind Houston, Texas, and ahead of Phoenix, Arizona."

Indeed, at 61% annual utilization, an H100 GPU would consume approximately 3,740 kilowatt-hours (kWh) of electricity annually. Assuming that Nvidia sells 1.5 million H100 GPUs in 2023 and two million H100 GPUs in 2024, there will be 3.5 million such processors deployed by late 2024. In total, they will consume a whopping 13,091,820,000 kilowatt-hours (kWh) of electricity per year, or 13,091.82 GWh.
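
The article's totals check out with a quick back-of-the-envelope reproduction (only figures stated above):

```python
peak_watts = 700
utilization = 0.61
hours_per_year = 365 * 24  # 8760

kwh_per_gpu = peak_watts * utilization * hours_per_year / 1000
print(f"per GPU: {kwh_per_gpu:,.1f} kWh/year")  # ~3,740.5 kWh

gpus = 3_500_000  # 1.5M sold in 2023 + 2M in 2024
total_gwh = kwh_per_gpu * gpus / 1_000_000
print(f"fleet:   {total_gwh:,.2f} GWh/year")    # 13,091.82 GWh
```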

https://www.tomshardware.com/tech-indus ... oming-year


Image
Image credit: Nvidia
raklian
Posts: 1755
Joined: Sun May 16, 2021 4:46 pm
Location: North Carolina

Re: GPU and CPU news and discussions

Post by raklian »

wjfox wrote: Tue Jan 09, 2024 8:50 am
Nvidia's H100 GPUs will consume more power than some countries — each GPU consumes 700W of power, 3.5 million are expected to be sold in the coming year
Hopefully it translates into more demand for renewable energy production.
To know is essentially the same as not knowing. The only thing that occurs is the rearrangement of atoms in your brain.
weatheriscool
Posts: 13582
Joined: Sun May 16, 2021 6:16 pm

Re: GPU and CPU news and discussions

Post by weatheriscool »

Intel's Arrow Lake CPUs Will Allegedly Ditch Hyper-Threading: Leak
The company's 15th Generation CPUs will be a whole different kettle of fish.
By Josh Norem January 22, 2024
https://www.extremetech.com/computing/i ... ading-leak
Intel is expected to perform another node jump in 2024 for its next-generation Arrow Lake desktop CPUs. They're slated for release later this year and will be the company's first tile-based desktop chips made on its cutting-edge 20A process. Almost everything about these chips is new, and now leaked slides indicate they'll be the first Intel CPUs without Hyper-Threading since the technology first launched in the Pentium 4 era in the early 2000s.

A Twitter user with impeccable hardware leaking credentials has posted a batch of Intel slides detailing the Arrow Lake-S platform, the desktop variant. The slides were deleted from Twitter for some reason but saved by Videocardz. You will recall there was originally going to be a Meteor Lake-S for desktop, which was cancelled, and the slides say MTL-S was used as the foundation for Arrow Lake's testing.

The big news is that while Intel is keeping core counts the same, the total thread count is reduced as the P-cores are not Hyper-Threaded. This rumor existed prior to this leak, but this is the first time we've seen it in writing. However, this is pre-alpha hardware, so we'll have to wait and see what materializes later this year.