GPU and CPU news and discussions

weatheriscool
Posts: 24514
Joined: Sun May 16, 2021 6:16 pm
Contact:

Re: GPU and CPU news and discussions

Post by weatheriscool »


Nvidia Blackwell B200 Chip is 4X Faster than the H100 – 1 Exaflop in a Rack

March 18, 2024 by Brian Wang
https://www.nextbigfuture.com/2024/03/n ... -h100.html
The NVIDIA Blackwell platform was announced today. It will run real-time generative AI on trillion-parameter large language models at up to 25x less cost and energy consumption than the H100.

The Blackwell GPU architecture has six transformative technologies for accelerated computing, which will help unlock breakthroughs in data processing, engineering simulation, electronic design automation, computer-aided drug design, quantum computing and generative AI — all emerging industry opportunities for NVIDIA.

Blackwell will be used by Amazon Web Services, Dell Technologies, Google, Meta, Microsoft, OpenAI, Oracle, Tesla, xAI and many more.
weatheriscool
Posts: 24514
Joined: Sun May 16, 2021 6:16 pm
Contact:

Re: GPU and CPU news and discussions

Post by weatheriscool »

Qualcomm Announces Snapdragon 8s Gen 3 for Cheaper Flagship Phones
The Snapdragon 8s Gen 3 combines features of the Snapdragon 8 Gen 3 and Gen 2.
By Ryan Whitwam March 19, 2024
As expected, Qualcomm has revealed a new Arm chip for smartphones in the Snapdragon 8 family. However, the latest Snapdragon 8s Gen 3 is not a successor to the Snapdragon 8 Gen 3—yes, it's another instance of Qualcomm's famously confusing model names. Instead, you'll see this chip in "budget flagship" smartphones that cost less than the top-of-the-line units. However, the new part still has most of the high-end features you'd expect from Snapdragon 8 chips.
https://www.extremetech.com/mobile/qual ... hip-phones
Qualcomm's Snapdragon 8 Gen 2 and Gen 3 use TSMC's 4nm process node, and the 8s Gen 3 is no different. This architectural consistency helped to integrate features of the Gen 2 and Gen 3 to make a chip that offers good performance at a lower price.
weatheriscool
Posts: 24514
Joined: Sun May 16, 2021 6:16 pm
Contact:

Re: GPU and CPU news and discussions

Post by weatheriscool »

Samsung Shows Off 32Gb/s GDDR7 Memory Modules at Nvidia GTC
The company displayed its latest wares at Nvidia's GPU Technology Conference this week, hinting at a partnership for the RTX 50-series.
By Josh Norem March 20, 2024
https://www.extremetech.com/gaming/sams ... nvidia-gtc
Nvidia's GTC conference this week was all about AI and included the announcement of its next-generation Blackwell architecture. The data center hardware unveiled at the show uses high-bandwidth memory (HBM), so it's not too surprising to hear Samsung was at the event as well since it makes that kind of memory. However, it is now being reported that the company was also showing off its upcoming GDDR7 modules for gaming GPUs. The trade show sighting gives us the first concrete info on the speeds that may be offered when the new memory arrives in GPUs this year.

German website Hardwareluxx was at the show wandering around and spotted a previously unnoticed display at the Samsung booth touting the benefits of its GDDR7 memory solution. What makes this interesting is there's been some confusion about the specs of the first GDDR7 modules that will come to market, as we've heard about a range of offerings spanning 28Gb/s up to 36Gb/s. The placard at the Samsung booth states it's offering 2GB modules running at 32Gb/s, which could be what we see coming to future GPUs from both Nvidia and AMD. However, Hardwareluxx reports that even though 32Gb/s modules were on display, Nvidia will opt for 28Gb/s chips for the RTX 50 series to assist with efficiency.
Tadasuke

🔥 NVIDIA BLACKWELL 🔥

Post by Tadasuke »

Seems like progress in hardware for AI, AI and robotics is speeding up.



Image

Image

Image

Image
wjfox
Site Admin
Posts: 13588
Joined: Sat May 15, 2021 6:09 pm
Location: Essex, UK
Contact:

Re: GPU and CPU news and discussions

Post by wjfox »

208 billion transistors... insane.

Try to imagine about half of the entire Milky Way galaxy. Each star represents a transistor.

And that's just one chip, never mind combining them into clusters.
firestar464
Posts: 7219
Joined: Wed Oct 12, 2022 7:45 am

Re: GPU and CPU news and discussions

Post by firestar464 »

trying to imagine that reproduced with vacuum tubes lmao like in fallout
Powers
Posts: 1183
Joined: Fri Apr 07, 2023 7:32 pm
Location: a.k.a Lurking, Member, Lorem Ipsum, ..., --- and ººº.

Re: GPU and CPU news and discussions

Post by Powers »

firestar464 wrote: Thu Mar 21, 2024 2:44 pm trying to imagine that reproduced with vacuum tubes lmao like in fallout
I imagine whole planets covered in them.
Tadasuke

RTX 4060 vs GTX 1060

Post by Tadasuke »

Despite what Jensen Huang wants us to believe, performance for your everyday person hasn't really changed that much in recent years.

The 4060 is 135% faster at 1080p and 145% faster at 1440p gaming than the 1060 from 2016. Prices are about the same after adjusting for inflation. The 4060 has 33% more memory.

Image

source: https://www.pcmag.com/articles/nvidia-g ... to-upgrade
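
The percentages above are easier to compare as multipliers and as a per-year rate. Here is a minimal Python sketch, assuming the RTX 4060 launched in 2023 (the post only gives 2016 for the 1060) and taking the quoted gaming percentages at face value. The function names are made up for illustration.

```python
def speedup_from_percent_faster(percent: float) -> float:
    """Convert 'N% faster' into a multiplier, e.g. 135% faster -> 2.35x."""
    return 1.0 + percent / 100.0


def annualized_gain(speedup: float, years: float) -> float:
    """Compound yearly improvement that produces `speedup` after `years`."""
    return speedup ** (1.0 / years) - 1.0


# Assumption: RTX 4060 launch year is 2023; the GTX 1060 year is from the post.
YEARS = 2023 - 2016

for label, pct in (("1080p", 135.0), ("1440p", 145.0)):
    s = speedup_from_percent_faster(pct)
    print(f"{label}: {s:.2f}x overall, {annualized_gain(s, YEARS):.1%} per year")
```

Under those assumptions the gain works out to roughly 13 to 14% per year, which is one way to put a number on "hasn't really changed that much".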
Tadasuke

Blackwell transistor count

Post by Tadasuke »

wjfox wrote: Thu Mar 21, 2024 2:35 pm 208 billion transistors

And that's just one chip, never mind combining them into clusters.
Actually, one chip has 104 billion transistors (still not a small number) and is built on TSMC's 4nm production process. Two chips combined into one make 208 BT, a bit like AMD Zen, which also combines "chiplets" working together as one. Intel uses tiles for its current server chips ("Emerald Rapids"). Each tile has 32 cores (actually 33 with 1 disabled). Two tiles make 64 cores and 128 threads.
firestar464
Posts: 7219
Joined: Wed Oct 12, 2022 7:45 am

Re: RTX 4060 vs GTX 1060

Post by firestar464 »

Tadasuke wrote: Fri Mar 22, 2024 10:03 am Despite what Jensen Huang wants us to believe, performance for your everyday person hasn't really changed that much in recent years.

4060 is 135% faster in 1080p and 145% in 1440p gaming than 1060 from 2016. Prices are about the same when counting inflation. There is 33% more memory on the 4060.

Image

source: https://www.pcmag.com/articles/nvidia-g ... to-upgrade
For the everyday person, yes. Meanwhile Big Tech is relishing the computing power
Tadasuke

GPU prices, Blackwell FP64

Post by Tadasuke »

firestar464 wrote: Fri Mar 22, 2024 12:49 pm For the everyday person, yes. Meanwhile Big Tech is relishing the computing power
I used to think that $600 for a single GPU was a lot. 😅

And by the way, Blackwell has only 32% faster FP64 compute than Hopper (40 vs 30 teraflops). One full-spec B200 (208 billion transistors) uses 1200 watts at 100% usage. One B100 (104 billion transistors) uses up to 700 watts at 100% usage.
wjfox
Site Admin
Posts: 13588
Joined: Sat May 15, 2021 6:09 pm
Location: Essex, UK
Contact:

Re: GPU prices, Blackwell FP64

Post by wjfox »

Tadasuke wrote: Fri Mar 22, 2024 4:31 pm
And by the way, Blackwell has only 32% faster FP64 compute than Hopper (40 vs 30 teraflops).
But for AI, lower precision (e.g. FP8, FP4) is better. Blackwell is a 5x improvement on this.
Tadasuke

simulations, transistor count vs relative performance

Post by Tadasuke »

Yeah, that's true, I agree. It makes no sense to use FP64 in machine learning or in (so-called) inference. Various useful, practical simulations can also be done more efficiently with AI instead of being simulated directly at costly, very high precision, and this has been proven to work. Protein folding and weather simulation are examples (like the 10x-better-precision weather simulation that Jensen Huang showed on stage using Nvidia Blackwell).

The $1000 hexa-core i7-980 (or i7-980X, which is the same processor) from 2010 has 1.17 billion transistors (BT). The $2000 eighteen-core i9-7980XE from 2017 has 7 BT and the $3000 twenty-eight-core Xeon W-3175X has 10 BT.

When you look at the most multi-threaded use-cases, which can well utilise 100% of all cores, threads and instructions of these processors, the i9-7980XE is 6x faster than the i7-980(X) and the Xeon Workstation 3175X is 8.6x faster than the i7-980(X).

Those increases in total performance can also be attributed to the increases in the total transistor count of those CPUs. Of course, it all had to be properly arranged and designed, first by engineers in specialised computer programs, then a mask had to be created and then the actual CPUs had to be cleanly "etched" using appropriate chemicals and photolithography on purified silicon wafers, in well-equipped and well-staffed foundries (semiconductor fabrication plants) around the world. :-)
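
To check how closely performance tracks transistor count in the figures quoted above, here is a minimal Python sketch. It uses only the numbers from this post; the dictionary layout and the function name are made up for illustration.

```python
# Figures quoted in the post: transistor counts in billions and
# multi-threaded speedup relative to the i7-980(X).
chips = {
    "i7-980(X)": {"transistors_bn": 1.17, "speedup": 1.0},
    "i9-7980XE": {"transistors_bn": 7.0, "speedup": 6.0},
    "Xeon W-3175X": {"transistors_bn": 10.0, "speedup": 8.6},
}

baseline = chips["i7-980(X)"]


def perf_per_transistor(name: str) -> float:
    """Speedup divided by transistor growth, both relative to the baseline."""
    chip = chips[name]
    transistor_ratio = chip["transistors_bn"] / baseline["transistors_bn"]
    return chip["speedup"] / transistor_ratio


for name in chips:
    print(f"{name}: {perf_per_transistor(name):.2f}")
```

A value near 1.0 means multi-threaded performance grew about in step with transistor count, which is what these three data points give.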
weatheriscool
Posts: 24514
Joined: Sun May 16, 2021 6:16 pm
Contact:

Re: GPU and CPU news and discussions

Post by weatheriscool »

TSMC's 3nm Business to Surge in 2024 Thanks to Apple, AMD, and Intel
Nvidia is notably missing from this list of big clients.
By Josh Norem March 27, 2024
TSMC launched its 3nm process at the end of 2022 and then spent all of 2023 making 3nm chips only for Apple. That will change in 2024, as the company begins cranking out 3nm products for its other customers, including Intel and AMD. Considering the size of these orders and the companies involved, it means 2024 will be a year where we'll likely see TSMC 3nm products flooding the high-end tech world, with one company notably missing from the roster—Nvidia.

The Taiwan Economic Daily is reporting on TSMC's surging 3nm business, and there's an economic angle to it. Last year, TSMC's 3nm node was just a small, niche process for the company with one big customer (Apple), but in 2024, it's expected to do gangbusters business. It will reportedly be big enough to land it right behind its most popular node—5nm—in terms of overall revenue. The site says its 3nm process might be responsible for up to 20% of its revenue this year, making it the second most popular node.
https://www.extremetech.com/computing/t ... -and-intel
weatheriscool
Posts: 24514
Joined: Sun May 16, 2021 6:16 pm
Contact:

Re: GPU and CPU news and discussions

Post by weatheriscool »

Intel Lunar Lake CPU Appears With Embedded LPDDR5X Memory
The compute tile is also made by TSMC, according to this (usually reliable) source.
By Josh Norem March 28, 2024
Intel's Lunar Lake architecture will be the low-power follow-up to its current Meteor Lake platform for mobile devices. Due later this year, this new architecture will target ultra-portable laptops, handhelds, tablets, and similar hardware. It's now been photographed for the first time, showing it's a whole different kettle of fish for Intel, with several huge first-time advancements for the company.

German site Igor's Lab received a photo of an engineering sample of Lunar Lake and some engineering slides, revealing it will be Intel's first CPU with DDR memory on the package, similar to Apple's M-series chips. You may recall that Intel released a photograph of Meteor Lake with on-package memory late last year and then deleted it. However, the CPU in the picture looked the same as the one we're looking at today, so maybe that was Lunar Lake all along. Regardless, Igor confirms this CPU features a 4+4 design, which was also previously rumored, so it'll have four Lion Cove P-cores and four Skymont E-cores.
https://www.extremetech.com/computing/i ... r5x-memory
wjfox
Site Admin
Posts: 13588
Joined: Sat May 15, 2021 6:09 pm
Location: Essex, UK
Contact:

Re: GPU and CPU news and discussions

Post by wjfox »

Tadasuke

--------------------------------------------------------------

Post by Tadasuke »

I made a long post, but I changed my mind (again).

To be honest, I dunno, I don't know the truth. It's beyond me to accurately ascertain or confirm. I'm not qualified for this.

Maybe this list can be helpful in comparing performance for GPUs: https://benchmarks.ul.com/compare/best-gpus
Last edited by Tadasuke on Sun Mar 31, 2024 7:37 am, edited 2 times in total.
firestar464
Posts: 7219
Joined: Wed Oct 12, 2022 7:45 am

Re: GPU and CPU news and discussions

Post by firestar464 »

I can see the spaces just fine.
weatheriscool
Posts: 24514
Joined: Sun May 16, 2021 6:16 pm
Contact:

Re: GPU and CPU news and discussions

Post by weatheriscool »

TSMC: One Trillion Transistor GPUs Will Be Possible in a Decade
3D-stacking chiplets will be the standard to increase compute power going forward.
By Josh Norem April 2, 2024
https://www.extremetech.com/computing/t ... n-a-decade
Things are about to get interesting in the semiconductor world, according to a lengthy missive penned by some executives at TSMC. The world's largest chipmaker is embarking on a journey that it says will result in a GPU with one trillion transistors—roughly 10 times the amount found in today's biggest chips—though it will take the Taiwanese company a decade to get there.

TSMC chairman Mark Liu and chief scientist H.-S. Philip Wong have penned an editorial for IEEE Spectrum outlining their thoughts on the future of semiconductors. The headline is how the company plans to create a one trillion transistor GPU. The article details how the AI boom is currently the main driver for increased compute power in chips, especially GPUs. It notes that as we reach the end of the traditional node-shrink era, the way forward is clear: chiplets and 3D stacking.
Tadasuke

performance scaling across cores and threads

Post by Tadasuke »

Real-world performance typically doesn't scale linearly with cores and threads.

From what I've seen, with constant frequency, the same cache size per thread and the same architecture (Zen 4, for example), performance increases in various advanced, professional, useful workloads stack up approximately like this:

▪️ 4 cores are 4x faster than 1
▪️ 12 cores are 10x faster than 1
▪️ 24 cores are 16x faster than 1
▪️ 32 cores are 20x faster than 1
▪️ 64 cores are 27x faster than 1
▪️ 96 cores are 32.5x faster than 1

Of course, it doesn't look like that for every workload. It's just an example to illustrate how it can be. Higher clocks at lower core counts (compared with higher core counts) only make this non-linearity more obvious.
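
The list above can be turned into parallel efficiency and an estimated serial fraction. Here is a minimal Python sketch using the Karp-Flatt metric and Amdahl's law. Both are swapped in here as a standard way to read such numbers and are not something the figures were derived from. The function names are made up for illustration.

```python
# Speedups from the list above: cores -> speedup over one core.
observed = {4: 4.0, 12: 10.0, 24: 16.0, 32: 20.0, 64: 27.0, 96: 32.5}


def efficiency(cores: int, speedup: float) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


def karp_flatt(cores: int, speedup: float) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


for cores, speedup in observed.items():
    print(
        f"{cores:>3} cores: efficiency {efficiency(cores, speedup):.0%}, "
        f"serial fraction {karp_flatt(cores, speedup):.3f}"
    )
```

From 12 cores up, the estimated serial fraction stays near 0.02, so the list behaves roughly like Amdahl's law with about 2% non-parallel work. If that held, `amdahl_speedup(512, 0.02)` would give only about 45x.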

Will 320, 384 or 512 cores be at all useful, practical or economical outside some edge cases? :? Theoretical flops/ops or Linpack flops ≠ actual performance (in finance, science and engineering, for example). For AI there will be specialized hardware, as there already increasingly is.

I'm always curious what the actual performance increases really are. Like, I currently think that Radeon 7800 XT is 12x faster than HD 5870 from Q3 2009, for about the same price when adjusted for inflation. :⁠-⁠)

By the way, Zen 4 can be up to twice as fast as Zen 3 and Intel Sapphire Rapids (with the same number of cores and threads), largely thanks to its very decent architecture, large L3 cache, fast FPU and high memory bandwidth (333 GB/s for Threadripper and 461 GB/s for Epyc). Emerald Rapids is probably about 30% faster than Sapphire Rapids, so it's still behind AMD.