One small discussion making the rounds → AMD’s processor efficiency (showcased in its fastest supercomputer and already visible in the current 6000 series) carries over into the upcoming processors, making them an attractive product for an energy-starved world right now.
The TDP values at the bottom right need to be noted. Everyone is eagerly waiting for the performance numbers in Q1 2023.
GPU/AI:
While AMD is putting resources into improving their software (they never had the money to do this before), we are now seeing large companies with money working on software that makes it easy to switch away from CUDA.
Software has become a key battleground for chipmakers seeking to build up an ecosystem of developers to use their chips. Nvidia’s CUDA platform has been the most popular so far for artificial intelligence work.
However, once developers tailor their code for Nvidia chips, it is difficult to run it on graphics processing units, or GPUs, from Nvidia competitors like AMD. Meta said the software is designed to easily swap between chips without being locked in.
It is quite clear that the market leaders are going to get hit. It is unclear how the underdogs will fare. AMD’s Lisa Su has maintained that there will be no significant short-term impact on their business from these regulations, most likely because they are only catching up now. AMD CEO Lisa Su says US chip export ban won't hurt her company in the short term | Fortune (paywall, could not find another link… will add here when I find it).
Future TAM will definitely be affected, and it will need some tweaking by every CPU/GPU company.
Edit: Removing a chart from here. The YouTuber has removed it, citing a bad software setting used in the Intel test.
The latest Raptor Lake (retail) CPUs are comparable to, or better than, current AMD processors.
Though the retail market is not hot and AMD is focusing on datacenter and laptops, they have delivered a good product overall. They are launching an armada of laptop processors next year. There is not much expectation, since some OEMs in the laptop market are tightly controlled by Intel, not to mention the lack of demand. Folks like Dell just will not put non-Intel chips in their high-end laptops.
People are moving to cloud aggressively (Example: FedEx).
Chiplets - As of now, only AMD can move capacity like this. Their CPUs/GPUs use a mix of nodes (5nm and 6nm), so when they see the market slow in one sector, they can move part of the capacity on one of those nodes to cater to another sector where demand is present. This is unique to AMD as of now.
Genoa marks a big shift in TCO that makes it sensible to replace aging servers. A 2S (socket) Genoa server offers 4x the general-purpose performance at significantly better TCO versus a 2S Skylake/Cascade Lake SP server. Initial capital expenditures for Genoa-based servers are considerably higher due to the higher costs of the CPU, DDR5, and PCIe 5.0. Despite this big cost jump, Genoa- and Bergamo-based servers will pay for themselves many times over versus keeping already depreciated servers deployed.
Under this oversimplified model, upgrading to one 2-socket Genoa-based server from 4 existing 2-socket Skylake/Cascade Lake-based servers (2 CPUs vs 8 CPUs) is a net-present-value-positive transaction. The payback period for the capex spent is roughly 18 months. The payback period for a Rome/Milan server upgrade would still be ~4 years. The improvements are even more significant when you start considering new features related to security, CXL, and AVX-512.
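The payback logic above can be sketched in a few lines. All numbers below (capex, monthly opex) are illustrative assumptions I picked so the arithmetic is easy to follow - the actual model behind the article also includes NPV discounting and depreciation:

```python
# Hedged, oversimplified payback sketch: replace four depreciated 2S
# Skylake servers with one 2S Genoa server of equal total throughput,
# and ask when the opex (power/space) savings repay the upfront capex.
# All dollar figures are made up for illustration only.

def payback_months(new_server_capex, old_opex_per_month, new_opex_per_month):
    # Months until cumulative monthly savings cover the purchase price.
    monthly_savings = old_opex_per_month - new_opex_per_month
    return new_server_capex / monthly_savings

genoa_capex = 40_000      # $, one 2S Genoa server (CPU + DDR5 + PCIe 5.0)
old_opex = 4 * 700        # $/month, power + space for 4 old 2S servers
new_opex = 550            # $/month, one Genoa server

print(f"payback ~{payback_months(genoa_capex, old_opex, new_opex):.0f} months")
```

With these (hypothetical) inputs the answer lands near the ~18 months the model above quotes; the point is that the result is dominated by how much opex the old fleet burns.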
We see that the DC CPU story is progressing as was predicted back in 2020.
Update on GPU
AMD has been targeting the next behemoth, Nvidia, since last year. Performance-wise, AMD hardware was able to match Nvidia hardware, as we saw last gen. Looking at the overclocking being done with the 6800 XT, I think AMD could beat Nvidia's Titan cards, but they do not want that fight… not just yet.
But what happened this gen? You will usually read that Nvidia still has the top-performance cards. But that is coming at a cost.
Their GPU takes massive wafer space - 608/628mm² for the 4090, against a reticle limit of 800mm². This reduces yields compared to smaller chips (see Die Yield Calculator - iSine). Note that this includes their graphics and IO IP.
The larger chip also limits Nvidia's ability to pack in more shaders (say, compute units); they can pack only so many before they hit the reticle limit. Compared to this, AMD's top GCD is at ~300mm², so they can always double their shader count by making a larger GCD (graphics compute die). They have chiplets… so they keep IO out of the GCD and put it on a cheaper, older node (6nm for the 7xxx series) while the GCD is on a newer node (5nm for the 7xxx series). But there is no room left for Nvidia except to hope for a node shrink. Next gen is when we will clearly see Nvidia facing the death of Moore's law that their CEO is talking about: https://venturebeat.com/games/jensen-huang-qa-why-moores-law-is-dead-and-smart-design-is-replacing-it/. AMD is not as affected, since they went chiplets.
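To make the die-size argument concrete, here is a minimal sketch using the classic Poisson yield model (Y = e^(-D·A)) and a standard dies-per-wafer approximation. The defect density of 0.1/cm² and the approximations themselves are assumptions for illustration, not TSMC data:

```python
import math

def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    # Rough candidate-die count: wafer area over die area, minus a
    # classic correction term for partial dies lost at the wafer edge.
    r = wafer_diameter_mm / 2
    wafer_area = math.pi * r**2
    return int(wafer_area / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2, defect_density_per_cm2=0.1):
    # Poisson yield model: fraction of dies with zero defects.
    return math.exp(-defect_density_per_cm2 * die_area_mm2 / 100.0)

# ~608 mm^2 monolithic (4090-class) vs ~300 mm^2 GCD (7xxx-class)
for area in (608, 300):
    n, y = dies_per_wafer(area), poisson_yield(area)
    print(f"{area} mm^2: ~{n} dies/wafer, yield ~{y:.0%}, good dies ~{n * y:.0f}")
```

Even with a generously low defect density, the smaller die gets both more candidates per wafer and a higher fraction of good ones - which is exactly the cost lever chiplets pull.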
For Nvidia, the desperation for the “fastest GPU” mindshare is visible in how hard they are pushing the power requirements.
AMD refused to chase the prices set by Nvidia, or the performance set by Nvidia. They could have easily pushed clocks (and hence power) and matched Nvidia's performance, but they instead chose to cut their own path. They kept GPU prices at $1000/$900 (compared to Nvidia's $1600/$1300) and still come close to Nvidia's raster performance (ray tracing is another story). Raster is the majority of the market. That is their claim; we will see when the cards and third-party reviews arrive in December.
That is the current status in hardware: Nvidia pushing power to keep mindshare, and AMD refusing to play that game as of now. So unless Nvidia's next gen (Blackwell) is chiplets, Nvidia is going to face in hardware what happened to Intel. Of course, customers prefer Nvidia for their software platform - a huge advantage there. So the job is cut out for both teams: Nvidia needs to figure out how to do chiplets, and AMD has to figure out how to do great software. On that front, the following info is noteworthy:
AMD is using 6,000 unique system configurations for graphics driver testing, 1,500 more than NVIDIA.
Regarding the Alibaba ARM cloud, I am eagerly waiting for AMD's Bergamo with 128 Zen 4c cores tailored for cloud. We are also seeing efficiency cores from Intel, which will very likely make it to cloud customers. So we need to wait for these cores to arrive and compete, and then, only then, see how ARM does against legitimate competition in the area that ARM server processor designers are targeting.
Mosesmann - I used to cover ARM back when they were public. I recall an executive at ARM had said that, all things being equal, if we had an ARM processor at the same process node, because of the efficiency of the architecture we could use 1/3 of the transistors that an x86 processor would use. I don't know if you can comment on that, because it is more of an x86 question and not your area historically, but…
Peng - Yeah… well, Hans, you know what I would say is, okay, I guess… you know, I started my career as a microprocessor designer. My first program was on a VAX minicomputer from Digital, so that shows you my age ( ). I have done VAXes, I have done MIPS (VP of engineering), SGI; we have done multiple generations of ARM - at Xilinx we do ARM SoCs - and now I am with a company that is x86. So what I would say is that that is technically not accurate (he is laughing a bit here). Modern architectures have a lot of commonality. I am not saying there aren't some differences between these instruction sets and architectures, but that claim of factors like that is simply not true, right… When you target certain things, like the ultimate in single-threaded performance, that leads you to certain architectural choices. If you are not targeting the ultimate in single-threaded performance and you are targeting something else - say, mobile handsets or something where you care a lot about power - you have different architectural choices. I think it is much more about architecture and implementation choices as opposed to what is inherent in the instruction set architecture… I have done a lot of architecture in my time, but I think that is tremendously exaggerated.
Jim Keller had something similar to say about ARM vs x86. His talk is now available on YouTube.
All I will say is that ARM is making incursions, but we do not know how much of that is due to the goof-ups Intel has made. It could very well be that new server entrants will learn some old lessons from Intel. Having your own fab is a different thing altogether. Another factor coming in is geopolitics… so… we can only observe how ARM progresses as of now and set our expectations at the right level, because it is only now, in 2023, that AMD and Intel (maybe) will ship cores targeting cloud customers specifically.
EPYC Genoa Launch Event
So AMD got MSFT/VMware/semi/HP/Dell/Lenovo/Supermicro/Google Cloud to talk about the benefits of the Genoa/EPYC platforms in terms of power/energy savings/performance.
→ together we advance_data centers
They really doubled down on capex and opex. There was stress on TCO across these clients.
New Platform: Very likely a 5-year plan. MSFT said you get upgraded to Genoa-X (the 3D V-Cache version) when it arrives. Zen 5/Zen 6 are going to be drop-in replacements; the platform stays. Intel has nothing close to this.
→ Migration is easier said than done, though, because it is not just the processor you change when moving away from Intel; there are other peripheral costs added in. But this platform gives a compelling reason to ask "why not" (as one customer mentioned in the presentation). Example: “what can be done with Intel can be done with 1/3rd the servers at 50% less power, which combines to a 40% capex and 61% opex reduction per year”.
The generational uplift from Milan to Genoa was incredible across the wide range of server and HPC benchmarks I’ve carried out. I am now left to daydream about what Genoa-X will look like next year, knowing there is still even more potential to squeeze out of Zen 4 on the server side, as well as next year’s Bergamo CPUs with up to 128 cores focused on cloud computing workloads.
Overall though, on a raw performance basis, Genoa is a clean kill. Nothing comes close, that 50% core count increase is simply crushing to anything out there and on that basis is nearly as massive a generational change as the Naples to Rome jump. The competition has no answer for years.
I stumbled on this article - a view into the scale of datacenters.
Compass Data Centers, Prince William County: In June, Compass Datacenters filed plans to build up to 10.5 million square feet of data center capacity in the Prince William Digital Gateway, a proposed 2,100-acre technology corridor in Manassas, Virginia which could accommodate up to 27 million square feet of data center development. Compass seeks to rezone 825 acres of land for its project. The Digital Gateway is controversial because it is adjacent to one of the Manassas Civil War battlefields and a state forest, but last week passed a key milestone when the Prince William Planning Commission recommended the project be approved by the Prince William Board of Supervisors, which is expected to review it next month.
Wonderful interview with a former GlobalFoundries (a fab company like TSMC) VP about the current status of Intel/AMD/NVIDIA - the near past, present, and near future.
note: He details how he underestimated benefits of chiplets and how the economics of chiplets is a big advantage.
The results from Larabel show that the EPYC 9374F came very close to matching the Intel Xeon Platinum 8380 2P in single-core tests. In multi-core workloads, however, the results shine for AMD. This is made even better by the fact that the single 32-core chip on the 1P platform was running against two Xeon Platinum 8380 chips with a combined total of 80 cores and 160 threads. Power consumption was just as striking: the EPYC 9374F peaked at a fantastic 327.56W, while the Intel Xeon 8380 2P maxed out at 583.63W - roughly 1.8x in favor of the single AMD EPYC 9374F over the two Ice Lake Xeon 8380 processors.
Wonderful read on future datacenter architectures and how AMD is at the intersection of AI and Datacenter that is coming our way. The author is holding AMD shares.
Advantages of chiplets explained perfectly, with real examples, by AMD Fellow Sam Naffziger.
New information for me:
Porting IO IPs to a new node is engineering-intensive - so with chiplets, they can avoid moving IO to a new node at all. Why does this not hurt? Point 2 below.
Logic (CPU/GPU) scales well with node shrinks, but other IPs do not. So with chiplets, they have the option to move only the CPU to the new node while keeping IO on a separate die in the older node.
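A tiny numeric sketch of those two points. The scaling factors and die areas below are illustrative assumptions (logic really does shrink far better than analog/IO on modern nodes, but the exact ratios vary node to node):

```python
# Assumed, illustrative area-scaling factors when moving to a new node:
LOGIC_SCALING = 0.6   # logic shrinks well on the new node
IO_SCALING = 0.95     # analog/IO barely shrinks at all

def monolithic_new_node_area(logic_mm2, io_mm2):
    # Monolithic design: everything, including IO, sits on the
    # expensive new node - and the IO ports barely got smaller.
    return logic_mm2 * LOGIC_SCALING + io_mm2 * IO_SCALING

def chiplet_areas(logic_mm2, io_mm2):
    # Chiplet design: only logic moves to the new node; the IO die
    # stays on the cheap, already-ported older node.
    return logic_mm2 * LOGIC_SCALING, io_mm2

logic, io = 200.0, 150.0  # mm^2 on the old node (illustrative)
mono = monolithic_new_node_area(logic, io)
ccd, iod = chiplet_areas(logic, io)
print(f"monolithic: {mono:.1f} mm^2, all on the expensive new node")
print(f"chiplet:    {ccd:.1f} mm^2 new node + {iod:.1f} mm^2 cheap old node")
```

The chiplet split buys less area on the expensive node and skips the IO porting effort entirely - the two advantages named above.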