Instability on recent Intel Processors


[DE]Glen

While investigating crashes in Warframe we came across a particular series that were not crashing in our code (they were crashing in nvgpucomp64.dll, a component of Nvidia drivers). After aggregating hundreds of reports from helpful players we discovered a pattern: almost all were coming from systems with 13th and 14th generation Intel processors.

[Image: nvgpucomp64.dll crashes.png (pie chart of nvgpucomp64.dll crash reports by CPU model)]
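For anyone curious how a pattern like this can fall out of a pile of reports, here is a minimal sketch of the grouping step. It is not DE's actual tooling: the CPU name strings, the sample list, and the helper names are all hypothetical, and the regex only understands desktop-style Intel Core model numbers such as "i9-13900K".

```python
import re
from collections import Counter

# Matches desktop-style Intel Core model numbers such as "i9-13900K" or "i7-9700K".
# Assumption: mobile or newer naming schemes are out of scope for this sketch.
_INTEL_MODEL = re.compile(r"i[3579]-(\d{4,5})")


def intel_generation(cpu_name):
    """Return the Intel Core generation parsed from a CPU name, or None."""
    match = _INTEL_MODEL.search(cpu_name)
    if match is None:
        return None
    digits = match.group(1)
    # The last three digits are the SKU; whatever precedes them is the generation.
    return int(digits[:-3])


def crashes_by_generation(cpu_names):
    """Count crash reports per CPU family label."""
    counts = Counter()
    for name in cpu_names:
        gen = intel_generation(name)
        label = f"Intel gen {gen}" if gen is not None else "other"
        counts[label] += 1
    return counts


# Hypothetical sample data, not real crash reports.
sample = [
    "13th Gen Intel(R) Core(TM) i9-13900K",
    "Intel(R) Core(TM) i9-14900KF",
    "Intel(R) Core(TM) i7-13700K",
    "AMD Ryzen 9 5950X 16-Core Processor",
    "Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz",
]
print(crashes_by_generation(sample).most_common())
```

With real data, one family dominating the counts is the kind of signal described above.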

 

Luckily we found a staff member who would encounter these crashes on his home computer. Curiously, his computer at the office was fine: he was playing with the same loadout, the same customizations, with the same people, but he would only crash at home.

He wasn’t over-clocking anything and it was a new machine so there was no reason to expect problems. We tried all of the usual fixes: he got the latest Windows Updates, he updated all his drivers, he disabled all third-party overlays being injected, he tested his RAM, and by all accounts everything was fine.

We ran aggressive stress-tests on similar machines: we used scripts to repeatedly open and close various user-interface components that were mentioned in crash reports, we ran endless simulated battles between squads of NPCs, and we even made a test that would load up random levels, teleport around quickly to a whole bunch of vantage points to exercise the graphics driver, and then move on.

Everything was fine for us and yet he kept crashing doing the most basic things like launching the game and flying to a mission.

Because the crash wasn’t in our code it was hard to guess what we could be doing wrong but as we looked over the reports we noticed that these crashes tended to occur when the graphics driver was working very hard on all CPU-cores. The penny dropped when we realized that this was a particularly power-hungry state for the processor to be in and we were reminded of a recent report from Intel that suggested that a BIOS update might help.

BIOS updates aren’t usually delivered automatically by Windows Update, although they are for certain OEMs: many of our office machines get regular updates from the vendor, but the person who was crashing was using a custom-built gaming rig at home – he checked and it turned out that it was running the stock BIOS from 2022 and was missing over a dozen updates including one that “replaced tweaked system power settings.”

After updating his BIOS to the latest he hasn’t crashed in nvgpucomp64.dll since and we’re optimistic that the weird crashes that only he was getting won’t be back either. We’re not positive that it was the issue described by the report linked above but we’re happy that updating the BIOS helped.

Updating the BIOS is usually a simple process but it’s not something we would normally encourage people to do – usually the advice is “if it ain’t broke don’t fix it” – however if you’re crashing playing Warframe and other games, you have a 13th or 14th generation Intel processor, and you’ve updated everything else, then it’s something to consider (check with your motherboard vendor for updates and instructions).

If you happen to be playing on an AMD CPU or aren’t lucky enough to have a recent Intel processor, don’t worry: we have a bunch of fixes for crashes unrelated to this issue coming soon – we’re just waiting to get through cert on all platforms.

 


2 minutes ago, [DE]Glen said:

If you happen to be playing on an AMD CPU or aren’t lucky enough to have a recent Intel processor, don’t worry: we have a bunch of fixes for crashes unrelated to this issue coming soon – we’re just waiting to get through cert on all platforms.

Haven't seen this issue on the AMD side, with either Zen2, Zen4 or Zen4 3D. Absolute stability on my side, and I'm probably the only person here running a 7950x3D direct-die cooled with an NH-D15, CCD1 disabled, and extreme undervolted. Don't think I've touched a UEFI update since the SoC voltage issue on 7000x3D, so ~1.5 years here.


For AMD (what I am currently using), I've experienced nothing like this. I have only experienced ultra-low 1% lows on framerates and weird RAM-based crashes.
For reference, I have a Ryzen 5950x CPU and an RTX 3090, and I am running the game with high-ultra settings—still nothing like you've encountered.


For those who are reticent to update their BIOS for whatever reason, another potential fix for this issue is to manually cap the maximum power draw of the CPU in the BIOS to whatever is appropriate for your CPU (if you've got an adequately specced power supply, anything up to around 500W should be fine, though most CPUs will do fine at a tenth of that).

The issue most likely arises from motherboard vendors setting the maximum power for these CPUs to 4096 watts, which the CPU then attempts to draw. Since the motherboard and power supply are both incapable of providing this amount, it can result in instability.
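To put the 4096 W figure in perspective, here is a rough sketch that compares a configured power limit against a CPU's rated turbo power. The table values are an assumption taken from Intel's published specs (253 W maximum turbo power for the i9-13900K and i9-14900K); check the spec page for your own CPU, and treat the function name and the wording of the results as purely illustrative.

```python
# Assumed reference values: Intel lists 253 W maximum turbo power for the
# Core i9-13900K and i9-14900K. Verify against Intel's spec page for your CPU.
INTEL_MAX_TURBO_POWER_W = {
    "i9-13900K": 253,
    "i9-14900K": 253,
}


def describe_power_limit(cpu_model, configured_limit_w):
    """Describe how a configured BIOS power limit compares to the rated value."""
    spec = INTEL_MAX_TURBO_POWER_W.get(cpu_model)
    if spec is None:
        return "unknown CPU: look up its rated power first"
    if configured_limit_w <= spec:
        return "at or below rated turbo power"
    ratio = configured_limit_w / spec
    return f"{ratio:.1f}x rated turbo power: consider lowering"


print(describe_power_limit("i9-13900K", 4096))
print(describe_power_limit("i9-13900K", 253))
```

A limit of 4096 W is effectively "no limit", which is why capping it by hand in the BIOS can have the same effect as a firmware update that restores sane defaults.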

Looking at the pie chart above, we can see that the most common crashes were on high-end K-series CPUs, such as the 13900K and 14900KF, which further supports the theory that this was caused by motherboard power limit settings, as well as the fact that these issues do not seem to be present on OEM systems (which have no reason to boost to excessively high power limits in an attempt to eke out a few more points of performance). Hope this helps someone!


2 hours ago, Razgarize said:

For AMD (what I am currently using), I've experienced nothing like this. I have only experienced ultra-low 1% lows on framerates and weird RAM-based crashes.
For reference, I have a Ryzen 5950x CPU and an RTX 3090, and I am running the game with high-ultra settings—still nothing like you've encountered.

Dual CCD Ryzen CPUs aren't great for gaming past Ryzen 3000, so R9 5000 series and later. The 1% low drops you're experiencing are either the result of the game spilling onto the other CCD, which increases frame time latency dramatically, or a combination of that and FCLK:MCLK:UCLK instability; those three clocks are all supposed to run 1:1:1. More than likely, you're running either 1600MHz or 1800MHz, depending on what DOCP profile your RAM is running (a 3200MHz kit runs at 1600MHz, a 3600MHz kit at 1800MHz). Ryzen 5000 is perfectly stable at 1800MHz, but maybe not when jumping CCDs intermittently, since that'll operate off the FCLK to do so.
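The clock arithmetic is easy to check by hand, but here is a small sketch of it. The function names are made up for illustration; the only facts used are that DDR memory transfers data twice per clock (so the real memory clock is half the advertised speed) and that the three clocks are meant to match.

```python
def ddr_memory_clock_mhz(rated_speed):
    """DDR transfers data twice per clock, so the real clock is half the rated speed."""
    return rated_speed / 2


def is_one_to_one(fclk_mhz, uclk_mhz, mclk_mhz):
    """True when fabric, memory-controller and memory clocks all match (1:1:1)."""
    return fclk_mhz == uclk_mhz == mclk_mhz


for kit in (3200, 3600):
    mclk = ddr_memory_clock_mhz(kit)
    print(f"DDR4-{kit}: MCLK {mclk:.0f} MHz, so FCLK and UCLK should also be {mclk:.0f} MHz")

# Example of a mismatched setup: fabric left at 1600 MHz with a 3600 kit.
print(is_one_to_one(1600, 1800, 1800))
```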

There are some scenarios where Warframe's CPU utilization spikes massively, even up to 100% on my CPU. In the case of any dual CCD Ryzen CPUs, you'd be bridging the infinity fabric to the other CCD, which is going to increase frametime latency.

It's a major reason why I disabled CCD1 on my 7950x3D: even fully tuned to mitigate it, once you bridge from your 3D v-cache CCD0 to CCD1, the benefits of 3D v-cache are eliminated. In this case, direct-die cooling a 7950x3D with CCD1 disabled lets me operate at ~400MHz higher than a 7800x3D under load, maintaining >5.2GHz on multiple cores. With Warframe, that's running the game at 240 fps versus 160 fps, since Warframe loves 3D v-cache. The same behavior showed up with the 5800x3D I had before, when upgrading from a 3950x.

 

TLDR: Don't use dual CCD Ryzen CPUs for gaming. Lasso or processor affinity your 5950x if you need those extra 8 cores, otherwise, just disable CCD1. The other solution would be to sell your 5950x and buy a 5700x3D/5800x3D. You might be able to trade it 1:1 even, since the 5950x is still a capable productivity/server CPU with how abundant DDR4 ECC is still.


6 hours ago, [DE]Glen said:

After aggregating hundreds of reports from helpful players we discovered a pattern: almost all were coming from systems with 13th and 14th generation Intel processors.

Funny. I have Ryzen.

Looks like there are no crashes on the newest drivers.

Or, maybe it just Matrix has me, and I really have overheated, while you

6 hours ago, [DE]Glen said:

Everything was fine for us and yet he kept crashing doing the most basic things like launching the game and flying to a mission.

Yeah, maybe it's unrelated, me crashing only on the Event mission

 

 

Edited by -JT-_-R3W1ND

16 hours ago, Demigirlboss said:

For those who are reticent to update their BIOS for whatever reason, another potential fix for this issue is to manually cap the maximum power draw of the CPU in the BIOS to whatever is appropriate for your CPU (if you've got an adequately specced power supply, anything up to around 500W should be fine, though most CPUs will do fine at a tenth of that).

The issue most likely arises from motherboard vendors setting the maximum power for these CPUs to 4096 watts, which the CPU then attempts to draw. Since the motherboard and power supply are both incapable of providing this amount, it can result in instability.

Looking at the pie chart above, we can see that the most common crashes were on high-end K-series CPUs, such as the 13900K and 14900KF, which further supports the theory that this was caused by motherboard power limit settings, as well as the fact that these issues do not seem to be present on OEM systems (which have no reason to boost to excessively high power limits in an attempt to eke out a few more points of performance). Hope this helps someone!

Those 13000 and 14000 lines are known to even die because Intel messed up the inner power management. In its original setting, it is locally drawing more than the silicon can handle. I would strongly recommend the BIOS update, though it does have an impact on performance.

If anyone is interested in a long read: https://community.intel.com/t5/Processors/June-2024-Guidance-regarding-Intel-Core-13th-and-14th-Gen-K-KF/m-p/1607807

Edited by kadlis12

On 2024-07-09 at 11:54 AM, Agall said:

Dual CCD Ryzen CPUs aren't great for gaming past Ryzen 3000, so R9 5000 series and later. The 1% low drops you're experiencing are either the result of the game spilling onto the other CCD, which increases frame time latency dramatically, or a combination of that and FCLK:MCLK:UCLK instability; those three clocks are all supposed to run 1:1:1. More than likely, you're running either 1600MHz or 1800MHz, depending on what DOCP profile your RAM is running (a 3200MHz kit runs at 1600MHz, a 3600MHz kit at 1800MHz). Ryzen 5000 is perfectly stable at 1800MHz, but maybe not when jumping CCDs intermittently, since that'll operate off the FCLK to do so.

There are some scenarios where Warframe's CPU utilization spikes massively, even up to 100% on my CPU. In the case of any dual CCD Ryzen CPUs, you'd be bridging the infinity fabric to the other CCD, which is going to increase frametime latency.

It's a major reason why I disabled CCD1 on my 7950x3D: even fully tuned to mitigate it, once you bridge from your 3D v-cache CCD0 to CCD1, the benefits of 3D v-cache are eliminated. In this case, direct-die cooling a 7950x3D with CCD1 disabled lets me operate at ~400MHz higher than a 7800x3D under load, maintaining >5.2GHz on multiple cores. With Warframe, that's running the game at 240 fps versus 160 fps, since Warframe loves 3D v-cache. The same behavior showed up with the 5800x3D I had before, when upgrading from a 3950x.

 

TLDR: Don't use dual CCD Ryzen CPUs for gaming. Lasso or processor affinity your 5950x if you need those extra 8 cores, otherwise, just disable CCD1. The other solution would be to sell your 5950x and buy a 5700x3D/5800x3D. You might be able to trade it 1:1 even, since the 5950x is still a capable productivity/server CPU with how abundant DDR4 ECC is still.

This issue only occurs when playing the DirectX 12 version of the game. I am using a multipurpose workstation-grade desktop that I built myself. It has 64 GB of RAM running at 3600MHz CL 14 and a CPU with an automated overclock averaging 4.8GHz. I built this system not only for gaming but also for animation, game development, and coding projects.

Regarding the game's performance, the DirectX 12 version runs at an average of 290 fps on high-ultra (custom) settings, with 1% lows around 60 fps. After experimenting with the game settings, I found that these changes do not affect the game's stability. When reverting to DirectX 11, the game's stability improves, with the 1% lows increasing overall, but the average fps drops to around 140 fps. I'm confident the issue is software-related, but I do not have the time to fully investigate it.
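Since "average fps" and "1% lows" get quoted a lot in this thread, here is a minimal sketch of one common way to compute them from a list of frame times. Definitions of the 1% low vary between tools; this version assumes it means the average frame rate over the slowest 1% of frames. The function name and the sample numbers are hypothetical.

```python
def fps_summary(frame_times_ms, worst_fraction=0.01):
    """Return (average fps, fps over the slowest `worst_fraction` of frames)."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    # Average fps is total frames divided by total time, not the mean of per-frame fps.
    average_fps = 1000 * len(frame_times_ms) / sum(frame_times_ms)
    slowest_first = sorted(frame_times_ms, reverse=True)
    # Always keep at least one frame so short captures still produce a value.
    worst_count = max(1, round(len(slowest_first) * worst_fraction))
    worst = slowest_first[:worst_count]
    low_fps = 1000 * len(worst) / sum(worst)
    return average_fps, low_fps


# Hypothetical capture: mostly fast frames with a few long stalls.
frames = [3.4] * 990 + [16.7] * 10
average, one_percent_low = fps_summary(frames)
print(f"average: {average:.0f} fps, 1% low: {one_percent_low:.0f} fps")
```

A capture like this shows how a high average can sit alongside much lower 1% lows: a handful of long frames barely moves the average but dominates the worst-case figure.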

As for the original post, game crashes do occur, but there is no conclusive evidence that the cause is hardware-related on the AMD side. If I had more time to conduct more studies and in-depth tinkering with the game's stability, I could potentially find an exact fix to report to DE. However, I believe DE is likely working on moving from DirectX 11 to DirectX 12, as this has been their historical approach over the past eight years. I hope that we will see more APIs in the future for better stability.


10 hours ago, Razgarize said:

This issue only occurs when playing the DirectX 12 version of the game. I am using a multipurpose workstation-grade desktop that I built myself. It has 64 GB of RAM running at 3600MHz CL 14 and a CPU with an automated overclock averaging 4.8GHz. I built this system not only for gaming but also for animation, game development, and coding projects.

Regarding the game's performance, the DirectX 12 version runs at an average of 290 fps on high-ultra (custom) settings, with 1% lows around 60 fps. After experimenting with the game settings, I found that these changes do not affect the game's stability. When reverting to DirectX 11, the game's stability improves, with the 1% lows increasing overall, but the average fps drops to around 140 fps. I'm confident the issue is software-related, but I do not have the time to fully investigate it.

As for the original post, game crashes do occur, but there is no conclusive evidence that the cause is hardware-related on the AMD side. If I had more time to conduct more studies and in-depth tinkering with the game's stability, I could potentially find an exact fix to report to DE. However, I believe DE is likely working on moving from DirectX 11 to DirectX 12, as this has been their historical approach over the past eight years. I hope that we will see more APIs in the future for better stability.

I would try adjusting Processor Affinity; it takes effect live as you change it. You get to it with Task Manager > Details > right click Warframe.exe > Set Affinity, then limit the game to cores 0-15 to restrict it to CCD0. I could see DX12 using more than 8c/16t and bridging to CCD1.
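If you would rather script this than click through Task Manager, tools that accept an affinity value usually want it as a bitmask with one bit per logical processor. Here is a small sketch of building that mask. It assumes, as above, that logical processors 0-15 belong to CCD0 on your system; the function name is made up for illustration.

```python
def affinity_mask(logical_cpus):
    """Build an affinity bitmask with one bit set per logical processor index."""
    mask = 0
    for cpu in logical_cpus:
        if cpu < 0:
            raise ValueError("logical processor indices must be non-negative")
        mask |= 1 << cpu
    return mask


# Assumption: logical processors 0-15 are the 8 cores / 16 threads of CCD0.
ccd0 = affinity_mask(range(16))
print(f"{ccd0:X}")  # FFFF
```

A hexadecimal mask like this is the form the `/affinity` switch of the Windows `start` command takes, though whether launching the game that way sticks through the launcher is something you would need to test yourself.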

If that doesn't resolve it, then you probably just need to reinstall chipset drivers from AMD.com.
