Forum Discussion
Please keep me posted.
And you were right. I had three hours of stability, but then suddenly the "internal parity errors" started happening again, and not long after, the random crashes.
The only fix was to increase CPU voltage -excessively- high.
Oriostorm, is Apex doing something timing related that is causing this instability?
Because it really should not be happening.
I set my CPU to 5.2 GHz (hyperthreading off) at 1.335v and ran Battlefield 5 (Firestorm) for an hour. BF5 was using >75% of the CPU cores, temps and power draw were much higher, and there were no errors or crashes. I then ran the Blender Classroom render stress test (you can google this) and the Blender BMW stress test, and both completed with flying colors. The only programs that had problems at this point were Apex Legends (those crashes and Internal Parity Error) and Prime95 AVX 1344K (clock watchdog timeout BSOD). Prime95 small FFT with AVX disabled ran forever. (Prime95 29.8 build 1).
Increasing CPU voltage to 1.385v completely stopped the Prime95 1344K AVX tests from crashing. These ran fine now.
But Apex Legends was still randomly acting up (exception error, or CTD sometimes with no error, or internal parity error with no crash).
1.395v-1.40v was needed to stop this--far far beyond what any other program or application needed! Something is really strange here.
And what's even worse---Apex Legends uses FAR FAR far less CPU resources than Blender, Battlefield 5, Prime95 (AVX disabled), etc.
Oriostorm, is there any way, if you have time, to write a very small 'stress test' code sample that can extensively test some of the instructions you are using for Apex Legends, in a repeated intense loop of cycles, so that we can run the executable program and it can catch any SSE2 error in a bugtrap? This may help track down the problem. This might also help determine why some users with stock clocks are also getting these crashes (in most cases, pure stock clocks prevent these problems). It shouldn't take that long to write a code sample that we can download, and it may help determine what's going on.
It really is inconceivable that Apex is needing far more voltage than something like Blender or Prime95....
Could EasyAntiCheat be causing these timing errors?
By the way, this "bizarre" behavior seems to get reversed (meaning things start acting properly) if you run at base clocks (4.7 GHz) and then undervolt the CPU enough that it becomes unstable.
Do this and Apex Legends runs happily while Prime95 with AVX disabled generates a BSOD crash. This happened at about 4.7 GHz (HT enabled) at 1.065v.
You have to go even lower on the voltage before Apex starts generating parity errors.
@Falkentyne, I honestly don't know how Easy AntiCheat works (I wasn't involved in integrating that), so I can't speak to whether it's a factor.
I do think it's related to the actual instruction sequence, and possibly its offset in memory. I can try making a standalone test program, that's a good idea. But it's not as trivial as you might think. This function does connect with the rest of our engine, so we have to sever those connections to make a standalone program, but we have to do it in a way that doesn't change the generated assembly.
If that doesn't work, I guess I could write an assembly language file that just has this function exactly as it is in the live game, and try to have external code feed it the data it expects in a way that won't crash.
Another tricky thing is that the path through this code depends on branches based on what you can see in the game. If a particular path through the code is causing the bug, you'd need to replicate the live data that causes the crash. That data can be different for every viewpoint in King's Canyon, so even replicating all of the code may not replicate the crash if we don't replicate the control flow decisions. There are things we can try to force different control flows, but it's not guaranteed to repro.
Still, it's worth a shot!
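To make the idea of a standalone repro concrete, here is a minimal sketch of the detection pattern only: run a deterministic floating-point kernel over and over and flag any run whose result differs from the first. Everything in it (the `kernel` and `run_consistency_check` names, the arithmetic, the iteration counts) is hypothetical and is not the game's code. Interpreted Python also cannot pin down a specific SSE2 instruction sequence or its offset in memory, so a real test would still need the function's generated assembly and representative input data as described above.

```python
import hashlib
import struct


def kernel(seed: int, n: int = 2000) -> bytes:
    """Hypothetical deterministic float workload; returns a digest of its results."""
    x = float(seed) + 0.5
    acc = 0.0
    out = []
    for i in range(1, n + 1):
        x = (x * 1.000001 + i) % 1000.0
        acc += x * x / i
        out.append(acc)
    # Hash the raw double bits so a single flipped bit changes the digest.
    return hashlib.sha256(struct.pack(f"<{n}d", *out)).digest()


def run_consistency_check(iterations: int = 50, seed: int = 1) -> list:
    """Recompute the kernel repeatedly; return the iterations that disagreed."""
    reference = kernel(seed)
    mismatches = []
    for i in range(iterations):
        if kernel(seed) != reference:
            mismatches.append(i)
    return mismatches


if __name__ == "__main__":
    bad = run_consistency_check()
    print("mismatching iterations:", bad if bad else "none")
```

On a stable machine the list of mismatches should stay empty. A non-empty list would mean the same inputs produced different bits, which is the kind of silent corruption a test like this is meant to catch before it turns into a crash.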
- 7 years ago
@OrioStorm Thank you very much. I would be happy to do these tests for you (after all it's what we people over on overclock.net do, spend all day running stability tests!).
I just did something very interesting on my laptop.
It is a 7820HK (MSI) laptop, and I set it to a speed which I know is unstable:
4800 MHz (cache: 4400 MHz), and 1.260v.
This laptop does not have loadline calibration so there is voltage vdroop, but there is no "live" Vcore sensor, just VID, so real vdroop is impossible to measure.
Anyway:
This laptop is highly overclocked to its absolute limits. At 4800 MHz, 1.260v, Apex Legends generated three CPU errors but did not crash:
1x Internal Parity Error on thread #4
2x Cache Hierarchy Errors on threads #6 and #3. (The threads go from 0 to 7, two logical threads per physical core, so threads 0 and 1 belong to the first core, etc.)
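As a quick illustration of that numbering, the sketch below maps the reported thread numbers to physical cores. It assumes the layout described above (sibling threads numbered consecutively, two per core) and counts cores from 0; the `physical_core` helper is a hypothetical name used only for this example.

```python
THREADS_PER_CORE = 2  # assumption: hyperthreading on, siblings numbered 0/1, 2/3, ...


def physical_core(thread_index: int, threads_per_core: int = THREADS_PER_CORE) -> int:
    """Return the zero-based physical core that a logical thread belongs to."""
    return thread_index // threads_per_core


# Thread numbers taken from the errors listed above.
reported = {
    "Internal Parity Error": [4],
    "Cache Hierarchy Error": [6, 3],
}

for error, threads in reported.items():
    for t in threads:
        print(f"{error}: thread {t} -> core {physical_core(t)}")
```

Under that assumption, threads 4, 6 and 3 land on cores 2, 3 and 1, so the three errors came from three different physical cores rather than from one weak core.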
This was a 1 hour play session.
This is exactly what I would expect an unstable CPU to do.
These cache hierarchy errors are also what Realbench 2.56 and Prime95 (AVX disabled) do on my desktop when they are unstable. So this is properly expected.
Apex Legends did NOT crash however! But the fact that I got *both* internal parity and Cache L0 errors (instruction register corruption basically)--but they were corrected, means that the "Unstable" CPU is actually acting as expected.
1.265v did not generate anything but I didn't test that long enough.
The 9900K desktop, when overclocked, behaves completely differently, although desktops do have a loadline calibration setting to lower CPU vdroop.
Instead of throwing any L0 cache errors in Apex, it just throws out Internal Parity Errors, or Apex simply crashes with an exception error, while every other stress test or program (including more intensive stuff like Battlefield 5, Blender, etc.) runs all day (except small FFT AVX Prime95, which is a 200 amp, 100C power virus; 1344K AVX Prime95 is more realistic). Small FFT Prime95 with AVX disabled also runs fine.
My gut is screaming "Microcode bug" but unfortunately, disabling loadline calibration on the desktop wouldn't work too well, as then you would need unsafe voltages to be stable with <>200mv of voltage droop. I may try disabling 4 cores (e.g. turn my 9900K into a 7700K) and then overclock and disable LLC (loadline calibration) later, and test, but this is very time consuming.
I would enjoy doing any small code testing for you. If you have a more convenient way I can contact you or where you can post anything to test, let me know. I'm always available.
- 7 years ago
I strongly advise everyone in here to revert their CPU overclocking settings back to default,
and to disable Intel SpeedStep in their BIOS.
I am pretty sure this has solved my problem, as I no longer have these crashes with logs.
Before, it used to happen all the time; now I can go for days without a single crash.
As I read here that the problem was CPU related, I tried different things.
But as mentioned above, I'm 99% sure this has solved my problem completely.
I'm using a 7700K and a 1080 Ti. If any others at least try stock clock speeds and disabling Intel SpeedStep,
it would be very interesting to see if the crashes actually stop.
- 7 years ago
My crashing stopped as well, I'm using an i9-9900K. I was stable at 4.7GHz (Turbo Boost) so I just introduced overclocking back and went to 4.8GHz to see if I'm stable. I'll report back soon.
- 7 years ago
crash:
{
atidxx64: 0000000000760244
EXCEPTION_ACCESS_VIOLATION(unknown): 0000000000000000
}
cpu: "AMD FX(tm)-8350 Eight-Core Processor "
ram: 16 // GB
callstack:
{
kernel32: 000000000009B9B0
ntdll: 0000000000079015
ntdll: 0000000000057388
ntdll: 000000000006BF7D
ntdll: 000000000004043A
ntdll: 000000000006B61E
atidxx64: 0000000000760244
atidxx64: 0000000000032617
atidxx64: 000000000072DAAF
atidxx64: 0000000000722239
atidxx64: 00000000008BBFB4
kernel32: 000000000001556D
ntdll: 000000000005385D
}
registers:
{
rax = 16
rbx = 34
rcx = 0x62A86B40
rdx = 2
rsp = 0x4204FB20
rbp = 0xB4BA0B12B50D3CF6
rsi = 0x62A86B40
rdi = 0x339F6ED0
r8 = 4
r9 = 0x339F6ED0
r10 = 0x62A86B40
r11 = 0x4204FB60
r12 = 0
r13 = 0
r14 = 2
r15 = 4
rip = 0x000007FEF3770244
xmm0 = [ [-2.0458958e-43, 9.6272987e-35, -3.9290089e+24, 0], [0x80000092, 0x06FFF000, 0xE8500000, 0x00000000] ]
xmm1 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
xmm2 = [ [0, -nan, -nan, -nan], [0, -1, -1, -1] ]
xmm3 = [ [120.36296, 0, 0, 0], [0x42F0B9D6, 0x00000000, 0x00000000, 0x00000000] ]
xmm4 = [ [270, 0, 0, 0], [0x43870000, 0x00000000, 0x00000000, 0x00000000] ]
xmm5 = [ [67.26667, 0, 0, 0], [0x42868889, 0x00000000, 0x00000000, 0x00000000] ]
xmm6 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
xmm7 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
xmm8 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
xmm9 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
xmm10 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
xmm11 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
xmm12 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
xmm13 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
xmm14 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
xmm15 = [ [0, 0, 0, 0], [0, 0, 0, 0] ]
}
build_id: 1560910906
- 7 years ago
Crash log.
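For anyone reading the register dump above, here is a small sketch of how two of its details can be cross-checked. It reinterprets the hex words in the xmm registers as IEEE 754 single-precision floats, and it subtracts the `atidxx64` crash offset from `rip` to estimate where that module was loaded. Treating the callstack values as offsets from the module base is an assumption made for this example, and the helper names `bits_to_float` and `module_base` are hypothetical.

```python
import math
import struct


def bits_to_float(word: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", word & 0xFFFFFFFF))[0]


def module_base(rip: int, offset: int) -> int:
    """Estimate a module's load address, assuming `offset` is relative to its base."""
    return rip - offset


# Values copied from the dump above.
XMM_WORDS = {
    "xmm3": 0x42F0B9D6,  # dump shows 120.36296
    "xmm4": 0x43870000,  # dump shows 270
    "xmm5": 0x42868889,  # dump shows 67.26667
}
RIP = 0x000007FEF3770244
CRASH_OFFSET = 0x760244  # the atidxx64 entry under "crash:"

for name, word in XMM_WORDS.items():
    print(f"{name}[0] = {bits_to_float(word)!r}")

# xmm2 lists -1 as the integer view; all bits set is a NaN bit pattern.
print("0xFFFFFFFF is NaN:", math.isnan(bits_to_float(0xFFFFFFFF)))

print("estimated atidxx64 base:", hex(module_base(RIP, CRASH_OFFSET)))
```

The decoded floats agree with the float column of the dump, and the estimated base ends in four zero hex digits, which is consistent with the offset assumption. It also shows that the faulting instruction sits inside `atidxx64` itself on this FX-8350 system.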