The Raw Benchmark Numbers

By Andrei Frumusanu

Before we go into more details, we're going to have a look at how much of a difference this behavior contributes to benchmarking scores. The key is in the differences between having Huawei/Honor's benchmark detection mode on and off. We are using our mobile GPU test suite, which includes Futuremark’s 3DMark and Kishonti’s GFXBench.

The analysis is currently limited to the P20s and the new Honor Play, as I don’t yet have newer stock firmware on my Mate 10s. It is likely that the Mate 10 will exhibit similar behaviour - Ian also confirmed that he's seeing cheating behaviour on his Honor 10. This points to most (if not all) Kirin 970 devices released this year as being affected.

Without further ado, here are some of the differences identified between running the same benchmarks while being detected by the firmware (cheating) and the default performance that applies to any non-whitelisted application (True Performance). The non-whitelisted application is a version provided to us by the benchmark manufacturer which is undetectable, and not publicly available (otherwise it would be easy to spot).

3DMark Sling Shot 3.1 Extreme Unlimited - Graphics - Peak 

3DMark Sling Shot 3.1 Extreme Unlimited - Physics - Peak 

GFXBench Aztec High Off-screen VK - Peak 

GFXBench Aztec Normal Off-screen VK - Peak 

GFXBench Manhattan 3.1 Off-screen - Peak 

GFXBench T-Rex Off-screen - Peak

We see a stark difference between the resulting scores – with our internal versions of the benchmarks performing significantly worse than the publicly available versions. We can see that all three smartphones perform almost identically in the higher power mode, as they all share the same SoC. This contrasts significantly with the real performance of the phones, which is anything but identical, as the three phones have different thermal limits as a result of their different chassis/cooling designs. Consequently, the P20 Pro, being the largest and most expensive, has better thermals in the 'regular' benchmarking mode.

Raising Power and Thermal Limits

What is happening here with Huawei is a bit unusual compared to how we’re used to seeing vendors cheat in benchmarks. In the past we’ve seen vendors actually raise the SoC frequencies, or lock them to their maximum states, raising performance beyond what’s usually available to generic applications.

What Huawei is doing instead is boosting benchmark scores by coming at it from the other direction – the benchmarking applications are the only use-cases where the SoC actually performs to its advertised speeds. Meanwhile every other real-world application is throttled to a significant degree below that state due to the thermal limitations of the hardware. What we end up seeing with unthrottled performance is perhaps the 'true' form of an unconstrained SoC, although this is completely academic when compared to what users actually experience.

To demonstrate the power behaviour between the two different throttling modes, I measured the power on the newest Honor Play. Here I’m showcasing total device power at fixed screen brightness; for GFXBench the 3D phase of the benchmark is measured for power, while for 3DMark I’m including the totality of the benchmark run from start to finish (because it has different phases).
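To make the methodology concrete, a minimal sketch of how an average power figure can be derived from sampled device power is shown below. This is purely illustrative: the actual test harness and sampling setup are not public, and the function name and example samples are hypothetical.

```python
# Illustrative sketch only: averaging total device power over a benchmark
# phase from (timestamp, power) samples taken at fixed screen brightness.
# The real measurement setup is not public; names and values are hypothetical.

def average_power_w(samples):
    """samples: list of (timestamp_s, power_w) tuples, in time order."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # Trapezoidal integration of power over time yields energy in joules;
    # dividing by the elapsed time gives the average power in watts.
    energy_j = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy_j += 0.5 * (p0 + p1) * (t1 - t0)
    elapsed_s = samples[-1][0] - samples[0][0]
    return energy_j / elapsed_s

# Hypothetical run ramping from 3.5 W to 4.4 W over 60 seconds:
print(average_power_w([(0, 3.5), (30, 4.0), (60, 4.4)]))  # roughly 3.975
```

For GFXBench only the 3D phase would be passed in as samples; for 3DMark the whole run, including its different phases, would be integrated.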

Honor Play Device Power - Default vs Cheating

The differences here are astounding, as we see that in the 'true performance' state, the chip is already reaching 3.5-4.4W. These are the kind of power figures you would want a smartphone to limit itself to in 3D workloads. By contrast, using the 'cheating' variants of the benchmarks completely explodes the power budget. We see power figures above 6W, and T-Rex reaching an insane 8.5W. On a 3D battery test, these figures very quickly trigger an 'overheating' notification on the device, showing that the thermal limits must be beyond what the software is expecting.

This means that the 'true performance' figures aren’t actually stable - they strongly depend on the device’s temperature (this being typical for most phones). Huawei/Honor are not actually blocking the GPU from reaching its peak frequency state: instead, the default behavior is a very harsh thermal throttling mechanism that tries to maintain significantly lower SoC temperature levels and overall power consumption.
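The behavior described above can be sketched as a whitelist check gating the governor's power budget. To be clear, this is NOT Huawei's code: the package names, temperature threshold, and limits are hypothetical placeholders, chosen only to match the power figures observed in our measurements.

```python
# Purely illustrative sketch of whitelist-gated thermal throttling.
# All names and values are hypothetical; this is not vendor code.

DETECTED_BENCHMARKS = {
    "com.example.benchmark3d",   # hypothetical whitelisted package names
    "com.example.gfxtest",
}

def gpu_power_limit_w(package_name, soc_temp_c):
    """Return the sustained power budget the governor would enforce."""
    if package_name in DETECTED_BENCHMARKS:
        # Whitelisted app: thermal limits relaxed, peak states allowed,
        # letting total device power climb toward the 8.5 W we measured.
        return 8.5
    # Default policy: harsh throttling keeps temperature and power well
    # below the SoC's peak capability.
    if soc_temp_c > 45:
        return 2.2   # the drastic fallback seen once the device heats up
    return 4.4       # cap near a sensible 3D power envelope

print(gpu_power_limit_w("com.example.game", 50))        # prints 2.2
print(gpu_power_limit_w("com.example.benchmark3d", 50)) # prints 8.5
```

The key point the sketch captures is that the peak hardware state is never blocked outright; only the sustained budget differs between whitelisted and ordinary applications.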

The net result is that in the phones' normal mode, peak power consumption during these tests can reach the same figures posted by the unthrottled variants, but the numbers very quickly fall back in a drastic manner. Here the device throttles down to 2.2W in some cases, reducing performance considerably.

84 Comments

  • beginner99 - Wednesday, September 5, 2018 - link

    The most interesting aspect is that it shows that ARM also struggles with power once they get into x86 performance territory. No free lunch. And I wonder how the other devices cheat. Probably most do somehow. Huawei just wasn't that clever.
  • ncsaephanh - Wednesday, September 5, 2018 - link

    Great work on this piece. I really appreciate good journalism giving light to industry issues while having the technical expertise to dive deep and explain everything in a concise manner. And I wouldn't worry about catching this earlier, what's important is we know now. And hopefully at least some consumers now won't fall for the marketing/benchmarking hype.
  • yhselp - Wednesday, September 5, 2018 - link

    The GFXBench T-Rex Offscreen Power Efficiency benchmark in the Kirin 970 piece still shows the cheating result for the Mate 10.

    It's astonishing to see the difference in sustained performance that cooling alone can account for - the P20 Pro and Honor Play have the same maker, same SoC, similar dimensions, and yet the performance is quite different.
  • Hyper72 - Wednesday, September 5, 2018 - link

    I thought that ever since Samsung was caught doing the same thing in 2013 you put in active countermeasures (randomly named benchmark software, etc.) or at least a test for cheating as a standard part of your setup?
  • tommo1982 - Wednesday, September 5, 2018 - link

    These tests show similar behavior with iPhone. It's not any faster than the other leading brands. The difference between peak and sustained is huge. Same goes for Samsung and Xiaomi.

    I understand why the UI seems so fast and responsive, and why many people complained about the performance. It just can't stay at peak forever.
  • eastcoast_pete - Thursday, September 6, 2018 - link

    To clarify up front: I don't own or like iOS devices. However, I have to give Apple its due here: the idea of really high, short burst performance coupled with okay longer-term speed is pretty much what I (and probably many other mobile users) want in smartphones. This is useful for multitasking while opening multiple browser windows etc., i.e. scenarios that really benefit from well above-normal CPU/GPU speeds for a few seconds, resulting in a fluid user experience. This is different from running the SoC to heat exhaustion and shutdown whenever a benchmarking app is recognized. Some current Android flagships are sort-of able to do that short burst ("turbo" in PCs) also, but none yet has the (momentary) peak performance of Apple's wide and deep cores. The Mongoose M3 was an attempt, the Kirin 980 was an apparent step towards this, sort of, but is now marred by this benchmark cheating BS. Let's see what QC can cook up, they tend to get closest to Apple's top SoC.
  • techconc - Monday, September 10, 2018 - link

    Thermal throttling happens on ALL phones. That's not what's in question. The issue is with companies that artificially whitelist specific benchmarks in order to achieve results that would not be seen in real applications.

    To that end, Anandtech's battery tests have always demonstrated the difference between peak and sustained performance in mobile devices. Up through the iPhone 6s, there was very little throttling going on with iPhones on peak loads. To your point, the level of throttling in iPhones has been approaching practices of common Android equivalents.
  • psychobriggsy - Thursday, September 6, 2018 - link

    Naughty. Makes running a benchmark in a 'loop mode' until the battery runs out very important IMO. If the device dies in an hour in benchmarks, but 3 hours elsewhere, then you know something's awry.

    However there is a potential positive - it shows that the Kirin 970 can perform well at higher power consumption - there's no performance wall between 3.5W and 9W, and the perf/W scales fairly well too.

    So - why not look into a 'docked' mode option in the future? One option could be a Switch-like dock, using external power (to protect the battery), optional cooling assistance, HDMI out to a TV, provide a controller in this pack as well, and allow the SoC to run as fast as this setup can keep the device from damaging itself. That's flippin' marketable. The dock would cost a few dollars, and it sounds like the software is already there in the main.

    Hopefully the Mali G76 in the Kirin 980 actually fixes a lot of the performance issues with Mali, which surely were a factor in this sad situation (also clearly saving money by using a smaller GPU; wide and slow beats narrow and fast for GPUs where power consumption matters).
  • hanselltc - Friday, September 7, 2018 - link

    wut if: the white list includes popular games as well? is that still cheating?
  • s.yu - Monday, September 10, 2018 - link

    Obviously you haven't read the article; the so-called whitelisting's performance can't be sustained. It's not as simple as merely activating some sort of game mode automatically.
