Qualcomm's Quad-Core Snapdragon S4 (APQ8064/Adreno 320) Performance Preview
by Anand Lal Shimpi & Brian Klug on July 24, 2012 7:10 PM EST
If you've been following our SoC coverage, you've probably come across our previews of Qualcomm's upcoming SoCs in their Mobile Development Platforms (MDPs). It's an interesting way both to get a feel for the performance of a given platform before things are final, and to see how much OEMs affect the final performance.
Qualcomm flew us out to San Francisco to take a look at its newest part, APQ8064, which combines four Krait v2 cores at up to 1.5 GHz with Qualcomm's new Adreno 320 GPU, and no baseband. This is an SoC destined primarily for tablets, although the combination of APQ8064 and MDM9615 will likely also be a common upcoming platform for the highest end phones.
At present, this is the same Krait CPU as what we've seen in MSM8960 in phones like the USA versions of the Galaxy S 3 and HTC One X. Later on, Krait v3 will emerge with higher IPC and shorter critical paths (and clocks up to 1.7 or 2 GHz) and a resulting 10-15% boost in performance. For now however we're looking at 1.5 GHz APQ8064 with a Krait v2 inside and Qualcomm's newest scalar GPU architecture with Adreno 320. We're going to talk more about Adreno 320 closer to devices shipping, when Qualcomm feels comfortable talking architecture.
Probably the single most notable change is the option in Adreno 320 to switch from a TBR (Tile Based Renderer) to an immediate mode renderer on the fly. By default, the render mode is still TBR, but an API is exposed to allow applications to request immediate mode. In the future, heuristics will be used to determine which mode is faster, which includes rendering some frames in immediate mode and some in TBR mode. Initial shipping devices with Adreno 320 will, however, just expose the API until the switching system is finalized. Update: Adreno is still a TBR not TBDR as stated earlier.
In terms of features, Adreno 320 adds OpenGL ES 3.0 (codename Halti) support, and GPGPU capabilities with OpenCL 1.2 and RenderScript. In terms of Windows APIs, Adreno 320 is Direct3D 11 feature level 9_3.
After a morning of sessions about benchmarks and how they reflect different areas of performance (which is another big discussion), we were given hands on time with the mobile development platform for APQ8064, the MDP/T APQ8064. MDP stands for Mobile Development Platform, T for Tablet. The MDP/T includes a 10.1" WXGA display (1366 x 720), 2 GB of LPDDR2 at 533 MHz (2x32 bits, PoP), 13 MP rear camera, 7 microphones, and all the usual ports and buttons. The tablet was running Android 4.0.4, and although the software is understandably not final, things were pretty stable. In addition, the MDP/T will be sold through Bsquare at some later date for $1299.
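As a rough sanity check on that memory configuration, the theoretical peak bandwidth can be worked out from the clock and bus width. The sketch below is our own back-of-the-envelope arithmetic, not a Qualcomm figure: it assumes the quoted 533 MHz is the memory clock, that data moves on both clock edges (double data rate), and that both 32-bit channels can be driven at once. The function name and parameters are purely illustrative.

```python
def peak_bandwidth_gbs(clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumption: transfers_per_clock=2 models double data rate signalling.
    """
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


# MDP/T as described: LPDDR2 at 533 MHz on a 2x32-bit interface.
mdp_t = peak_bandwidth_gbs(clock_mhz=533, bus_width_bits=2 * 32)
print(f"Theoretical peak: {mdp_t:.1f} GB/s")
```

Under those assumptions the ceiling comes out at roughly 8.5 GB/s. Sustained bandwidth in real workloads will be lower, and the CPU cores and GPU share it.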
Before we get too far in our performance testing, a refresher of the usual caveats is a good idea. We were allowed unsupervised benchmarking time with the APQ8064 MDP/T, however this is still a reference platform. Final shipping devices may run at different speeds or deliver different performance based on their software configuration. While the MSM8960 MDP ended up performing very close to HTC's One X/S, anything can happen in the final implementation of an SoC.
We'll start our performance analysis with GLBenchmark, more specifically, some of the raw feature tests to see just how things have improved over the MSM8960:
Raw fill rate almost tripled over the Adreno 225 in the HTC One X, and there's a healthy advantage over NVIDIA's Tegra 3 as well. Imagination's PowerVR SGX 543MP2 still manages a higher fill rate, and the MP4 in the new iPad can't be touched either.
Raw polygon throughput is higher than everything aside from the 543MP4, an impressive step forward from the Adreno 225 but still not enough to outpace the high end ImgTec solution.
Here we see nearly 2x the triangle throughput of the Adreno 225, and better performance than the 543MP2. The MP4 continues to be a monster though.
These next two tests are rather meaningless as they're bound by vsync. Hopefully we'll see a newer version of GLBenchmark soon enough that will stress these devices more at native resolutions:
GLBenchmark gets around the default vsync requirement by rendering to an offscreen buffer at 720p, giving us a true apples-to-apples comparison of game-like performance among all of these SoCs. The quad-core S4 Pro with Adreno 320 does incredibly well:
In the older Pro test, frame rates are insanely high for most of the devices, indicating the age of the benchmark, but the Adreno 320's standings are very good: second only to the PowerVR SGX 543MP4. Compared to the Adreno 225, the 320 is almost twice as fast.
Egypt, the newer of the two "game" tests in GLBenchmark, is a bit more stressful. Here the Adreno 320 gets extremely close to the SGX 543MP4 in the new iPad. Apple maintains a 6.8% performance advantage at 720p in this, a largely compute bound benchmark. Performance here is more than double that of the Adreno 225, and 72% faster than NVIDIA's fastest Tegra 3.
Overall Adreno 320 looks to be a good step forward in performance, although still a bit slower than the latest and greatest from Imagination Technologies. Compared to what everyone else is shipping in Android based tablets/smartphones however, Adreno 320 is easily the new king of the hill.
Qualcomm integrated four Krait v2 cores in the APQ8064 running at 1.5GHz, so CPU performance should range from very similar to significantly better than the dual-core Snapdragon S4 depending on the workload. Just as we've seen with Tegra 3, heavily threaded workloads will scale quite nicely while lightly threaded workloads will look mostly the same:
SunSpider performance is excellent on the MDP/T, actually delivering a better score than the Medfield based Lava Xolo X900 (1279.4ms). It's unclear how much of this performance increase over the dual-core S4 is due to the added cores vs. software optimizations to the MDP/T's browser.
BrowserMark tells a much more conservative story, however the S4 Pro is still able to outpace the dual-core S4 based One X by 22%. Again we're doing a bit of apples-to-oranges here since the browser and remaining software stack between devices isn't perfectly identical.
BaseMark OS includes a heavily threaded benchmark that can hit all four cores in the MDP/T as well as in the Tegra 3 based devices. The overall score incorporates the SMP test but doesn't weight it too heavily. The end result is still good for the MDP/T; it's the fastest Android device we've tested here.
We ran the multithreaded Linpack Android test to confirm quad-core scaling and indeed we saw just that. While the HTC One X is good for a score of around 210 MFLOPS, the MDP/T with twice the cores hit 413 MFLOPS. We were able to get numbers as high as 514 MFLOPS, which is more a demonstration of the volatility of the test than anything else.
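To put those Linpack numbers in terms of scaling, here is a minimal sketch that computes speedup and per-core scaling efficiency from the scores quoted above. The helper name is our own, and the 210 MFLOPS baseline is the approximate One X figure, so treat the results as ballpark values.

```python
def scaling(baseline_mflops, scaled_mflops, core_ratio):
    """Return (speedup, efficiency) for a move to core_ratio times the cores.

    An efficiency of 1.0 means perfectly linear scaling with core count.
    """
    speedup = scaled_mflops / baseline_mflops
    efficiency = speedup / core_ratio
    return speedup, efficiency


# Dual-core One X (~210 MFLOPS) vs. quad-core MDP/T, i.e. 2x the cores.
for label, score in [("typical run", 413.0), ("best run", 514.0)]:
    speedup, efficiency = scaling(210.0, score, core_ratio=2.0)
    print(f"{label}: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

The typical run works out to about 1.97x, or roughly 98% of ideal scaling. The 514 MFLOPS result implies better-than-linear scaling (about 122%), which is implausible for a doubling of cores alone and is consistent with run-to-run variance in the test.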
Overall the quad-core S4 Pro should deliver everything we love about the dual-core S4's performance, just with more cores. As individual cores can be power gated, there shouldn't be much of a power penalty unless you actually need the extra power. The extra cores should come in handy with heavy multitasking (something we may see even more of on Windows RT tablet/notebook hybrids) or with the rare heavily threaded application.