20
06/10
18:20
iPhone 3G, 3GS and 4 linpack benchmark results
Inspired by the Android linpack top 10 chart, I set out to compile the linpack benchmark for the iPhone and try it on the 3G and 3GS. What I found surprised me.
I used a C implementation of the linpack benchmark from www.netlib.org to create the app.
To be clear, I don’t really know what the Android Java linpack benchmark is doing and how it relates to the C implementation I used so I have no idea how the numbers I got compare to the Android numbers. It also appears some of the reported Android numbers were manipulated by some users as a prank so it’s difficult to tell what is real and what is not.
That being said, what’s interesting is not the performance difference between iPhone and Android, especially since the Android phones are newer than these iPhone models. Apple will announce a new device tomorrow, which should be a more comparable competitor to the latest Android phones, and with any luck I’ll benchmark it in the near future. What I find interesting is the performance difference between the 3G and the 3GS.
I ran the benchmark on a 1000×1000 matrix (see the linpack benchmark documentation to read about this) both in single precision and double precision modes with what I believe are the fastest code generation settings for the two devices.
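For readers wondering where an MFlops figure comes from, here is a minimal sketch of the usual arithmetic. It assumes the conventional linpack operation count of 2/3·n³ + 2·n² floating point operations for an n×n system; the function names are hypothetical and are not taken from the netlib source.

```c
#include <assert.h>

/* Nominal operation count for solving an n x n system:
   2/3 n^3 for the LU factorization plus 2 n^2 for the solve.
   (Assumed convention; check the benchmark source you actually use.) */
double linpack_ops(int n)
{
    assert(n > 0);
    double dn = (double)n;
    return (2.0 * dn * dn * dn) / 3.0 + 2.0 * dn * dn;
}

/* Convert the elapsed time of one solve into an MFlops rating. */
double linpack_mflops(int n, double seconds)
{
    assert(seconds > 0.0);
    return linpack_ops(n) / seconds / 1.0e6;
}

/* Inverse: seconds per solve implied by a reported MFlops rating. */
double linpack_seconds(int n, double mflops)
{
    assert(mflops > 0.0);
    return linpack_ops(n) / (mflops * 1.0e6);
}
```

Under these assumptions, `linpack_seconds(1000, 20.6)` comes out at roughly 32 seconds, so a single 1000×1000 double precision solve is far above any timer resolution concerns.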
I also have iPhone OS 4 Beta 4 installed at the moment, which may not be ideal for performance testing as Apple betas can sometimes slow things down. Also, the multitasking in iPhone OS 4 on the 3GS may have an effect, as we’ll see below.
| | Double precision | Single precision |
| iPhone 3G | 20.6 MFlops | 28.5 MFlops |
| iPhone 3GS | 18.2 MFlops | 27.3 MFlops |
And there you have it: the 3G and 3GS have essentially equivalent floating point performance. The 3GS actually shows a disadvantage of about 12% in double precision and about 4% in single precision, but this could be a result of the beta OS or the increased multitasking that occurs on a 3GS. I would guess they are essentially equivalent.
Based on these numbers it appears that the 3GS’ speed advantage over the 3G is strictly in the integer realm. Floating point performance is the same.
Update:
I’ve run the benchmark on an iPhone 4. Here are the results:
| | Double precision | Single precision |
| iPhone 4 | 20.9 MFlops | 36.2 MFlops |
iPhone 4 has somewhat better single precision floating point performance while double precision is the same. This benchmark does not explicitly utilize the ARM NEON SIMD instructions available through the Accelerate framework. Apple claims (in their WWDC presentations) that NEON can provide up to an 8x performance gain for single precision floating point operations. So that 36.2 MFlops figure could be significantly improved by modifying the code to use iOS 4’s BLAS libraries (assuming the very optimistic full 8x speedup, that’s ~290 MFlops!). NEON does not support double precision operations, so those are relegated to the ARM VFP, which performs the same as on the 3G/3GS devices.
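To make the idea concrete, here is a rough sketch of the kind of scalar inner loop that a BLAS call could replace. It assumes the hot loop is a single precision axpy update (y += alpha·x), which is typical of this style of LU factorization; `saxpy_scalar` and `projected_mflops` are hypothetical names, not functions from the benchmark source.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar single precision axpy: y[i] += alpha * x[i].
   Plain C like this leaves any vectorization up to the compiler. */
void saxpy_scalar(size_t n, float alpha, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}

/* Back-of-the-envelope projection: measured rate times an assumed speedup.
   The speedup factor is an assumption, not a measurement. */
double projected_mflops(double measured_mflops, double speedup)
{
    assert(measured_mflops >= 0.0 && speedup > 0.0);
    return measured_mflops * speedup;
}
```

On iOS, the standard CBLAS counterpart would be `cblas_saxpy(n, alpha, x, 1, y, 1)` after including `<Accelerate/Accelerate.h>`. Whether that call actually takes a NEON path on a given device, and how much it gains, would need to be measured. `projected_mflops(36.2, 8.0)` gives 289.6, which is where the ~290 MFlops best case comes from.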
nopantsu
September 8, 2010
1:32 am
Your results clearly aren’t accurate. iPad > 4 > 3GS > 3G. I don’t think anybody will believe these numbers until they start falling in line with other benchmarks that compare all iPhone models. Trying a 500×500 array like they did at green computing may help.
admin
September 8, 2010
3:19 am
I did not measure or bring up iPad performance so I don’t know why you are mentioning it. These are the numbers I got for CPU floating point performance with Linpack on iPhone 4, 3Gs and 3G and they make sense to me. As is the nature of benchmarks, different benchmarks may paint different pictures of the same test devices. Feel free to share links to the results of the benchmarks you mention.
nopantsu
September 17, 2010
9:11 pm
Why your benchmark is flawed:
1) The 3GS scores the lowest, and the 3G (which is a 2 year old processor) scores as high as the iPhone 4 (one of the fastest processors around).
2) The iPhone 3G and iPhone 4 double precision benchmarks are within a few percentage points of each other even though the iPhone 4 should score a lot better. If a serious tester saw such close results on disproportionate hardware, they would assume some sort of hardware bottleneck was hit and that the results therefore couldn’t be used for comparative purposes.
3) This benchmark would also lead us to believe the 2 year old 400 MHz processor in the iPhone 3G is faster than a Nexus One or Evo with Froyo. Those phones get around 16 MFlops when they are clocked at 1 GHz. A bad benchmark is the simplest explanation for the iPhone 3G being able to beat these new high end phones that are clocked at over twice the speed.
admin
September 17, 2010
10:52 pm
You state a lot about how you think the benchmark results should look but you don’t provide anything to back up your assumptions. You are trying to fit results to your world view instead of the other way around. I stated earlier I’m not comparing these numbers to the Android numbers because I don’t know how that benchmark relates to this one. For example, one is written in Java while the other is in C.
Here are similar benchmark results obtained by a TI employee for a TI ARM CPU which I think is similar to the iPhone CPUs.
http://e2e.ti.com/support/dsp/omap_applications_processors/f/447/p/41101/143482.aspx#143482
nopantsu
September 21, 2010
6:28 am
This is from the release note from the java version of linpack: “As of 30 June 2000 the problem size has been increased to 500×500. This was done because the timing resolution was too low to get accurate Mflop ratings for the 100×100 problem on very fast machines.”
http://www.netlib.org/benchmark/linpackjava/
That TI employee messed up his app too. Look at the times: each run finishes in half a second. He’s also using a 200×200 array. Linpack tests need to run for a few seconds to get accurate results.
How long did it take for your individual runs to complete? I’d bet they finished in under a second just like the TI employee’s runs. Try upping the matrix size until it takes 5+ seconds for the test to complete. Those would be results I would trust.
admin
September 21, 2010
7:03 am
I’m not sure what you are trying to show me with the Java release note as it provides no insight on how to compare a Java benchmark vs. a C benchmark. It explicitly states “This test is more a reflection of the state of the Java systems than of the floating point performance of the underlying processors”.
I left the original linpack code untouched and so it does not complete until it goes through a 10 second run. I’ve tried it with various matrix sizes from 100×100 to 1000×1000 with no difference in the results.
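As an illustration of how a benchmark can guarantee a minimum run time regardless of matrix size, here is a minimal sketch that doubles the number of passes until one batch takes long enough. This is an assumed structure for illustration only, not a copy of the netlib code; `time_kernel` and `sample_kernel` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One benchmark pass; the real benchmark would factor and solve a matrix. */
typedef void (*kernel_fn)(void);

static volatile double sink; /* keeps the sample work observable */

/* Hypothetical stand-in workload used only to exercise the timer. */
void sample_kernel(void)
{
    double s = 0.0;
    for (int i = 0; i < 1000; i++) {
        s += (double)i * 0.5;
    }
    sink = s;
}

/* Run `kernel` in batches, doubling the batch size until a single batch
   takes at least `min_seconds` of CPU time. Returns the average seconds
   per pass and stores the final batch size in `reps_out` (if not NULL). */
double time_kernel(kernel_fn kernel, double min_seconds, long *reps_out)
{
    assert(kernel != NULL && min_seconds > 0.0);
    long reps = 1;
    for (;;) {
        clock_t start = clock();
        for (long i = 0; i < reps; i++) {
            kernel();
        }
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (elapsed >= min_seconds) {
            if (reps_out != NULL) {
                *reps_out = reps;
            }
            return elapsed / (double)reps;
        }
        reps *= 2; /* too short to trust the timer; try a bigger batch */
    }
}
```

With a threshold of 10 seconds, a small matrix simply gets more passes and a large one fewer, so the timer resolution stops being a factor in the reported rate.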
NoPantsu
September 21, 2010
4:53 pm
I’m trying to show you that linpack scores, regardless of whether it’s programmed in C or Java, can be highly dependent on the matrix size.
At first, I thought your program was flawed because the matrix size was too small so it wasn’t getting an accurate result. But if your test is running for a minimum of several seconds and getting the same results with different matrix sizes then I don’t know what is causing these results. But the iPhone 3G’s double precision results are not believable when compared to the iPhone 4 or 3GS. Or maybe it’s the other way around and the iPhone 4 and 3GS are off somehow. But something isn’t adding up.
Tom
October 8, 2010
6:18 am
Here’s why the results are flawed, if nothing else: the iPhone 3G has a 400MHz CPU, the 3GS has a 600MHz CPU, and the 4 has a 750MHz CPU.
So for the chips to all get roughly the same benchmark scores, the implication is that each successive generation of chips has dramatically worse floating point logic. Doesn’t seem likely at all.
admin
October 8, 2010
7:15 am
Or the VFP is identical across all chips and is independently clocked from the rest of the chip for any of a variety of reasons (VFP design limits, deliberate under clocking to reduce power consumption, etc.).
Additionally, this test is not testing a CPU in isolation, it is testing an entire system and there could be other limiting factors outside of the CPU. This doesn’t make the test any less relevant.
paul
November 25, 2010
12:47 pm
You obviously hate apple. /s
Re: Tom – have you looked at the Android top 10 linked on this page?
Currently number 1 position is 806.4MHz
number 3 is: 1420.8MHz
Clock speed is not everything!
Craig
February 10, 2011
5:19 am
Putting that in historical perspective, iPhone 4 has similar performance (about 21 MFLOPS) as the 1994 Intel Pentium P54C 166 MHz, if I’m reading http://home.iae.nl/users/mhx/flops.html properly?
admin
February 10, 2011
7:35 am
Here is another page for the CPU which seems to more or less agree. So I guess that’s correct for simple floating point.
Jake Breyck
February 28, 2012
7:35 am
It looks like the Cray X-MP, which I believe was the most powerful computer in the world in 1984, was also 21 MFlops; the Cray-1 only did 12 MFlops.
nop
April 19, 2012
6:38 pm
Both the iPhone 3GS and iPhone 4 have a deprecated non-pipelined version of the ARM VFP coprocessor, which is very slow. If you compile your code for the ARMv6 instruction set with VFP support only, tightly optimized floating point code can perform several times slower on the 3GS than on the 3G. The fact that you got performance scores so low and so similar probably means that the code is not well optimized at all and is hitting a different bottleneck, maybe the memory/cache subsystem, maybe something else. I don’t use LINPACK myself, but I heard somewhere that its most important optimizations in the past involved tuning of memory access patterns for every relevant architecture.