Wednesday, January 3, 2007

Loongson 2E: Facts on its performance


The actual performance of the China-made CPU, Loongson, is still in question. Many Chinese (maybe a majority) themselves don't think Loongson can catch up with, or even become somewhat competitive with, Intel or AMD in the future. Rationality is not flooded by patriotism in this case. "Too slow! The Pentium III 667 came out many years ago!" a customer said when a reporter asked him whether he would choose Loongson.

It is true that Loongson is currently not comparable to any mainstream Intel or AMD product. Their x86 CISC CPUs have a long and glorious history, wide software support and, most important, innumerable users. It is always difficult for a user to change from a popular platform to an uncommon one. What is worse, Windows will never run on this CPU! The newborn Loongson has a long way to go before it becomes a giant.

Software is another issue; here I'll concentrate only on hardware performance. The Loongson 2E was designed for a 1GHz clock but failed to run stably at that frequency because of some technical problems. Normally, the L2E can run at 600-800MHz with ease, and it is released at 660MHz in the Lemote Box.

Loongson officially declared that the L2E@1GHz achieved a score of 500 in the SPEC CPU2000 test. What does this mean?
###################################
CINT2000 (the two numbers at the end of each row are the base and peak scores):
Advanced Micro Devices ASUS A7V Motherboard 1.2GHz Athlon processor 1 core, 1 chip, 1 core/chip 409 458
Advanced Micro Devices ASUS A7V Motherboard, 1.3GHz Athlon Processor 1 core, 1 chip, 1 core/chip 438 491
Advanced Micro Devices Asus A7M266-D Motherboard, AMD Athlon (TM) MP 2000+ 1 core, 1 chip, 1 core/chip 638 662
Advanced Micro Devices Asus A7M266-D Motherboard, AMD Athlon (TM) MP 2400+ 1 core, 1 chip, 1 core/chip 737 766
Advanced Micro Devices Epox 8KHA+ Motherboard, AMD Athlon (TM) XP 1700+ 1 core, 1 chip, 1 core/chip 633 656
Advanced Micro Devices Epox 8KHA+ Motherboard, AMD Athlon (TM) XP 1800+ 1 core, 1 chip, 1 core/chip 648 671
Advanced Micro Devices Gigabyte GA-7DX Motherboard 1.2GHz Athlon processor 1 core, 1 chip, 1 core/chip 443 496
Advanced Micro Devices Gigabyte GA-7DX Motherboard, 1.33GHz Athlon Processor 1 core, 1 chip, 1 core/chip 482 539
Advanced Micro Devices Gigabyte GA-7DX Motherboard, 1.4GHz Athlon Processor 1 core, 1 chip, 1 core/chip 495 554
Advanced Micro Devices Gigabyte GA-7DX Motherboard, AMD Athlon (TM) XP 1500+ 1 core, 1 chip, 1 core/chip 556 577
Advanced Micro Devices Gigabyte GA-7DX Motherboard, AMD Athlon (TM) XP 1600+ 1 core, 1 chip, 1 core/chip 572 595
Advanced Micro Devices Tyan Thunder K7 Motherboard, 1.2GHz Athlon MP Processor 1 core, 1 chip, 1 core/chip 495 522
Advanced Micro Devices Tyan Thunder K7 Motherboard, AMD Athlon (TM) MP 1500+ 1 core, 1 chip, 1 core/chip 534 554
Advanced Micro Devices Tyan Thunder K7 Motherboard, AMD Athlon (TM) MP 1600+ 1 core, 1 chip, 1 core/chip 550 571
Advanced Micro Devices Tyan Thunder K7 Motherboard, AMD Athlon (TM) MP 1800+ 1 core, 1 chip, 1 core/chip 587 609
Intel Corporation Intel D850GB motherboard (1.3 GHz, Pentium 4 processor) 1 core, 1 chip, 1 core/chip 475 486
Intel Corporation Intel D850GB motherboard (1.4 GHz, Pentium 4 processor) 1 core, 1 chip, 1 core/chip 498 512
Intel Corporation Intel D850GB motherboard (1.5 GHz, Pentium 4 processor) 1 core, 1 chip, 1 core/chip 526 539
Intel Corporation Intel D815EEA2 motherboard (1.0 GHz, Pentium III processor) 1 core, 1 chip, 1 core/chip 402 408
Intel Corporation Intel D815EEA2 motherboard (1.0B GHz, Intel Pentium III proc 1 core, 1 chip, 1 core/chip 452 454
Intel Corporation Intel D815EEA2 motherboard (1.0B GHz, Pentium III processor) 1 core, 1 chip, 1 core/chip 451 457
Intel Corporation Intel D815EEA2 motherboard (1.1 GHz, Pentium III processor) 1 core, 1 chip, 1 core/chip 421 427
Intel Corporation Intel D850EMV2 motherboard (1.5 GHz, Pentium 4 processor) 1 core, 1 chip, 1 core/chip 560 562
Intel Corporation Intel D850EMV2 motherboard (1.6 GHz, Pentium 4 processor) 1 core, 1 chip, 1 core/chip 584 588

CFP2000 (the two numbers at the end of each row are the base and peak scores):
Advanced Micro Devices ASUS A7V Motherboard, 1.2GHz Athlon Processor 1 core, 1 chip, 1 core/chip 328 352
Advanced Micro Devices ASUS A7V Motherboard, 1.3GHz Athlon Processor 1 core, 1 chip, 1 core/chip 348 374
Advanced Micro Devices Gigabyte GA-7DX Motherboard, 1.2GHz Athlon Processor 1 core, 1 chip, 1 core/chip 387 417
Advanced Micro Devices Gigabyte GA-7DX Motherboard, 1.33GHz Athlon Processor 1 core, 1 chip, 1 core/chip 414 445
Advanced Micro Devices Gigabyte GA-7DX Motherboard, 1.4GHz Athlon Processor 1 core, 1 chip, 1 core/chip 426 458
Advanced Micro Devices Gigabyte GA-7DX Motherboard, AMD Athlon (TM) XP 1500+ 1 core, 1 chip, 1 core/chip 494 536
Advanced Micro Devices Gigabyte GA-7DX Motherboard, AMD Athlon (TM) XP 1600+ 1 core, 1 chip, 1 core/chip 504 547
Advanced Micro Devices Gigabyte GA-7DX Motherboard, AMD Athlon (TM) XP 1700+ 1 core, 1 chip, 1 core/chip 535 580
Advanced Micro Devices Gigabyte GA-7DX Motherboard, AMD Athlon (TM) XP 1800+ 1 core, 1 chip, 1 core/chip 542 588
Advanced Micro Devices Epox 8KHA+ Motherboard, AMD Athlon (TM) XP 1700+ 1 core, 1 chip, 1 core/chip 561 604
Advanced Micro Devices Epox 8KHA+ Motherboard, AMD Athlon (TM) XP 1800+ 1 core, 1 chip, 1 core/chip 572 615
Intel Corporation Intel D815EEA2 motherboard (1.0 GHz, Pentium III processor) 1 core, 1 chip, 1 core/chip 254 264
Intel Corporation Intel D815EEA2 motherboard (1.0B GHz, Intel Pentium III proc 1 core, 1 chip, 1 core/chip 303 292
Intel Corporation Intel D815EEA2 motherboard (1.0B GHz, Pentium III processor) 1 core, 1 chip, 1 core/chip 297 310
Intel Corporation Intel D815EEA2 motherboard (1.1 GHz, Pentium III processor) 1 core, 1 chip, 1 core/chip 258 268
Intel Corporation Intel D850GB motherboard(1.3 GHz, Pentium 4 processor) 1 core, 1 chip, 1 core/chip 503 511
Intel Corporation Intel D850GB motherboard(1.4 GHz, Pentium 4 processor) 1 core, 1 chip, 1 core/chip 529 538
Intel Corporation Intel D850GB motherboard(1.5 GHz, Pentium 4 processor) 1 core, 1 chip, 1 core/chip 549 558
Intel Corporation Intel D850GB motherboard(1.6 GHz, Pentium 4 processor) 1 core, 1 chip, 1 core/chip 578 587
Intel Corporation Intel D850GB motherboard(1.7 GHz, Pentium 4 processor) 1 core, 1 chip, 1 core/chip 598 608

It seems that the L2E@1GHz is about equal to a P4@1.3GHz. But there are some problems:
1. If the Loongson 2E were produced at 1GHz, the rate of defective chips would be too high to accept.
2. The L2E reaches 1GHz only when the core voltage is 1.4V, which is higher than the normal 1.2V.
3. The compiler is not GCC or another common one, but a special one developed by the Chinese Academy of Sciences.
####################################
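
To make the comparison above concrete, here is a minimal Python sketch that looks for the listed systems whose score sits closest to the claimed 500. It assumes the first of the two trailing numbers in each row is the SPEC base score, and it copies only a handful of rows from the lists. The names CINT_BASE, CFP_BASE and closest are made up for this illustration. The announcement quoted above does not say whether the 500 is an integer or a floating-point score, so both lists are checked.

```python
# Base scores copied from a few rows of the lists above
# (assumption: the first trailing number in each row is the base score).
CINT_BASE = {
    "Pentium III 1.0GHz (D815EEA2)": 402,
    "Pentium 4 1.3GHz (D850GB)": 475,
    "Athlon 1.4GHz (GA-7DX)": 495,
    "Pentium 4 1.4GHz (D850GB)": 498,
    "Pentium 4 1.5GHz (D850GB)": 526,
}
CFP_BASE = {
    "Pentium III 1.0GHz (D815EEA2)": 254,
    "Athlon 1.4GHz (GA-7DX)": 426,
    "Athlon XP 1500+ (GA-7DX)": 494,
    "Pentium 4 1.3GHz (D850GB)": 503,
    "Pentium 4 1.4GHz (D850GB)": 529,
}

CLAIMED_SCORE = 500  # the L2E@1GHz figure quoted above


def closest(scores, target):
    """Return the (system, score) pair whose score is nearest to target."""
    return min(scores.items(), key=lambda item: abs(item[1] - target))


if __name__ == "__main__":
    for suite, scores in (("CINT2000", CINT_BASE), ("CFP2000", CFP_BASE)):
        system, score = closest(scores, CLAIMED_SCORE)
        print(f"{suite}: closest to {CLAIMED_SCORE} is {system} with {score}")
```

Among these rows, the closest floating-point entry is the 1.3GHz Pentium 4, which matches the comparison made above; on the integer list the 1.4GHz Pentium 4 is closer.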

Some people have also tested the L2E@660MHz using the simple UnixBench.

************************************
BYTE UNIX Benchmarks (Version 4.1.0)
System -- Linux tony-debian 2.6.18.1ict #13 Mon Nov 20 21:58:01 CST 2006 mips GNU/Linux
Start Benchmark Run: Sat Dec 16 16:16:51 CST 2006
1 interactive users.
16:16:51 up 47 min, 1 user, load average: 0.14, 0.04, 0.01
lrwxrwxrwx 1 root root 4 Dec 15 21:07 /bin/sh -> bash
/bin/sh: symbolic link to `bash'
/dev/hda3 7700700 6551524 757996 90% /
Dhrystone 2 using register variables 1953973.7 lps (10.0 secs, 10 samples)
Double-Precision Whetstone 399.8 MWIPS (10.0 secs, 10 samples)
System Call Overhead 279138.8 lps (10.0 secs, 10 samples)
Pipe Throughput 210600.9 lps (10.0 secs, 10 samples)
Pipe-based Context Switching 86021.1 lps (10.0 secs, 10 samples)
Process Creation 2235.3 lps (30.0 secs, 3 samples)
Execl Throughput 727.9 lps (29.7 secs, 3 samples)
File Read 1024 bufsize 2000 maxblocks 236110.0 KBps (30.0 secs, 3 samples)
File Write 1024 bufsize 2000 maxblocks 116200.0 KBps (30.0 secs, 3 samples)
File Copy 1024 bufsize 2000 maxblocks 67028.0 KBps (30.0 secs, 3 samples)
File Read 256 bufsize 500 maxblocks 98410.0 KBps (30.0 secs, 3 samples)
File Write 256 bufsize 500 maxblocks 44905.0 KBps (30.0 secs, 3 samples)
File Copy 256 bufsize 500 maxblocks 27958.0 KBps (30.0 secs, 3 samples)
File Read 4096 bufsize 8000 maxblocks 390039.0 KBps (30.0 secs, 3 samples)
File Write 4096 bufsize 8000 maxblocks 181774.0 KBps (30.0 secs, 3 samples)
File Copy 4096 bufsize 8000 maxblocks 102845.0 KBps (30.0 secs, 3 samples)
Shell Scripts (1 concurrent) 826.6 lpm (60.0 secs, 3 samples)
Shell Scripts (8 concurrent) 116.0 lpm (60.0 secs, 3 samples)
Shell Scripts (16 concurrent) 58.7 lpm (60.0 secs, 3 samples)
Arithmetic Test (type = short) 439655.8 lps (10.0 secs, 3 samples)
Arithmetic Test (type = int) 456636.0 lps (10.0 secs, 3 samples)
Arithmetic Test (type = long) 456637.3 lps (10.0 secs, 3 samples)
Arithmetic Test (type = float) 178354.9 lps (10.0 secs, 3 samples)
Arithmetic Test (type = double) 150313.9 lps (10.0 secs, 3 samples)
Arithoh 66258450.9 lps (10.0 secs, 3 samples)
C Compiler Throughput 206.2 lpm (60.0 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places 77466.7 lpm (30.0 secs, 3 samples)
Recursion Test--Tower of Hanoi 16793.9 lps (20.0 secs, 3 samples)
INDEX VALUES
TEST BASELINE RESULT INDEX

Dhrystone 2 using register variables 116700.0 1953973.7 167.4
Double-Precision Whetstone 55.0 399.8 72.7
Execl Throughput 43.0 727.9 169.3
File Copy 1024 bufsize 2000 maxblocks 3960.0 67028.0 169.3
File Copy 256 bufsize 500 maxblocks 1655.0 27958.0 168.9
File Copy 4096 bufsize 8000 maxblocks 5800.0 102845.0 177.3
Pipe Throughput 12440.0 210600.9 169.3
Pipe-based Context Switching 4000.0 86021.1 215.1
Process Creation 126.0 2235.3 177.4
Shell Scripts (8 concurrent) 6.0 116.0 193.3
System Call Overhead 15000.0 279138.8 186.1
=========
FINAL SCORE 164.8

*******************************
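
The INDEX column above can be reproduced by hand. The sketch below assumes each index is the result divided by the baseline and multiplied by 10, and that the final score is the geometric mean of the indices. Both assumptions are consistent with the numbers in this run, but they are not taken from the UnixBench documentation. The names RESULTS, index_value and final_score are made up for this example.

```python
import math

# (baseline, result) pairs copied from the INDEX VALUES table above.
RESULTS = {
    "Dhrystone 2 using register variables": (116700.0, 1953973.7),
    "Double-Precision Whetstone": (55.0, 399.8),
    "Execl Throughput": (43.0, 727.9),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 67028.0),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 27958.0),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 102845.0),
    "Pipe Throughput": (12440.0, 210600.9),
    "Pipe-based Context Switching": (4000.0, 86021.1),
    "Process Creation": (126.0, 2235.3),
    "Shell Scripts (8 concurrent)": (6.0, 116.0),
    "System Call Overhead": (15000.0, 279138.8),
}


def index_value(baseline, result):
    """Assumed index formula: result relative to baseline, scaled by 10."""
    return result / baseline * 10.0


def final_score(results):
    """Assumed final score: geometric mean of the per-test indices."""
    logs = [math.log(index_value(b, r)) for b, r in results.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    for name, (baseline, result) in RESULTS.items():
        print(f"{name}: {index_value(baseline, result):.1f}")
    print(f"FINAL SCORE (geometric mean): {final_score(RESULTS):.1f}")
```

The Whetstone index (72.7) is the clear outlier and pulls the mean down; every other index lies between roughly 167 and 215.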

This is about what a Pentium III at the same frequency would score.

There is still a problem:
Arithmetic Test (type = short) 439655.8 lps (10.0 secs, 3 samples)
Arithmetic Test (type = int) 456636.0 lps (10.0 secs, 3 samples)
Arithmetic Test (type = long) 456637.3 lps (10.0 secs, 3 samples)
Arithmetic Test (type = float) 178354.9 lps (10.0 secs, 3 samples)
Arithmetic Test (type = double) 150313.9 lps (10.0 secs, 3 samples)

The FP results are unbelievably low. According to Loongson's official explanation, the L2E has four FP units, which let it perform 4 double-precision or 8 single-precision FP operations per clock cycle, but the current compiler (gcc) can use only two of them.
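
To put numbers on this, here is a small sketch that compares the float and double arithmetic loops with the int loop from the run above, and works out the theoretical double-precision peak implied by the four-unit claim at 660MHz. It assumes one double-precision operation per unit per cycle, which is my reading of the claim rather than a documented figure. The names ARITH_LPS, ratio_to_int and peak_mflops are made up for this example.

```python
# Loops per second from the Arithmetic Test lines of the run above.
ARITH_LPS = {
    "short": 439655.8,
    "int": 456636.0,
    "long": 456637.3,
    "float": 178354.9,
    "double": 150313.9,
}


def ratio_to_int(kind):
    """Throughput of one arithmetic test relative to the int test."""
    return ARITH_LPS[kind] / ARITH_LPS["int"]


def peak_mflops(clock_mhz, ops_per_cycle):
    """Theoretical peak in MFLOPS (assumption: ops_per_cycle is sustained)."""
    return clock_mhz * ops_per_cycle


if __name__ == "__main__":
    for kind in ("float", "double"):
        print(f"{kind}: {ratio_to_int(kind):.2f} of int throughput")
    print("peak, four units in use:", peak_mflops(660, 4), "MFLOPS")
    print("peak, two units in use:", peak_mflops(660, 2), "MFLOPS")
```

The float and double loops run at well under half the speed of the int loop in this run.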

In addition, I have heard that external hardware such as the north bridge, south bridge and main memory also restricts Loongson in terms of compatibility.

All in all, we should expect only so much from the Loongson 2E. The L2E is far from brilliant, yet the overall Loongson blueprint is great. The chips have big potential and are being improved. Given better hardware and software support, Loongson 2F and Loongson 3 should satisfy us more.

1 comment:

Anonymous said...

Well You have got several things wrong:

1. gcc (I checked version 3.1.1, which was released in 2002) does actually use the fused multiply-add instruction on MIPS by default. The Loongson has a MIPS instruction set, so the claim that gcc does not use it is not sustainable. Furthermore, the fmad instruction is far from 'advanced'; the idea is actually quite old.

2. The SPEC CPU results typically consist of 2 (!) values, one for integer and one for floating-point performance. Exactly 500 for both is unlikely (fp performance is commonly less than int performance), just because such a 'nice' value (I mean it is not like 437 or 511) does not appear in the SPEC benchmarks database. And as long as such values are not publicly repeatable, I can claim anything. (Actually, I have built my own CPU which achieves 100000 SPEC int and SPEC fp points with a total power consumption of just 3 watts. What? You say that's crap? Well, so are your propaganda values, pal!)

So if You do not have any idea about benchmarks and did not even do them Yourself, well then just do not post them as they are just plain rubbish!