UnixBench 5.1.2: a study measuring performance across different-spec VPSes

Linode

The idea was pretty simple: see how my VPS benchmarked using the UnixBench script.

Then I took the idea further by upgrading my VPS to different sizes to see how performance tracked against different classes of VPS.

Linode has a simple way to reconfigure VPS instances with more memory and space.

Upgrading is completely symmetric: bandwidth, memory, disk size and price all scale linearly.

In the symmetric Linode world, which relies on the Xen virtualisation platform, only a certain number of ‘nodes’ can reside on each box: say 40 nodes for the 512MB offering, and therefore 20 nodes for the 1024MB plan (a host with 20GB of RAM is fully subscribed either way, by 40 × 512MB or 20 × 1024MB). The larger the VPS, the fewer the neighbours, and therefore potentially more CPU time and better disk IO.

UnixBench should help us understand the potential benefits of upgrading our VPS in terms of disk IO and CPU. Let’s take a look at the results.

First off we started with a clean 512MB VPS running CentOS 5.5 (32-bit), yum updated, with gcc/make installed so that we could build and run UnixBench. It is important to note that the CPU was identical across all four machines (an Intel Xeon L5630 at 2.13GHz), with 4 virtual cores enabled.

The latest UnixBench was then downloaded from the Google Code repository and set to work on our 512MB instance.
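For anyone wanting to reproduce this, the setup boiled down to something like the following; the tarball URL and extracted directory name are reconstructed from memory of the Google Code project, so treat them as approximate:

yum update -y
yum install -y gcc make perl
wget http://byte-unixbench.googlecode.com/files/unixbench-5.1.2.tar.gz
tar xzf unixbench-5.1.2.tar.gz
cd unixbench-5.1.2
./Run

By default ./Run does a single-copy pass followed by a parallel pass with one copy per CPU, which is why each machine below has two sets of results. Here are the results: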

512MB Linode

4 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 9880922.0 lps (10.0 s, 7 samples)
Double-Precision Whetstone 1932.9 MWIPS (10.3 s, 7 samples)
Execl Throughput 1524.7 lps (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 323047.5 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 84232.5 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 864676.5 KBps (30.0 s, 2 samples)
Pipe Throughput 449392.8 lps (10.0 s, 7 samples)
Pipe-based Context Switching 22118.8 lps (10.0 s, 7 samples)
Process Creation 2462.9 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 3618.6 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 1145.2 lpm (60.0 s, 2 samples)
System Call Overhead 452957.5 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 9880922.0 846.7
Double-Precision Whetstone 55.0 1932.9 351.4
Execl Throughput 43.0 1524.7 354.6
File Copy 1024 bufsize 2000 maxblocks 3960.0 323047.5 815.8
File Copy 256 bufsize 500 maxblocks 1655.0 84232.5 509.0
File Copy 4096 bufsize 8000 maxblocks 5800.0 864676.5 1490.8
Pipe Throughput 12440.0 449392.8 361.2
Pipe-based Context Switching 4000.0 22118.8 55.3
Process Creation 126.0 2462.9 195.5
Shell Scripts (1 concurrent) 42.4 3618.6 853.4
Shell Scripts (8 concurrent) 6.0 1145.2 1908.7
System Call Overhead 15000.0 452957.5 302.0
========
System Benchmarks Index Score 473.0

------------------------------------------------------------------------
Benchmark Run: Fri Nov 19 2010 01:31:40 - 02:00:07
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables 39295363.6 lps (10.0 s, 7 samples)
Double-Precision Whetstone 7670.9 MWIPS (10.3 s, 7 samples)
Execl Throughput 5676.3 lps (29.4 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 311812.9 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 82966.6 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 1048007.7 KBps (30.0 s, 2 samples)
Pipe Throughput 1795144.2 lps (10.0 s, 7 samples)
Pipe-based Context Switching 212468.4 lps (10.0 s, 7 samples)
Process Creation 8793.6 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 9378.8 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 1283.3 lpm (60.1 s, 2 samples)
System Call Overhead 1620850.7 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 39295363.6 3367.2
Double-Precision Whetstone 55.0 7670.9 1394.7
Execl Throughput 43.0 5676.3 1320.1
File Copy 1024 bufsize 2000 maxblocks 3960.0 311812.9 787.4
File Copy 256 bufsize 500 maxblocks 1655.0 82966.6 501.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 1048007.7 1806.9
Pipe Throughput 12440.0 1795144.2 1443.0
Pipe-based Context Switching 4000.0 212468.4 531.2
Process Creation 126.0 8793.6 697.9
Shell Scripts (1 concurrent) 42.4 9378.8 2212.0
Shell Scripts (8 concurrent) 6.0 1283.3 2138.9
System Call Overhead 15000.0 1620850.7 1080.6
========
System Benchmarks Index Score 1230.9

I’ll summarise the important numbers.
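If you save each run’s report to a file (results.txt here is just an assumed name for the saved output), a quick grep pulls out the interesting lines rather than eyeballing the full report:

grep -E 'File Copy|Pipe Throughput|Index Score' results.txt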


1 parallel copy of test
File Copy 1024 bufsize 2000 maxblocks 323047.5 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 84232.5 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 864676.5 KBps (30.0 s, 2 samples)
Pipe Throughput 449392.8 lps (10.0 s, 7 samples)

1 parallel copy of test Score 473.0
4 parallel copies of test Score 1230.9
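For reference on how UnixBench arrives at these scores: each INDEX value is the measured RESULT divided by the test’s BASELINE and multiplied by 10, and the overall System Benchmarks Index Score is the geometric mean of the twelve individual indices. Taking the single-copy Dhrystone line as a worked example:

9880922.0 / 116700.0 × 10 ≈ 846.7

which is exactly the INDEX shown in the table above.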

473 is a pretty awesome score. The big point to note here is the wicked disk IO: 323MB/s for the 1024 buffer and 864MB/s for the 4096 buffer on a single copy, climbing to 1048MB/s with four parallel copies! That is some pretty amazing performance; those numbers would indicate that Linode is packing SSDs to cope with the load generated by 40-odd users.

Let’s have a look at the numbers from a 1GB Linode:


4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables 9625775.6 lps (10.0 s, 7 samples)
Double-Precision Whetstone 1912.9 MWIPS (10.2 s, 7 samples)
Execl Throughput 1246.5 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 76893.8 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 19415.0 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 276594.9 KBps (30.0 s, 2 samples)
Pipe Throughput 86488.9 lps (10.0 s, 7 samples)
Pipe-based Context Switching 16362.5 lps (10.0 s, 7 samples)
Process Creation 2301.2 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 3109.2 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 953.0 lpm (60.0 s, 2 samples)
System Call Overhead 446224.4 lps (10.1 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 9625775.6 824.8
Double-Precision Whetstone 55.0 1912.9 347.8
Execl Throughput 43.0 1246.5 289.9
File Copy 1024 bufsize 2000 maxblocks 3960.0 76893.8 194.2
File Copy 256 bufsize 500 maxblocks 1655.0 19415.0 117.3
File Copy 4096 bufsize 8000 maxblocks 5800.0 276594.9 476.9
Pipe Throughput 12440.0 86488.9 69.5
Pipe-based Context Switching 4000.0 16362.5 40.9
Process Creation 126.0 2301.2 182.6
Shell Scripts (1 concurrent) 42.4 3109.2 733.3
Shell Scripts (8 concurrent) 6.0 953.0 1588.3
System Call Overhead 15000.0 446224.4 297.5
========
System Benchmarks Index Score 271.8

------------------------------------------------------------------------
Benchmark Run: Thu Nov 18 2010 20:41:48 - 21:09:45
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables 38290705.2 lps (10.0 s, 7 samples)
Double-Precision Whetstone 7572.1 MWIPS (10.3 s, 7 samples)
Execl Throughput 4521.7 lps (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 103936.6 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 26407.4 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 395275.9 KBps (30.0 s, 2 samples)
Pipe Throughput 158949.9 lps (10.0 s, 7 samples)
Pipe-based Context Switching 76362.5 lps (10.0 s, 7 samples)
Process Creation 7095.2 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 7789.2 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 1065.2 lpm (60.1 s, 2 samples)
System Call Overhead 1605531.3 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 38290705.2 3281.1
Double-Precision Whetstone 55.0 7572.1 1376.7
Execl Throughput 43.0 4521.7 1051.6
File Copy 1024 bufsize 2000 maxblocks 3960.0 103936.6 262.5
File Copy 256 bufsize 500 maxblocks 1655.0 26407.4 159.6
File Copy 4096 bufsize 8000 maxblocks 5800.0 395275.9 681.5
Pipe Throughput 12440.0 158949.9 127.8
Pipe-based Context Switching 4000.0 76362.5 190.9
Process Creation 126.0 7095.2 563.1
Shell Scripts (1 concurrent) 42.4 7789.2 1837.1
Shell Scripts (8 concurrent) 6.0 1065.2 1775.4
System Call Overhead 15000.0 1605531.3 1070.4
========
System Benchmarks Index Score 657.3

Summary

1 parallel copy of test
File Copy 1024 bufsize 2000 maxblocks 76893.8 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 19415.0 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 276594.9 KBps (30.0 s, 2 samples)
Pipe Throughput 86488.9 lps (10.0 s, 7 samples)

1 parallel copy of test Score 271.8
4 parallel copies of test Score 657.3

WOW! Disk speed on the 1GB Linode is not nearly as impressive as on the 512MB system.

The results from the 2GB and 4GB systems were nearly identical to the 1GB system, so I’ll just summarise them:

2GB

File Copy 1024 bufsize 2000 maxblocks 77849.3 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 19654.5 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 279506.5 KBps (30.0 s, 2 samples)
Pipe Throughput 87483.1 lps (10.0 s, 7 samples)

1 parallel copy of test Score 273.0
4 parallel copies of test Score 667.9

4GB

File Copy 1024 bufsize 2000 maxblocks 68375.9 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 17805.5 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 252472.2 KBps (30.0 s, 2 samples)
Pipe Throughput 79473.8 lps (10.0 s, 7 samples)

1 parallel copy of test Score 241.0
4 parallel copies of test Score 569.

Conclusion:
Linode is not slacking off when it comes to providing a top-level service. My current ‘node only benches 273, and I never notice a problem with disk I/O. Those lucky users who land on a Linode that is probably RAID-10 SSD based certainly won’t be complaining about other users chewing up all the disk I/O.

It would be interesting to know how many active users were on the box while I was running these tests, and to see how those numbers changed with a full house. I get the impression that performance would still be as good as, if not better than, the other Linode plans.

Hats off to Linode!
