I have an application that scales well up to around five threads a core, due to the mix of IO and CPU that it does.
That is, you give it more threads, and the throughput increases; the overall time goes down.
The following graph shows, in blue, the time Sun's java.util.zip.ZipFile takes to complete a set of unzips on an increasing number of threads:
Wait, what the cocking shit.
The red line shows a pure Java implementation of ZipFile that scales as well as you'd expect. It's slower on one thread, unfortunately, but faster than the synchronised Sun ZipFile with two threads.
As expected, the performance of Sun's implementation improves massively (for large numbers of threads) if you manually synchronise on a global monitor around the ZipFile uses. The red line is unchanged:
Y-axis: seconds. X-axis: n, where the test runs on 2^(n-1) threads (i.e. levelling out at n = 3, which is four threads, due to it being tested on a quad-core machine and being entirely CPU bound).
Test: Opening rt.jar and summing all the bytes in files whose path contains an 'e'. File entirely cached by the file-system. JIT warmed. Heap significantly larger than required.
Vista x64, Sun Java 1.6u20 x64 (i.e. server vm).
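For anyone wanting to try something similar, here is a rough sketch of the test and of the global-monitor workaround. It is not the original harness: the class name ZipSum, the method names, GLOBAL_LOCK and the "jobs" parameter are all made up for illustration. Holding the lock around a whole unzip pass is also an assumption, since the description above only says the lock goes around the ZipFile uses. It is written against a modern JDK (try-with-resources, lambdas), so it won't compile on 1.6 as-is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipSum {
    // Hypothetical global monitor shared by every worker thread.
    private static final Object GLOBAL_LOCK = new Object();

    // Sums the (unsigned) bytes of every entry whose path contains an 'e'.
    static long sumMatchingBytes(String zipPath) throws IOException {
        long sum = 0;
        byte[] buf = new byte[8192];
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory() || entry.getName().indexOf('e') < 0) {
                    continue;
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        for (int i = 0; i < n; i++) {
                            sum += buf[i] & 0xff;
                        }
                    }
                }
            }
        }
        return sum;
    }

    // Same work, but with all ZipFile use serialised on the global monitor.
    // Assumption: the lock is held for the whole pass over the archive.
    static long sumMatchingBytesLocked(String zipPath) throws IOException {
        synchronized (GLOBAL_LOCK) {
            return sumMatchingBytes(zipPath);
        }
    }

    // Runs a fixed number of unzip passes on a pool of the given size and
    // returns the elapsed wall-clock time in milliseconds.
    static long timeRun(String zipPath, int threads, int jobs, boolean locked)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> work = new ArrayList<>();
            for (int i = 0; i < jobs; i++) {
                work.add(() -> locked
                        ? sumMatchingBytesLocked(zipPath)
                        : sumMatchingBytes(zipPath));
            }
            long start = System.nanoTime();
            for (Future<Long> f : pool.invokeAll(work)) {
                f.get(); // rethrows any failure from a worker
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }
}
```

To get numbers comparable to the graphs you'd call timeRun with the same archive and job count for 1, 2, 4, 8, ... threads, once with locked = false and once with locked = true, after a few untimed warm-up runs for the JIT.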