
Java performance


Although some language features, such as garbage collection and indirection, are known to carry an inherent cost, objectively comparing the performance of a Java program with an equivalent one written in a natively compiled language such as C, C++ or Object Pascal is a tricky and controversial task. Java is a programming language like C++, but its compiler targets the Java platform, nowadays implemented by the JVM, whereas C, C++ and Object Pascal programs are compiled for a specific hardware-OS combination. These two approaches give rise to very different, hard-to-compare scenarios: static versus dynamic compilation and recompilation, the availability of precise information about the runtime environment, and others.

The performance of a compiled Java program depends on how smartly the host JVM manages its particular tasks, and how well the JVM exploits the features of the hardware and OS in doing so. Any Java performance test or comparison therefore has to report the version, vendor, OS and hardware architecture of the JVM used. Similarly, the performance of the equivalent natively compiled program depends on the quality of its generated machine code, so the test or comparison also has to report the name, version and vendor of the compiler used, and its activated optimization directives.

Programs written in Java have had a reputation for being slower and requiring more memory than those written in natively compiled languages such as C or C++.[1] However, Java programs' execution speed has improved significantly due to the introduction of Just-In-Time compilation (in 1997/1998 for Java 1.1),[2][3][4] the addition of language features supporting better code analysis, and optimizations in the Java Virtual Machine itself (such as HotSpot becoming the default for Sun's JVM in 2000).

Virtual machine optimization techniques

Many optimizations have improved the performance of the Java Virtual Machine over time. Although the JVM was often the first virtual machine to implement a technique successfully, many of these optimizations have since been used in other similar platforms as well.

Just-In-Time compilation

Early Java virtual machines always interpreted bytecode. This carried a huge performance penalty (a factor of 10 to 20 for Java versus C in average applications).[1]

Java 1.1 saw the introduction of a JIT compiler.

Java 1.2 saw the introduction of an optional system called HotSpot: the virtual machine continually analyzes the program's performance for "hot spots" which are executed frequently or repeatedly. These are then targeted for optimization, leading to high-performance execution with a minimum of overhead for less performance-critical code.

With the introduction of Java 1.3, HotSpot became the default system.

With the HotSpot technique, code is first interpreted, then "hot spots" are compiled on the fly. For this reason, a benchmark must execute the program a few times before measuring performance, so that the virtual machine has had a chance to compile the hot code, as in the sketch below.
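A minimal sketch of this warm-up pattern, assuming an arbitrary workload method (the class name, method name and iteration counts are invented for illustration):

public class WarmupBenchmark {
    // Placeholder workload; any frequently executed method would do.
    static long doWork() {
        long sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Warm-up runs: give HotSpot a chance to detect the hot spot
        // and JIT-compile doWork() before any timing is done.
        for (int i = 0; i < 20; i++) {
            doWork();
        }
        // Measure only after warm-up, against the compiled code.
        long start = System.nanoTime();
        long result = doWork();
        long elapsed = System.nanoTime() - start;
        System.out.println("result=" + result + ", elapsed ns=" + elapsed);
    }
}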

HotSpot compilation uses many optimization techniques, such as inline expansion, loop unwinding, bounds-checking elimination, and architecture-dependent register allocation.[5][6]

Some benchmarks show a 10-fold speed gain from this technique.[7]

Adaptive optimization

Adaptive optimization is a technique in computer science that dynamically recompiles portions of a program based on the current execution profile. In a simple implementation, an adaptive optimizer may merely trade off between just-in-time compilation and interpreting instructions. At another level, adaptive optimization may exploit local data conditions to optimize away branches and to use inline expansion to decrease context switching.

A virtual machine like HotSpot is also able to deoptimize previously JIT-compiled code. This allows it to perform aggressive (and potentially unsafe) optimizations, while still being able to deoptimize the code and fall back on a safe path later on.[8][9]
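A typical case is speculative inlining of a virtual call. A minimal Java sketch of the scenario, with invented class names:

interface Shape {
    double area();
}

class Square implements Shape {
    double side = 2.0;
    public double area() { return side * side; }
}

class Totals {
    // While Square is the only Shape implementation loaded, HotSpot can
    // speculatively inline s.area() here as side * side. If another
    // implementation (say, a Circle class) is loaded later, that assumption
    // breaks: the VM deoptimizes this compiled method, falls back to the
    // interpreted (safe) version, and recompiles it without the speculation.
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }
}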

Garbage collection

The 1.0 and 1.1 virtual machines used a mark-sweep collector, which could fragment the heap after a garbage collection. Starting with Java 1.2, the virtual machines switched to a generational collector, which fragments the heap far less.[10] Modern virtual machines use a variety of techniques that have further improved garbage collection performance.[11]
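On current Sun VMs the collector can also be selected explicitly with standard HotSpot command-line flags; a brief sketch (MyApp is a placeholder application class):

  java -XX:+UseSerialGC MyApp           (serial generational collector)
  java -XX:+UseParallelGC MyApp         (parallel young generation, throughput-oriented)
  java -XX:+UseConcMarkSweepGC MyApp    (concurrent low-pause collector)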

Other optimization techniques

Split bytecode verification

Prior to executing a class, the Sun JVM verifies its bytecode (see Bytecode verifier). This verification is performed lazily: a class's bytecode is only loaded and verified when that specific class is loaded and prepared for use, not at the beginning of the program. (Other verifiers, such as the Java/400 verifier for IBM System i, can perform most verification in advance and cache verification information from one use of a class to the next.) However, as the Java class libraries are also regular Java classes, they too must be loaded when they are used, which means that the start-up time of a Java program is often longer than for C++ programs, for example.

A technique named split-time verification, first introduced in the J2ME platform, has been used in the Java virtual machine since Java version 6. It splits the verification of bytecode into two phases:[12]

  • Design time - during the compilation of the class from source to bytecode
  • Runtime - when the class is loaded.

In practice this technique works by capturing the knowledge that the Java compiler has of class flow and annotating the compiled method bytecode with a synopsis of that class flow information. This does not make runtime verification appreciably less complex, but it does allow some shortcuts.
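Concretely, the Java 6 compiler stores this synopsis in the class file as StackMapTable attributes, which can be inspected with the standard javap tool (MyClass is a placeholder):

  javap -verbose MyClass

For each method that needs one, the verbose output lists a StackMapTable attribute alongside the bytecode, which the runtime verifier can check against instead of re-deriving the type flow itself.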

Escape analysis and lock coarsening

Java manages multithreading at the language level. Multithreading is a technique that allows programs to

  • improve the user's perception of program speed, by allowing user actions while the program performs other tasks, and
  • take advantage of multi-core architectures, enabling two unrelated tasks to be performed at the same time by different cores.

However, programs that use multithreading need to take extra care with objects shared between threads, locking access to shared methods or blocks of code when they are used by one of the threads. Locking a block or an object is a time-consuming operation due to the nature of the underlying operating-system-level operation involved (see concurrency control and lock granularity).

As the Java library does not know which methods will be used by more than one thread, the standard library always locks when a method might be used in a multithreaded environment.

Prior to Java 6, the virtual machine always locked objects and blocks when asked to by the program (see Lock Implementation), even if there was no risk of an object being modified by two different threads at the same time. For example, in the following code a local Vector is locked before each add operation to ensure that it is not modified by other threads (Vector is synchronized), even though it is strictly local to the method and such locking is unnecessary:

public String getNames() {
     // Vector's methods are synchronized, but v never escapes
     // this method, so no locking is actually needed here.
     Vector v = new Vector();
     v.add("Me");
     v.add("You");
     v.add("Her");
     return v.toString();
}

Starting with Java 6, code blocks and objects are locked only when necessary,[2][3] so in the above case the virtual machine would not lock the Vector object at all.

As of version 6u14, Java includes experimental support for escape analysis.[4]
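The analysis is not enabled by default in that release; a hedged example of turning it on (MyApp is a placeholder):

  java -XX:+DoEscapeAnalysis MyApp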

Register allocation improvements

Prior to Java 6, register allocation was very primitive in the "client" virtual machine: register values did not survive across blocks, which was a problem on architectures with few registers available, such as x86. If no registers are available for an operation, the compiler must copy values from registers to memory (or from memory to registers), which takes time (registers are typically much faster to access). The "server" virtual machine used a graph-coloring allocator and did not suffer from this problem.

An optimization of register allocation was introduced in Sun's JDK 6:[13] it then became possible to use the same registers across blocks (where applicable), reducing memory accesses. This led to a reported performance gain of approximately 60% in some benchmarks.[14]

Class data sharing

Class data sharing (called CDS by Sun) is a mechanism that reduces the startup time of Java applications and also reduces their memory footprint. When the JRE is installed, the installer loads a set of classes from the system JAR file (rt.jar, the JAR file containing the whole Java class library) into a private internal representation, and dumps that representation to a file called a "shared archive". During subsequent JVM invocations, this shared archive is memory-mapped in, saving the cost of loading those classes and allowing much of the JVM's metadata for these classes to be shared among multiple JVM processes.[15]

The corresponding improvement in start-up time is more noticeable for small programs.[16]
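The shared archive is controlled with the standard -Xshare options; a brief sketch (MyApp is a placeholder):

  java -Xshare:dump          (regenerates the shared archive from the system classes)
  java -Xshare:on MyApp      (requires the shared archive; the VM fails to start without it)
  java -Xshare:off MyApp     (disables class data sharing)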

Sun Java versions performance improvements

Apart from the improvements listed here, each of Sun's Java versions introduced many performance improvements in the Java API.

JDK 1.1.6

Introduced at the virtual machine level:

  • Symantec's just-in-time (JIT) compiler, the first JIT integrated into the JDK[17][18]

J2SE 1.2

Introduced at the virtual machine level:

  • the optional HotSpot system (see above)
  • a generational garbage collector, replacing the mark-sweep collector (see above)[10]

J2SE 1.3

Introduced at the virtual machine level:

  • HotSpot as the default virtual machine (see above)

J2SE 1.4

See here for a Sun overview of performance improvements between the 1.3 and 1.4 versions.

Java SE 5.0

Introduced at the virtual machine level:

  • class data sharing (see above)[15][16]

See here for a Sun overview of performance improvements between the 1.4 and 5.0 versions.

Java SE 6

Introduced at the virtual machine level:

  • split bytecode verification (see above)[12]
  • escape analysis and lock coarsening (see above)
  • register allocation improvements (see above)[13][14]

Other improvements:

  • Java 2D performance has also improved significantly in Java 6[20]

See also 'Sun overview of performance improvements between Java 5 and Java 6'.[21]

Java SE 6 Update 10

  • Java Quick Starter reduces application start-up time by preloading part of the JRE data into the OS disk cache at system startup.[22]
  • When the JRE is not installed and an application is accessed from the web, the parts of the platform necessary to execute it are now downloaded first. The entire JRE is 12 MB; a typical Swing application only needs to download 4 MB to start. The remaining parts are then downloaded in the background.[23]

Future improvements

Future performance improvements are planned for an update of Java 6 or Java 7:[26]

  • Allow the virtual machine to use both the client and server compilers in the same session, with a technique called tiered compilation:[30]
    • the client compiler would be used at startup (because it is good at startup and for small applications),
    • the server compiler would be used for the long-term running of the application (because it outperforms the client compiler there).
  • Replace the existing concurrent low-pause garbage collector (also called CMS, the Concurrent Mark-Sweep collector) with a new collector called G1 (Garbage First) to ensure consistent pauses over time[31][32] (see the example flags after this list).
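Both features can already be tried on recent HotSpot builds through VM flags; a hedged sketch of the invocations (MyApp is a placeholder; availability depends on the build):

  java -XX:+TieredCompilation MyApp
  java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC MyApp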

Comparison to other languages

Java is often just-in-time compiled at runtime by the Java virtual machine, but it may also be compiled ahead-of-time, just like C or C++. When just-in-time compiled, its performance is generally:[5]

  • lower than the performance of compiled languages such as C or C++, but not significantly so for most tasks,
  • close to other Just-in-time compiled languages such as C#,
  • much better than languages without an effective native-code compiler (JIT or AOT), such as Perl, Ruby, PHP and Python.[33]

Program speed

The average performance of Java programs has improved markedly over time, and Java's speed might now be comparable with C or C++. In some cases Java is significantly slower; in others, significantly faster.[34] As of March 2009, Java is between 5% and 15% slower than C and C++ on the Computer Language Benchmarks Game set of benchmarks.

It must also be said that benchmarks often measure the performance of small, numerically intensive programs, which arguably favours C. In some real-life programs Java outperforms C, and often there is no performance difference at all. One example is the benchmark of Jake2, a clone of Quake 2 written in Java by translating the original GPL C code: the Java 5.0 version performs better on some hardware configurations than its C counterpart.[35] While it is not specified how the data was measured (for example, whether the original Quake 2 executable compiled in 1997 was used, which current C compilers could improve upon), the benchmark shows how the same Java source code can gain a large speed boost just by updating the VM, something impossible to achieve with a fully static approach.

In addition, some optimizations that are possible in Java and similar languages are not possible in C or C++:[34]

  • C-style pointers make optimization hard in languages that support them.
  • Adaptive optimization is impossible in fully compiled code, because the code is compiled once, before any program execution, and thus cannot take advantage of the actual architecture and code paths. Some benchmarks show that the performance of compiled C or C++ programs depends heavily on how well the compilation options match the processor architecture (SSE2, for example), whereas Java programs are JIT-compiled and adapt on the fly to any given architecture.[36]
  • Escape analysis techniques cannot generally be used in C++, because the compiler cannot know where an object will be used (again partly because of pointers).

However, results for microbenchmarks between Java and C or C++ depend highly on which operations are compared. For example, when comparing with Java 5.0, Dr. Dobb's Journal published microbenchmarks covering 32-bit integer arithmetic,[37] 64-bit double arithmetic,[38] file I/O,[39] exceptions,[40] hash maps,[41][42] object creation and method calls,[43] arrays,[44] and trigonometric functions.[45]

Startup time

Java startup time is often much slower than that of C or C++ programs, because many classes (first of all, classes from the platform class libraries) must be loaded before they are used.

Much of the startup time appears to be due to IO-bound operations rather than JVM initialization or class loading (the rt.jar class data file alone is 40 MB, and the JVM must seek through a lot of data in this huge file).[22] Some tests showed that although the new split bytecode verification technique improved class loading by roughly 40%, this only translated to about a 5% startup improvement for large programs.[46]

Though small, this improvement is more visible in small programs that perform a simple operation and then exit, because loading the Java platform data can represent many times the cost of the actual program's operation.

Beginning with Java SE 6 Update 10, the Sun JRE comes with a Quick Starter that preloads class data at OS startup, so that data comes from the disk cache rather than from the disk.

Excelsior JET approaches the problem from the other side. Its Startup Optimizer reduces the amount of data that must be read from the disk on application startup, and makes the reads more sequential.

Memory usage

Java memory usage is higher than that of C or C++, because:

  • there is a 12-byte overhead for each object in Java, so an object containing a single 4-byte integer requires 16 bytes. However, C++ also allocates a pointer (usually 4 bytes) for every object that declares virtual functions [6].
  • parts of the Java library must be loaded prior to program execution (at least the classes that are used "under the hood" by the program) [7]
  • both the Java bytecode and its native recompilation will typically be in memory at once, and
  • the virtual machine itself consumes memory.
  • in Java, a composite object (a class A that uses instances of B and C) is created using references to separately allocated instances of B and C, whereas in C++ the cost of these references can be avoided (see the sketch below).
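A minimal sketch of this last point, with invented class names:

// In Java, A's fields hold references; b and c are separate heap
// objects, each with its own object header.
class B { int x; }
class C { int y; }
class A {
    B b = new B();   // separate allocation, reached through a reference
    C c = new C();   // separate allocation, reached through a reference
}

An equivalent C++ class could declare the B and C members by value, embedding them directly within A's own allocation, with no per-member object header and no indirection.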

Trigonometric functions

Performance of trigonometric functions can be poor compared to C, because Java has strict specifications for the results of mathematical operations, which may not correspond to the underlying hardware implementation.[47] On x87, the sine and cosine instructions are inaccurate for arguments with absolute value greater than π/4, because the hardware reduces arguments to this range using an approximation of π.[48] A JVM implementation must perform an accurate reduction in software instead, causing a big performance hit for values outside the range.[49]
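A minimal sketch of how this cost could be observed (the class name and iteration counts are invented; exact numbers depend on the JVM and hardware):

public class TrigCost {
    public static void main(String[] args) {
        double smallSum = 0, largeSum = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < 10000000; i++) {
            smallSum += Math.sin(i * 1.0e-8);  // arguments stay within [-pi/4, pi/4]
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < 10000000; i++) {
            largeSum += Math.sin(1.0e10 + i);  // arguments need accurate software reduction
        }
        long t2 = System.nanoTime();
        // Print the sums so the loops cannot be optimized away entirely.
        System.out.println(smallSum + " " + largeSum);
        System.out.println("small args: " + (t1 - t0) + " ns, large args: " + (t2 - t1) + " ns");
    }
}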

Java Native Interface

The Java Native Interface has a high overhead associated with it, making it costly to cross the boundary between code running on the JVM and native code.[50][51]
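As a hedged illustration (the class, method and library names below are hypothetical), a design that crosses the boundary once per batch of work amortizes this overhead far better than one native call per element:

public class NativeSum {
    // Hypothetical native method implemented in C. Each call crosses the
    // JNI boundary once, so passing the whole array in one call is far
    // cheaper than making a separate native call per element.
    public static native double sum(double[] values);

    static {
        System.loadLibrary("nativesum");  // hypothetical library name
    }
}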

User interface

Swing has been perceived as slower than native widget toolkits, because it delegates the rendering of widgets to the pure-Java Java 2D API. However, benchmarks comparing the performance of Swing versus the Standard Widget Toolkit, which delegates rendering to the native GUI libraries of the operating system, show no clear winner, and the results depend greatly on the context and the environment.[52]

Use for High Performance Computing

Recent independent studies seem to show that Java performance for high performance computing (HPC) is similar to Fortran on computation-intensive benchmarks, but that JVMs still have scalability issues when performing intensive communication on a grid network.[53]

However, high performance computing applications written in Java have recently won benchmark competitions. In 2008, Apache Hadoop, an open-source HPC project written in Java, was the fastest system to sort a terabyte of data.[54]

Notes

  1. ^ Jelovic, Dejan. "Why Java Will Always Be Slower than C++". Retrieved 2008-02-15.
  2. ^ "Symantec's Just-In-Time Java Compiler To Be Integrated Into Sun JDK 1.1".
  3. ^ "Apple Licenses Symantec's Just In Time (JIT) Compiler To Accelerate Mac OS Runtime For Java".
  4. ^ "Java gets four times faster with new Symantec just-in-time compiler".
  5. ^ Kawaguchi, Kohsuke (2008-03-30). "Deep dive into assembly code from Java". Retrieved 2008-04-02.
  6. ^ "Fast, Effective Code Generation in a Just-In-Time Java Compiler" (PDF). Intel Corporation. Retrieved 2007-06-22.
  7. ^ This article shows that the performance gain between interpreted mode and HotSpot is more than a factor of 10.
  8. ^ "The Java HotSpot Virtual Machine, v1.4.1". Sun Microsystems. Retrieved 2008-04-20.
  9. ^ Nutter, Charles (2008-01-28). "Lang.NET 2008: Day 1 Thoughts". Retrieved 2008-04-20. Deoptimization is very exciting when dealing with performance concerns, since it means you can make much more aggressive optimizations...knowing you'll be able to fall back on a tried and true safe path later on
  10. ^ IBM DeveloperWorks Library
  11. ^ For example, the duration of pauses is less noticeable now. See for example this clone of Quake 2 written in Java: Jake2.
  12. ^ New Java SE 6 Feature: Type Checking Verifier at java.net
  13. ^ Bug report: new register allocator, fixed in Mustang (JDK 6) b59
  14. ^ Mustang's HotSpot Client gets 58% faster! in Osvaldo Pinali Doederlein's Blog at java.net
  15. ^ Class Data Sharing at java.sun.com
  16. ^ Class Data Sharing in JDK 1.5.0 in Java Buzz Forum at artima developer
  17. ^ "Symantec's Just-In-Time Java Compiler To Be Integrated Into Sun JDK 1.1".
  18. ^ "Java gets four times faster with new Symantec just-in-time compiler".
  19. ^ STR-Crazier: Performance Improvements in Mustang in Chris Campbell's Blog at java.net
  20. ^ See here for a benchmark showing an approximately 60% performance boost from Java 5.0 to 6 for the application JFreeChart
  21. ^ Java SE 6 Performance White Paper at http://java.sun.com
  22. ^ a b Haase, Chet (May 2007). "Consumer JRE: Leaner, Meaner Java Technology". Sun Microsystems. Retrieved 2007-07-27. At the OS level, all of these megabytes have to be read from disk, which is a very slow operation. Actually, it's the seek time of the disk that's the killer; reading large files sequentially is relatively fast, but seeking the bits that we actually need is not. So even though we only need a small fraction of the data in these large files for any particular application, the fact that we're seeking all over within the files means that there is plenty of disk activity.
  23. ^ Haase, Chet (May 2007). "Consumer JRE: Leaner, Meaner Java Technology". Sun Microsystems. Retrieved 2007-07-27.
  24. ^ Haase, Chet (May 2007). "Consumer JRE: Leaner, Meaner Java Technology". Sun Microsystems. Retrieved 2007-07-27.
  25. ^ Campbell, Chris (2007-04-07). "Faster Java 2D Via Shaders". Retrieved 2008-04-26.
  26. ^ Haase, Chet (May 2007). "Consumer JRE: Leaner, Meaner Java Technology". Sun Microsystems. Retrieved 2007-07-27.
  27. ^ "JSR 292: Supporting Dynamically Typed Languages on the Java Platform". jcp.org. Retrieved 2008-05-28.
  28. ^ Goetz, Brian (2008-03-04). "Java theory and practice: Stick a fork in it, Part 2". Retrieved 2008-03-09.
  29. ^ Lorimer, R.J. (2008-03-21). "Parallelism with Fork/Join in Java 7". infoq.com. Retrieved 2008-05-28.
  30. ^ "New Compiler Optimizations in the Java HotSpot Virtual Machine" (PDF). Sun Microsystems. May 2006. Retrieved 2008-05-30.
  31. ^ Humble, Charles (2008-05-13). "JavaOne: Garbage First". infoq.com. Retrieved 2008-09-07.
  32. ^ Coward, Dany (2008-11-12). "Java VM: Trying a new Garbage Collector for JDK 7". Retrieved 2008-11-15.
  33. ^ Python has Psyco, but the code it can handle is limited, and even with Psyco, its performance is much lower than Java (see the Shootout here)
  34. ^ a b Lewis, J.P. "Performance of Java versus C++". Computer Graphics and Immersive Technology Lab, University of Southern California.
  35. ^ 260/250 frame/s versus 245 frame/s (see benchmark)
  36. ^ "mandelbrot benchmark". Computer Language Benchmarks Game. Retrieved 2008-02-16.
  37. ^ "Microbenchmarking C++, C#, and Java: 32-bit integer arithmetic". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17. {{cite web}}: Check date values in: |date= (help)
  38. ^ "Microbenchmarking C++, C#, and Java: 64-bit double arithmetic". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17. {{cite web}}: Check date values in: |date= (help)
  39. ^ "Microbenchmarking C++, C#, and Java: File I/O". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17. {{cite web}}: Check date values in: |date= (help)
  40. ^ "Microbenchmarking C++, C#, and Java: Exception". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17. {{cite web}}: Check date values in: |date= (help)
  41. ^ "Microbenchmarking C++, C#, and Java: Single Hash Map". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17. {{cite web}}: Check date values in: |date= (help)
  42. ^ "Microbenchmarking C++, C#, and Java: Multiple Hash Map". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17. {{cite web}}: Check date values in: |date= (help)
  43. ^ "Microbenchmarking C++, C#, and Java: Object creation/ destruction and method call". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17. {{cite web}}: Check date values in: |date= (help)
  44. ^ "Microbenchmarking C++, C#, and Java: Array". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17. {{cite web}}: Check date values in: |date= (help)
  45. ^ "Microbenchmarking C++, C#, and Java: Trigonometric functions". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17. {{cite web}}: Check date values in: |date= (help)
  46. ^ "How fast is the new verifier?". 2006-02-07. Retrieved 2007-05-09. {{cite web}}: Check date values in: |date= (help)
  47. ^ "Math (Java Platform SE 6)". Sun Microsystems. Retrieved 2008-06-08.
  48. ^ Gosling, James (2005-07-27). "Transcendental Meditation". Retrieved 2008-06-08.
  49. ^ W. Cowell-Shah, Christopher (2004-01-08). "Nine Language Performance Round-up: Benchmarking Math & File I/O". Retrieved 2008-06-08.
  50. ^ Wilson, Steve (2001). "Java Platform Performance: Using Native Code". Sun Microsystems. Retrieved 2008-02-15.
  51. ^ Kurzyniec, Dawid. "Efficient Cooperation between Java and Native Codes - JNI Performance Benchmark" (PDF). Retrieved 2008-02-15.
  52. ^ Križnar, Igor (2005-05-10). "SWT Vs. Swing Performance Comparison" (PDF). cosylab.com. Retrieved 2008-05-24. Initial expectation before performing this benchmark was to find SWT outperform Swing. This expectation stemmed from greater responsiveness of SWT-based Java applications (e.g., Eclipse IDE) compared to Swing-based applications. However, this expectation could not be quantitatively confirmed.
  53. ^ Brian Amedro, Vladimir Bodnartchouk, Denis Caromel, Christian Delbe, Fabrice Huet, Guillermo L. Taboada (August 2008). "Current State of Java for HPC". INRIA. Retrieved 2008-09-04. We first perform some micro benchmarks for various JVMs, showing the overall good performance for basic arithmetic operations(...). Comparing this implementation with a Fortran/MPI one, we show that they have similar performance on computation intensive benchmarks, but still have scalability issues when performing intensive communications.
  54. ^ Owen O'Malley - Yahoo! Grid Computing Team (July 2008). "Apache Hadoop Wins Terabyte Sort Benchmark". Retrieved 2008-12-21. This is the first time that either a Java or an open source program has won.
