Garbage Collection in JVM
What is garbage collection and different algorithms to do garbage collection.
Garbage collection is the single most important factor when tuning a JVM for long-running, server side applications. Improperly tuned garbage collectors or applications that create unnecessarily large numbers of objects can significantly affect the efficiency of the application. It is not uncommon to find that garbage collection consumes a significant amount of the overall processing time in a server-side Java application. Proper tuning of the garbage collector can significantly reduce the garbage collector’s processing time and, therefore, can significantly improve the application’s throughput.Garbage collection is the technique a JVM uses to free memory occupied by objects that are no longer being used by the application.
An object is considered garbage when it can no longer be reached from any pointer in the running program. The most straightforward garbage collection algorithms simply iterate over every reachable object. Any objects left over are then considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data. Garbage collectors determine whether an object is eligible for collection by determining whether objects are being referenced by any active objects in the system. The garbage collector must first identify the objects eligible for collection. Two general approaches for this are:
- Reference counting: Involves storing a count of all of the references to a particular object. This means that the JVM must properly increment and decrement the reference count as the application creates references and as the references go out of scope. When an object’s reference count goes to zero, it is eligible for garbage collection
- Object reference traversal: Starts with a set of root objects and follows every link recursively through the entire object graph to determine the set of reachable objects. Any object that is not reachable from at least one of these root objects is garbage collected. During this object traversal stage, the garbage collector must remember which objects are reachable so that it can remove those that are not; this is known as marking the object.
Garbage collectors usually have to stop all other activity for some portion of the garbage collection process. This stop-the-world approach means all application-related work stops while the garbage collector runs. As a result, any in-flight requests will experience an increase in their response time by the amount of time taken by the garbage collector. Other, more sophisticated collectors run either incrementally or truly concurrently to reduce or eliminate the application pauses. Some garbage collectors use a single thread to do their work; others employ multiple threads to increase their efficiency on multi-CPU machines. Some garbage collectors used by modern JVMs are:
- Mark-and-sweep collector – traverses the object graph and marks reachable objects. Next, the heap is scanned for unmarked objects and their memory added to a list of available memory segments. This collector uses a single thread to do its work and is a stop-the-world collector.
- Mark-sweep-compact collector – uses the same marking phase as a mark-and-sweep collector. During the second phase, the heap is compacted by copying marked objects to a new area of the heap. These collectors are stop-the-world collectors.
- Copying collector – divides the heap into two areas. Only one area at a time is used. When the garbage collector runs, all reachable objects are copied to the other area, thus compacting the heap as the live objects are copied. All dead objects are left behind. This algorithm works well for short-lived objects, but the expense of continually copying long-lived objects makes it less efficient. This is a stop-the-world collector.
- Incremental collector – divides the heap into multiple areas and collects garbage from only one area at a time. This can create much smaller, though more frequent, pauses in the application. Numerous approaches exist for defining how the actual collection is handled from traditional mark-and-sweep to algorithms designed explicitly for use with multiple smaller areas.
- Generational collector – divides the heap into two or more areas that are used to store objects with different lifetimes. The JVM generally creates all new objects in one of these areas. Over time, the objects that continue to exist get tenured and move into another area for longer-lived objects. Generational collectors often use different algorithms for the different areas to optimize performance.
- Concurrent collector – runs concurrently with the application, typically as one or more background threads. These collectors typically have to stop-the-world at some point to complete certain tasks, but the amount of time they halt all processing is significantly reduced because of their background work.
- Parallel collector – uses one of the traditional algorithms but uses multiple threads to parallelize their work on multiprocessor machines. This can dramatically improve the scalability of an application.
Performance considerations:
There are two primary measures of garbage collection performance:
1.Throughput is the percentage of total time not spent in garbage collection.
2.Pauses are the times when an application appears unresponsive because garbage collection is occurring
Note in order to tune the performance we have to choose one of the two optimizations:
1.Maximize throughput – Java threads get as much as possible CPU resources.
2.Minimise pauses – Response time of the application is maximized.