5 Article Series on Reloading Java Classes

Sky-Tiger · 发表于 2010-3-12 23:44

Redeployment

To make use of dynamic classloaders we must first create them. When deploying your application, the server will create one classloader for each application (and each application module in the case of an enterprise application). The classloaders form a hierarchy as illustrated:

classloaders-jee

Sky-Tiger · 发表于 2010-3-12 23:45

In Tomcat each .WAR application is managed by an instance of the StandardContext class that creates an instance of WebappClassLoader used to load the web application classes. When a user presses “reload” in the Tomcat Manager the following will happen:

* StandardContext.reload() method is called
* The previous WebappClassLoader instance is replaced with a new one
* All reference to servlets are dropped
* New servlets are created
* Servlet.init() is called on them

tomcat-cl-reload

Calling Servlet.init() recreates the “initialized” application state with the updated classes loaded using the new classloader instance. The main problem with this approach is that to recreate the “initialized” state we run the initialization from scratch, which usually includes loading and processing metadata/configuration, warming up caches, running all kinds of checks and so on. In a sufficiently large application this can take many minutes, but in a in small application this often takes just a few seconds and is fast enough to seem instant, as commonly demonstrated in the Glassfish v3 promotional demos.

If your application is deployed as an .EAR archive, many servers allow you to also redeploy each application module separately, when it is updated. This saves you the time you would otherwise spend waiting for non-updated modules to reinitialize after the redeployment.

Sky-Tiger · 发表于 2010-3-12 23:45

Hot Deployment

Web containers commonly have a special directory (e.g. “webapps” in Tomcat, “deploy” in JBoss) that is periodically scanned for new web applications or changes to the existing ones. When the scanner detects that a deployed .WAR is updated, the scanner causes a redeploy to happen (in Tomcat it calls the StandardContext.reload() method). Since this happens without any additional action on the user’s side it is commonly referred to “Hot Deployment”.

Hot Deployment is supported by all wide-spread application servers under different names: autodeployment, rapid deployment, autopublishing, hot reload, and so on. In some containers, instead of moving the archive to a predefined directory you can configure the server to monitor the archive at a specific path. Often the redeployment can be triggered from the IDE (e.g. when the user saves a file) thus reloading the application without any additional user involvement. Although the application is reloaded transparently to the user, it still takes the same amount of time as when hitting the “Reload” button in the admin console, so code changes are not immediately visible in the browser, for example.

Another problem with redeployment in general and hot deployment in particular is classloader leaks. As we reviewed in Reloading Java Classes 201, it is amazingly easy to leak a classloader and quickly run out of heap causing an OutOfMemoryError. As each deployment creates new classloaders, it is common to run out of memory in just a few redeploys on a large enough application (whether in development or in production).

Sky-Tiger · 发表于 2010-3-12 23:45

Exploded Deployment

An additional feature supported by the majority of web containers is the so called “exploded deployment”, also known as “unpackaged” or “directory” deployment. Instead of deploying a .WAR archive, one can deploy a directory with exactly the same layout as the .WAR archive:

exploded

Why bother? Well, packaging an archive is an expensive operation, so deploying the directory can save quite a bit of time during build. Moreover, it is often possible to set up the project directory with exactly the same layout as the .WAR archive. This means an added benefit of editing files in place, instead of copying them to the server. Unfortunately, as Java classes cannot be reloaded without a redeploy, changing a .java file still means waiting for the application to reinitialize.

With some servers it makes sense to find out exactly what triggers the hot redeploy in the exploded directory. Sometimes the redeploy will be triggered only when the “web.xml” timestamp changes, or as in the case of GlassFish only when a special ”.reload” file timestamp changes. In most servers any change to deployment descriptors or compiled classes will cause a hot redeploy.

If your server only supports deploying by copying to a special directory (e.g. Tomcat “webapps”, JBoss “deploy” directories) you can skip the copying by creating a symlink from that special directory to your project workspace. On Linux and Mac OS X you can use the common “ln -s” command to do that, whereas on Windows you should download the Sysinternals “junction” utility.

If you use Maven, then it’s quite complicated to set up exploded development from your workspace. If you have a solo web application you can use the Maven Jetty plugin, which uses classes and resources directly from Maven source and target project directories. Unfortunately, the Maven Jetty plugin does not support deploying multiple web applications, EJB modules or EARs so in the latter case you’re stuck doing artifact builds.

Sky-Tiger · 发表于 2010-3-12 23:46

Session Persistence

Since we’re on the topic of reloading classes, and redeploying involves reinitializing an application, it makes sense to talk about session state. An HTTP session usually holds information like login credentials and conversational state. Losing that session when developing a web application means spending time logging in and browsing to the changes page – something that most web containers have tried to solve by serializing all of the objects in the HttpSession map and then deserializing them in the new classloader. Essentially, they copy all of the session state. This requires that all session attributes implement Serializable (ensuring session attributes can be written to a database or a file for later use), which is not restricting in most cases.

hot-deploy-session

Session persistence has been present in most major containers for many years (e.g. Restart Persistence in Tomcat), but was notoriously absent in Glassfish before v3.

Sky-Tiger · 发表于 2010-3-12 23:47

OSGi

There is a lot of misunderstanding surrounding what exactly OSGi does and doesn’t do. If we ignore the aspects irrelevant to the current issue, OSGi is basically a collection of modules each wrapped in its own classloader, which can be dropped and recreated at will. When it’s recreated, the modules are reinitialized exactly the same way a web application is.

osgi

The difference between OSGi and a web container is that OSGi is something that is exposed to your application, that you use to split your application into arbitrarily small modules. Therefore, by design, these modules will likely be much smaller than the monolithic web applications we are used to building. And since each of these modules is smaller and we can “redeploy” them one-by-one, re-initialization takes less time. The time depends on how you design your application (and can still be significant).

Sky-Tiger · 发表于 2010-3-12 23:47

Tapestry 5, RIFE & Grails

Recently, some web frameworks, such as Tapestry 5, RIFE and Grails, have taken a different approach, taking advantage of the fact that they already need to maintain application state. They’ll ensure that state will be serializable, or otherwise easily re-creatable, so that after dropping a classloader, there is no need to reinitialize anything.

This means that application developers use frameworks’ components and the lifecycle of those components is handled by the framework. The framework will initialize (based on some configuration, either xml or annotation based), run and destroy the components.

As the lifecycle of the components is managed by the framework, it is easy to recreate a component in a new classloader without user intervention and thus create the effect of reloading code. In the background, the old component is destroyed (classloader is dropped) and a new one created (in a new classloader where the classes are read in again) and the old state is either deserialized or created based on the configuration.

component

This has the obvious advantage of being very quick, as components are small and the classloaders are granular. Therefore the code is reloaded instantly, giving a smooth experience in developing the application. However such an approach is not always possible as it requires the component to be completely managed by the framework. It also leads to incompatibilities between the different class versions causing, among others, ClassCastExceptions.

Sky-Tiger · 发表于 2010-3-12 23:48

HotSwap and Instrumentation

In 2002, Sun introduced a new experimental technology into the Java 1.4 JVM, called HotSwap. It was incorporated within the Debugger API, and allowed debuggers to update class bytecode in place, using the same class identity. This meant that all objects could refer to an updated class and execute new code when their methods were called, preventing the need to reload a container whenever class bytecode was changed. All modern IDEs (including Eclipse, IDEA and NetBeans) support it. As of Java 5 this functionality is also available directly to Java applications, through the Instrumentation API.

hotswap

Unfortunately, this redefinition is limited only to changing method bodies — it cannot either add methods or fields or otherwise change anything else, except for the method bodies. This limits the usefulness of HotSwap, and it also suffers from other problems:

* The Java compiler will often create synthetic methods or fields even if you have just changed a method body (e.g. when you add a class literal, anonymous and inner classes, etc).
* Running in debug mode will often slow the application down or introduce other problems

This causes HotSwap to be used less than, perhaps, it should be.

Sky-Tiger · 发表于 2010-3-12 23:48

Why is HotSwap limited to method bodies?

This question has been asked a lot during the almost 10 years since the introduction of HotSwap. One of the most voted for bugs for the JVM calls for supporting a whole array of changes, but so far it has not been implemented.

A disclaimer: I do not claim to be a JVM expert. I have a good general idea how the JVM is implemented and over the years I talked to a few (ex-)Sun engineers, but I haven’t verified everything I’m saying here against the source code. That said, I do have some ideas as to the reasons why this bug is still open (but if you know the reasons better, feel free to correct me).

The JVM is a heavily optimized piece of software, running on multiple platforms. Performance and stability are the highest priorities. To support them in different environments the Sun JVM features:

* Two heavily optimized Just-In-Time compilers (-client and -server)
* Several multi-generational garbage collectors

These features make evolving the class schema a considerable challenge. To understand why, we need to look a little closer as to what exactly is necessary to support adding methods and fields (and even more advanced, changing the inheritance hierarchy).

When loaded into the JVM, an object is represented by a structure in memory, occupying a continuous region of memory with a specific size (its fields plus metadata). In order to add a field, we would need to resize that structure, but since nearby regions may already be occupied, we would need to relocate the whole structure to a different region where there is enough free space to fit it in. Now, since we’re actually updating a class (and not just a single object) we would have to do this to every object of that class.

In itself this would not be hard to achieve — Java garbage collectors already relocate objects all the time. The problem is that the abstraction of one “heap” is just that, an abstraction. The actual layout of memory depends on the garbage collector that is currently active and, to be compatible with all of them, the relocation should probably be delegated to the active garbage collector. The JVM will also need to be suspended for the time of relocation, so doing GC at the same time makes sense.

Adding a method does not require updating the object structure, but it does require updating the class structure, which is also present on the heap. But consider this: the moment after a class has been loaded it is essentially is frozen forever. This enables the JIT to perform the main optimization that the JVM does — inlining. Most of the method calls in your application hot spots are eliminated and the code is copied to the calling method. A simple check is inserted to ensure that the target object is indeed what we think it is.

Here’s the punchline: the moment we can add methods to classes this “simple check” is not enough. We would need a considerably more complicated check that needs to ensure not only that no methods with the same name were added to the target class, but also to all it’s superclasses. Alternatively we could track all the inlined spots and their dependencies and deoptimize them when a class is updated. Either way it has a cost in either performance or complexity.

On top of that, consider that we’re talking about multiple platforms with varying memory models and instructions sets that probably require at least some specific handling and you get yourself an expensive problem with not much return on investment.

Sky-Tiger · 发表于 2010-3-12 23:48

Introducing JRebel

In 2007, ZeroTurnaround announced the availability of a tool called JRebel (then JavaRebel) that could update classes without dynamic class loaders and with very few limitations. Unlike HotSwap, which is dependent on IDE integration, the tool works by monitoring the actual compiled .class files on disk and updating the classes whenever the files are updated. This means that you can use JRebel with a text editor and command-line compiler if so willing. Of course, it’s also integrated neatly into Eclipse, IntelliJ, and NetBeans. Unlike dynamic classloaders, JRebel preserves the identity and state of all existing objects and classes, allowing developers to continue using their application without delay.

jrebel-agent
How does this work?

For starters, JRebel works on a different level of abstraction than HotSwap. Whereas HotSwap works at the virtual machine level and is dependent on the inner workings of the JVM, JRebel makes use of two remarkable features of the JVM — abstract bytecode and classloaders. Classloaders allow JRebel to recognize the moment when a class is loaded, then translate the bytecode on-the-fly to create another layer of abstraction between the virtual machine and the executed code.

Others have used this features to enable profilers, performance monitoring, continuations, software transactional memory and even distributed heap. Combining bytecode abstraction with classloaders is a powerful combination, and can be used to implement a variety of features even more exotic than class reloading. As we examine the issue closer, we’ll see that the challenge is not just in reloading classes, but also doing so without a visible degradation in performance and compatibility.

As we reviewed in Reloading Java Classes 101 the problem in reloading classes is that once a class has been loaded it cannot be unloaded or changed; but we are free to load new classes as we please. To understand how we could theoretically reload classes, let’s take a look at dynamic languages on the Java platform. Specifically, let’s take a look at JRuby (we’ll simplify a lot, so don’t crucify anyone important).

Although JRuby features “classes”, at runtime each object is dynamic and new fields and methods can be added at any moment. This means that a JRuby object is not much more than a Map from method names to their implementations and from field names to their values. The implementations for those methods are contained in anonymously named classes that are generated when the method is encountered. If you add a method, all JRuby has to do is generate a new anonymous class that includes the body of that method. As each anonymous class has a unique name there are no issues loading it and as a result the application is updated on-the-fly.

Theoretically, since bytecode translation is usually used to modify the class bytecode, there is no reason why we can’t use the information in that class and just create as many classes as necessary to fulfill its function. We could then use the same transformation as JRuby and split all Java classes into a holder class and method body classes. Unfortunately, such an approach would be subject to (at least) the following problems:

* Perfomance. Such a setup would mean that each method invocation would be subject to indirection. We could optimize, but the application would be at least an order of magnitude slower. Memory use would also skyrocket, as so many classes are created.
* Java SDK classes. The classes in the Java SDK are considerably harder to process than the ones in the application or libraries. Also they often are implemented in native code and cannot be transformed in the “JRuby” way. However if we leave them as is, then we’ll cause numerous incompatibility errors, which are likely not possible to work around.
* Compatibility. Although Java is a static language it includes some dynamic features like reflection and dynamic proxies. If we apply the “JRuby” transformation none of those features will work unless we replace the Reflection API with our own classes, aware of the transformation.

Therefore, JRebel does not take such an approach. Instead it uses a much more complicated approach, based on advanced compilation techniques, that leaves us with one master class and several anonymous support classes backed by the JIT transformation runtime that allow modifications to take place without any visible degradation in performance or compatibility. It also

* Leaves as many method invocations intact as possible. This means that JRebel minimizes its performance overhead, making it lightweight.
* Avoids instrumenting the Java SDK except in a few places that are necessary to preserve compatibility.
* Tweaks the results of the Reflection API, so that we can correctly include the added/removed members in these results. This also means that the changes to Annotations are visible to the application.

5 Article Series on Reloading Java Classes

浏览过的版块