Under the Hood of J2EE Clustering

Sky-Tiger · 发表于 2008-11-15 08:44

EJB clustering implementation
EJB is an important part of J2EE technology and EJB clustering is the biggest challenge when implement J2EE clustering.

EJB technology is born for distributed computing. They can be running in independent servers. Web server components or rich clients can access the EJBs from other machines through standard protocol (RMI/IIOP). You can invoke methods on remote EJB just as you would invoke a method on any local Java object. In fact, RMI-IIOP completely masks whether the object you’re invoking on is local or remote. This is called local/remote transparency.

Sky-Tiger · 发表于 2008-11-15 08:45

The above figure shows the mechanism of invoking remote EJB. When a client wants to use an EJB, it cannot invoke the EJB directly. Instead, the client can only invoke a local object called stub, which acts as a proxy to the remote object and has the same interface as the remote one. The stub is responsible for accepting method calls locally and delegating those method calls to the remote EJBs across the network. Stubs are running within the client JVM, and know how to look over the network for the real object through RMI/IIOP. For detail information about EJB, please refer to “http://java.sun.com/products/ejb/”.

Let’s look at how we use EJB in our J2EE code to explain the implementation of EJB Clustering. To make a call to an EJB, you should

Look up the EJBHome stub from a JNDI server.
looks up or create an EJB object using the EJBHome stub; an EJBObject stub returns.
makes method calls against the EJB using the EJBObject stub
Load balancing and failover can happen during JNDI lookup (the 1 st step), which I have already mentioned it in last section. During methods call to EJB stubs (include EJBHome and EJBObject), vendors implement EJB load balancing and failover in following three different ways.

Smart stub
As you know, client can access the remote EJB through a stub object, this object can be retrieved from a JNDI tree, and it is even possible that clients download the classfile of the stub from any web server transparently. So the stub has the following features:

It can be generated dynamically and programmatically at runtime and the definition of the stub (the classfile) does not necessary needs to be in the classpath of client environment or part of the client libraries (JAR) at runtime (as it can be downloaded).

Sky-Tiger · 发表于 2008-11-15 08:45

Shown as figure 17, BEA Weblogic and JBoss implement EJB clustering by incorporating some specific behavior, in the stub code, that will transparently run on the client side (the client code doesn’t even know about this code). This technique is called “smart stub”.

The smart stub is really smart that it can contain the list of target instances it can access, it can detect any failure about the target instances, and it also contains complex load-balancing and fail-over logic to dispatch requests to the targets. Furthermore, if the cluster topology changes (for example: new instances added in or removed off), the stub can update itself of the target list to reflect the new topology without manually reconfiguration.

Put the clustering implementation in the stub has following advantages:

Since EJB stub is running inside the client environment, it will save a lot of resources in the server side.
The load balancer is incorporated in the client code and is highly related with client life cycles. This will eliminate single point of failure of load balancer. If the load balancer dies, it most probably means that the client code is also dead, which is acceptable.
The stub can be downloaded dynamically and update itself automatically. That means zero maintenance.
IIOP Runtime Library
Sun JES Application Server implements EJB clustering in another way. The load balancing and failover logic are implemented in the IIOP runtime library. For example, JES has modified the “ORBSocketFactory” implementation to let it be cluster-aware, shown as figure 18.

Sky-Tiger · 发表于 2008-11-15 08:45

The modified version of “ORBSocketFactory” has all the logics and algorithms to perform load balancing and failover, which will keep the stub small and clean. Since the implementation is in the runtime library, it can get system resources more easily than stub does. But this approach always requires the specific runtime library in the client side, which will make some troubles when interoperating with other J2EE products.

Interceptor Proxy
IBM Websphere employs a Location Service Daemon (LSD), which acts as a interceptor proxy to EJB clients, shown as figure 19.

Sky-Tiger · 发表于 2008-11-15 08:45

Within this approach, a client obtains a stub by looking up from JNDI. The stub contains routing information to the LSD rather than to the application server which hosts EJBs. Then the LSD receives all coming requests and determines where to send them to different instances based on the load balancing and failover policy. This approach will add extra administration works to install and maintain the cluster.

Clustering support for EJBs
To invoke a method of an EJB, two types of stub objects are involved: one for the EJBHome interface and one for the EJBObject interface. This means that EJBs can potentially realize the load balancing and failover on two levels:

When a client create or looks up an EJB object using the EJBHome stub
When a client makes method calls against the EJB using the EJBObject stub
Clustering support for EJBHome Stub
The EJBHome Interface is used to create or lookup EJB instances in the EJB container and EJBHome Stub is the client agent for EJBHome Interface. EJBHome interface will not maintain any state information for the client. So, the same EJBHome interface from different EJB containers is identical for the client. When the client issues a create() or find() call, the home stub selects a server from the replica list in accordance with the load balancing and failover algorithm, and routes the call to the home interface on that server.

Clustering support for EJBObject Stub
When an EJBHome interface creates an EJB instance, it returns an EJBObject stub to the client to let the user make business methods call to the EJB. The system already has a list of all of the available servers in the cluster, to which the beans are deployed, but it cannot route the calls issued by the EJBObject stub to the EJBObject interface on arbitrary server instance, depend on the EJB type.

Stateless session bean is most probably the easiest case: as no state is involved, All EJB instances are considered identical. So the method invoking from EJBObject can be load-balanced or failed over on any participating server instances.

Stateful Session Beans are clustered a bit differently from the stateless beans. As you know, Stateful Session Beans will hold session state information for a client in successive requests. Technically, clustering of Stateful Session Beans is the same as clustering of HTTPSession. At normal time, the EJBObject stub will not load balance the requests to different server instances. Instead, it will stick to the instance where the EJBObject is created at first time; we call this instance the “primary instance”. During processing, the state information will backup from the primary instance to other servers. If the primary instance fails, other backup servers will take over.

Entity Beans are stateless essentially, although they can process Stateful requests. All the information data are backed up into database by the mechanism of Entity Bean itself. It seems that for Entity Beans, load balancing and failover can be achieved easily just like Stateless Session Bean. But actually, Entity Beans are not load balanced and fail-overed most of time. As suggested by design patterns, entity beans are always wrapped by a session bean façade. Therefore, most access to entity EJBs should occur over local interfaces by in-process session beans, rather than remote clients. This will make load balancing and failover become no sense.

Sky-Tiger · 发表于 2008-11-15 08:46

Clustering support for JMS and database connection
There are other distributed objects in J2EE in addition to JSP, Servlet, JNDI and EJB. These objects may or may not be supported in a cluster implementation.

Currently, some database products, such as Oracle RAC, support clustering environment and can deploy multi replicated, synchronized database instances. However, JDBC is a highly stateful protocol and in which transactional state are tied directly to the socket between the server and the client, so it is hard to achieve clustering. If a JDBC connection dies, a ll JDBC objects associated with the dead connection will also be defunct. The re-connection action is needed in the client code. BEA Weblogic uses a JDBC multipool to eases the reconnection process.

JMS is supported in most of J2EE servers, but not fully supported. Load balancing and failover is implemented only for JMS broker, and few products have the failover functions for messages in JMS Destinations.

Myths about J2EE clustering
Failover can avoid errors completely. -- Negative
In the document of JBoss, there is a whole section to warn you “do you really need HTTP sessions replication?” Yes, sometime a high availability solution without failover is acceptable and cheap. And further more, the failover feature is not as strong as you expected.

What on earth does failover bring to you? Some of you may think that failover can avoid errors. You see, without session failover, session data is lost when a server fails and causes errors; while with session failover, sessions can be restored from the backup and requests can be processed by another server instance, the client even isn't aware of the failure. That may be true, but it’s conditional!

Remind that when I defined “failover”, I defined a condition for when the failover will happen: “between the method calls”. It means if you have two successive methods to a remote object, the failover will only happen after the first method call is finished successfully and before the second method request is sent out.

So, what will happen if the remote server fails when the methods are in the middle of processing in the server? The answer is: the process will stop and the client will see error messages in most cases, unless the methods are idempotent (defined in the “basic terminology” section).Only if the methods are idempotent, some load balancers are smart enough to retry these methods and failover them to other instances.

Why is “idempotency” important? Because the client never knows where the execution the request was in when the failure occurred. Has the method just been initiated or it is almost finished? A client can never determine it! If the method is not idempotent, two invokings of the same method will alter the system state twice and the system will be in an inconsistent situation.

You might think that all methods that are placed in a transaction are idempotent. After all, if failure happens, the transaction will roll back, and all transactional state changes will be undone. But the truth is that the transaction boundary may not include all the edges of remote methods invoking. What if the transaction commits on the server and the network crashes on the return trip to the client. The client would not know whether the server’s transaction succeeded or not.

In serious applications, to make all the methods idempotent is impossible. So, you can only reduce errors by failover, but not avoid them! Take an online store website for example, suppose every server instance will handle 100 online users’ request at any time. When one server fails, the solution without session failover will lose all the users’ session data and anger all the 100 users; while in the solution with session failover, only 20 users’ requests are in processing when the server fails and only these users are angry about the failure. All the other 80 users are just in the thinking time or between the methods. These users get their session failed over transparently. So, you should trade off from following considerations:

The different impact between anger 20 users and 100 users.
The different cost between products with failover and without failover.

Sky-Tiger · 发表于 2008-11-15 08:46

Stand-alone applications can be transmit transparently to a cluster structure. -- Negative!
Although some vendors announce such flexibility for their J2EE products, don’t trust them! Actually, you should prepare for the clustering at the beginning of system design and impact all the phases including development and testing.

Http Session
In a cluster environment, there are many restrictions to HTTPSession usage as I mentioned before, depending on different mechanism your application server uses for session failover. The first important restriction is that all objects stored in the HTTPSession must be serializable which will limit the design and structure of the application. Some design patterns and MVC framework will use HTTPSession to store some non-serializable objects (such as Servlet context, Local EJB Interface and web services reference), such designs cannot work in a cluster. Secondly, object serialization and de-serialization are very costly in performance especially in the database persistent approach. In such environment, storing large or numerous objects in the session should be avoided. If you have chosen a memory replication approach, be careful about the limitation on cross-referenced attributes in HTTPSession as I mentioned before. Another major difference in cluster environment is you are required to call “setAttribute ()” method whenever any attribute under HTTPSession is modified. This method is optional in stand alone system. The purpose of this method is to separate modified attributes from those untouched, so that the system can backup only necessary data for session failover to improve performance.

Cache
Almost every J2EE project I experienced used object caching to improve performance, and all popular application servers provide extra degrees of caching to enable faster applications. But these caches are typically designed for a standalone environment, and can only work within one JVM instance. We need cache because some objects are so heavy that creating a new one will cost much. So we maintain an object pool to reuse the object instances without further creation. We gain performance only if the maintenance of the cache is cheaper than objects creation. In a clustered environment, each JVM instance will maintain its own copy of the cache, which should be synchronized with others to provide inconsistent state in all server instances. Sometimes this kind of sync will bring worse performance than no caching at all.

Static variables
When design J2EE applications, design patterns are popular among architects. Some design pattern such as “Singleton” will use a static variable to share a state among many objects. This approach works well on a single server, but fails in a cluster. Each instance in the cluster would maintain its own copy of the static variable in its own JVM instance, thereby breaking the mechanism of the pattern. One example for the usage of static variable is to keep statistics about total number of online users. One easy way is to store the number in a static variable, increasing and decreasing it when users are in and out. This application works absolutely fine on a single server, but fails on a cluster. A preferable way workable with cluster is to store all state data to a database.

External resource
Although not recommended by the J2EE specification, the external I/O operations are used for various purposes. For example, some applications use file systems to save uploading files by users, or create dynamic configuration XML files. In a cluster the application server has no way of replicating these files across to other instances. To work in a cluster, the solution is to use the database in place of external files, if possible. One could also choose SAN as central deposits for files.

Special Services
There are some special services which only make sense in the stand-alone mode. Timer services are good examples of such services, which will happen regularly and based on constant interval. Timer services are often used to perform administrative tasks automatically, such as logging file rotation, system data backup, database consistence checking and redundant data cleaning up. Some event based services are also hard to migrate to a cluster environment. The initial services are good examples which will happen only at the beginning of whole system. Email notification services are also such examples which are trigged by some warning conditions.

These services are trigged by events instead of requests, and should only be executed only once. Such services will make the load balancing and failover make little sense in a cluster.

Some products have prepared for such services. For example, JBoss uses “clustered singleton facility” to coordinate all the instances to guarantee to execute these services once and only once. Based on your product platform you choose, those special services may be an obstacle to migrate your applications to a cluster structure.

Sky-Tiger · 发表于 2008-11-15 08:46

Distributed structure is more flexible than collocated one? -- Maybe Not!
J2EE technology, especially EJB, is born for distributed computing. Decoupled business functionality, reused remote components make multi-tier applications popular. But we won’t make everything distributed. Some J2EE architects think it better that the Web tier and EJB tier should be collocated closely. These kinds of discussion are still going on. Let me explain more.

Shown as figure 18, it is a distributed structure. When requests come, load balancer will dispatch them to different web containers in different servers. If the requests include EJB invokes, the web container will re-dispatch the EJB invokes to different EJB containers. Such, requests are load balanced and failed over twice.

Some people look down on the distributed structure. They have pointed out:

The second load balancing is not necessary, because it cannot assign tasks more evenly. Every server instance will has its own web container and EJB container. To make the EJB container to process requests from other instance’s web container shows no advantages compared to inner invoking happened inside server instances.
The second failover is not necessary, because it cannot improve availability. Most vendors implement their J2EE servers in such a way that web container and EJB container within the same server share the same JVM instance. If the EJB container fails, under most circumstances, the Web container in the same JVM instance will also fail at the same time.
Performance will degrade. Imagine now in one method of your application you may be invoking a couple of EJBs, if you load balance on every one of those EJBs, you're going to end up with instances of your application spread out across the multi server instances; you're going to have a lot of server-to-server cross talk that's unnecessary. And more, if your method is under a transaction, your transaction boundary will include many server instances which will impact performance heavily.

Sky-Tiger · 发表于 2008-11-15 08:47

At the runtime actually, most vendors (include Sun JES, Weblogic and JBoss) will optimize the EJB load balancing mechanism to let requests first choose the EJB container in the same server. In this way, shown as figure 19, we load balance only at the first level of requests (web container), and then have subsequent services end up on that same server. This structure is called collocated structure. Technically, collocated structure is the special case of distributed one.

One interesting question is that, since most deployment are evolved as collocated structure at the runtime, why not use local interface instead of remote interface, and this will improve performance quite a bit. Of course you can. But remember, when using local interface, Web components and EJB are coupled tightly, and make method invoking directly instead of IIOP/RMI. The load balancer and failover dispatcher have no chance to intervene with local interface call, the “Web+EJB” process is load balanced and failover as a whole.

But unfortunately, using local interface in a cluster has some limitations on most J2EE servers. EJBs are local objects with local interfaces, but they are not Serializable. So the limit is that the local references are not allowed to be stored in HTTPSession. Some products, such as Sun JES, treat local EJBs differently and make them Serializable and can be used in HTTPSession as you will.

Another interesting question is: Since collocated structure is popular and has good performance, why we need distributed structure? Like in most cases, things happen for a reason. Sometime, the distributed structure is not replaceable:

EJB is not only for web container, rich clients are also the consumers.
EJB components and Web components may be in different security levels, and need to be separated physically. So, firewall can be setup to protect the most important machines on which EJB components are running.
Extreme unsymmetricalness between Web and EJB tier will make the distributed structure a better choice. For example, some EJB components are so complex and resource consuming, that they can only run in some expensive big servers; on the other hand, the Web components (html, JSP and Servlet) are simple enough that cheap PC servers will be satisfied. Under such condition, dedicated Web servers will be used to accept client connection requests, and serve static data (HTML and images) and simple Web components (JSP and Servlet) very quickly. The big servers are only used for complex computing and make the best use of the investment.
Conclusion
Clustering is different from the stand-alone environment. J2EE vendors implement clustering differently. You should prepare for J2EE clustering at the beginning of your projects in order to build a large scale system. Choose proper J2EE product which is well suitable to your requirements. Choose proper third-party software and frameworks to make sure they are cluster-aware too. Then, design proper architectures which will really benefit from clustering instead of suffering.

dwhvan · 发表于 2008-11-25 16:53

在theserverside看过这篇文章。。