David Purcell


J2EE Journal: Article

Moving to a Cluster...

What the guidebook doesn't tell you

You've engineered a J2EE application that has become mission critical for your business operations. You know that downtime will be less acceptable as the business starts to rely more on the application, so you want to start eliminating single points of failure and improve availability. One of your first thoughts might be: "Let's move to a clustered application server environment."

Migrating to a clustered environment is a reasonable thing to do, but if this is your first experience with clustering application servers, make sure you understand what you're getting into before you present your plans to your project owner.

Project Considerations
Why Are You Migrating to a Cluster?
Before anything else, make sure you're migrating to a cluster at the right time and for the right reasons. A clustered application server environment can give you failover capabilities, which will help you sleep a little better at night, and it should also spread the load somewhat, delaying the point at which your hardware becomes inadequate. If you're squeezed by a tight budget, make sure you're tackling the highest-priority issue in your system so you make the most of your limited resources.

What are the consequences of not doing it? Business owners who have lean budgets might need to understand the chances of downtime occurring and its implications. Also, you could consider alternatives, such as a multiserver environment that doesn't involve clustering. Clustering can give you advantages such as:

  • Load distribution
  • High availability
  • Session failover
  • EJB sharing
  • Centralized control
However, there are other ways to distribute the load among servers, and if you aren't using EJBs for persistence, you won't need EJB sharing. Adding other server instances in a nonclustered fashion could give you the high availability you desire. Your decision to go with a cluster might come down to whether it is important that users on a server that goes down keep their sessions intact and don't notice the failure. The bottom line is that you need to weigh costs against benefits; if external customers or important business processes are involved, it is usually easy to see why failover and clustering matter. Just remember that you have options.

In any case, be sure that there is a need and be ready to discuss not only the migration costs, but also some of the ongoing costs described below.

One Change at a Time, Please
Migrating to a cluster isn't always straightforward. Don't make your life more difficult by changing the application's functionality as part of your project. If you are managing a migration project, make sure that business owners don't try to tack on new functionality or other application changes as part of the project. After rollout, you'll need to be able to determine if a problem is related to the new environment. Adding new functionality will make troubleshooting that much more difficult after rollout. Also, you'll want to be able to make a fair comparison between the old and new environments based on metrics and measurements, identifying if performance has improved after the migration; changes in functionality will skew such measurements.

Test the Failover Capabilities
Part of a project plan to deploy the cluster should be to shut down servers in the cluster and see how well the remaining applications perform. Besides verifying the basic failover capabilities, you might uncover design decisions in your code that don't work in a cluster.

Be sure to test the system by shutting down each application server node (leaving at least one node up), and also by shutting down individual applications within the node (leaving at least one instance of the application running), and verifying that each combination still results in a working system.
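With several nodes and applications, the number of combinations grows quickly, so it can help to generate the test plan rather than write it by hand. The sketch below is hypothetical (FailoverTestPlan and its methods are invented for illustration) and assumes every application is deployed on every node. It lists every partial outage that leaves at least one copy running: first at the node level, then for each application's instances.

```java
import java.util.ArrayList;
import java.util.List;

public class FailoverTestPlan {

    /** Every way to stop some, but not all, of the targets: n targets give 2^n - 2 scenarios (small n only). */
    public static List<List<String>> partialOutages(List<String> targets) {
        int n = targets.size();
        List<List<String>> scenarios = new ArrayList<>();
        // Each bitmask marks which targets are stopped; skip "none stopped" (0) and "all stopped" (2^n - 1).
        for (int mask = 1; mask < (1 << n) - 1; mask++) {
            List<String> stopped = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    stopped.add(targets.get(i));
                }
            }
            scenarios.add(stopped);
        }
        return scenarios;
    }

    /** Node-level outages first, then outages of each application's instances (assumed on every node). */
    public static List<String> plan(List<String> nodes, List<String> apps) {
        List<String> steps = new ArrayList<>();
        for (List<String> down : partialOutages(nodes)) {
            steps.add("Stop nodes " + down + " and verify the system still works");
        }
        for (String app : apps) {
            List<String> instances = new ArrayList<>();
            for (String node : nodes) {
                instances.add(app + "@" + node);
            }
            for (List<String> down : partialOutages(instances)) {
                steps.add("Stop application instances " + down + " and verify the system still works");
            }
        }
        return steps;
    }
}
```

For a two-node cluster running one application, this yields four scenarios: each node down by itself, and each instance of the application down by itself.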

Administration Considerations
Cluster Other Environments, Not Just Production
You might wonder why you should bother clustering the other environments, such as certification or staging, when they run the same application and don't need failover capabilities. Think again. First, you'll need to make sure that you have set up the environments and configured the servers properly for the cluster, and there's no better place to start than an environment that won't become a production environment. More important, to truly test your application, you'll want your test environment to look just like your production environment. If a problem exists with your setup, you need to find it.

Setting up the certification environment to match the production environment is standard practice for many organizations. However, the extra costs of hardware and server licenses might tempt departments on a lean budget to get by without identical environments. In the end, the costs of not doing so will probably catch up with them. On a recent project, for instance, I saw a machine hosting a directory server mask an underlying problem in the certification environment, because its hardware was different from the production machine's. Although the problem existed in both environments, it manifested itself differently in the certification environment. The problem wasn't noticed until the system was deployed into the production environment, and the trouble it caused resulted in downtime and significant troubleshooting expenses. Having identical environments will save you time and money in the long run.

Little Things Add Up
Be prepared for a brief period of frustration while getting to know the new environment. Your cluster's behavior might have some "nuances" that take some getting used to. For example, in a recent migration, we saw an unusual problem appear for the first time as we deployed an application to production: the two servers in the cluster were producing different results. This wasn't supposed to happen, and we were able to resolve it, but now we know what to do in that unusual circumstance. Little tricks like that are learned only from experience, so expect some initial rough patches.

Debugging might also become a little more tedious. For example, a user may report a problem, but you won't know which server instance handled the request. When debugging an issue in a nonproduction environment, you might want to shut down one instance of the server so you can focus on the problem more quickly.

Don't Let the Redundancy Fool You
Just because you have redundancy built into the system, don't let that fool you into thinking you can be a little more lax with respect to administration practices. Don't take down an instance of the cluster during the day to perform maintenance if you wouldn't have done that before you had the cluster. Of course, large systems with many instances in the cluster are designed to do just that, but small operations need to make sure they don't abuse their new clustering capability. The bottom line is, you probably still need to perform maintenance tasks after hours.

Application Design Considerations
Primary Key Generation
There are many techniques for primary key generation, but not all are cluster-safe. Be sure you understand how your primary key generation works and whether it is safe to use in a cluster.

Persistence mechanisms such as Hibernate should document which of their primary key generators are safe to use in a cluster. One mechanism we used had a seemingly simple CMP bean implementation of a primary key generator. Although it appeared to be cluster-safe, the entity bean that kept track of the latest primary key wrote to the database only once every 10 requests, keeping its count in memory in between. With multiple servers, each server kept its own count; each one eventually reached its tenth request and wrote its value to the database, overwriting the value another server had written, which could let two servers hand out the same keys.
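The failure mode is easier to see in a small simulation. This is not the CMP bean we used; it is a simplified, hypothetical model in which an AtomicLong stands in for the database row holding the next key, and each LongSupplier stands in for one server. CachingGenerator mimics the pattern described above (read once, count in memory, blindly write back every 10 keys). BlockGenerator shows one common alternative, reserving a block of keys with an atomic read-and-increment, under the assumption that your real store can perform that reservation atomically (for example, as a single UPDATE in its own transaction).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class ClusterKeyDemo {

    /** Hypothetical stand-in for the database row that stores the next unused key. */
    static class KeyTable {
        final AtomicLong nextKey = new AtomicLong(1);
    }

    /** Unsafe: read the stored value once, count in memory, and write back every 10 keys. */
    static class CachingGenerator implements LongSupplier {
        private final KeyTable table;
        private long current;
        private int issuedSinceWrite = 0;

        CachingGenerator(KeyTable table) {
            this.table = table;
            this.current = table.nextKey.get(); // every server starts from the same stored value
        }

        @Override
        public long getAsLong() {
            long key = current++;
            if (++issuedSinceWrite == 10) {
                table.nextKey.set(current); // blind write: overwrites whatever another server stored
                issuedSinceWrite = 0;
            }
            return key;
        }
    }

    /** Safer: atomically reserve a block of 10 keys in the shared store before handing any out. */
    static class BlockGenerator implements LongSupplier {
        private static final int BLOCK_SIZE = 10;
        private final KeyTable table;
        private long current = 0;
        private long limit = 0; // exclusive end of the reserved block

        BlockGenerator(KeyTable table) {
            this.table = table;
        }

        @Override
        public long getAsLong() {
            if (current == limit) {
                // Atomic read-and-add, so no two servers can reserve the same block.
                current = table.nextKey.getAndAdd(BLOCK_SIZE);
                limit = current + BLOCK_SIZE;
            }
            return current++;
        }
    }

    private static LongSupplier newServer(KeyTable table, boolean safe) {
        if (safe) {
            return new BlockGenerator(table);
        }
        return new CachingGenerator(table);
    }

    /** Two simulated servers issue keys alternately; returns how many issued keys were duplicates. */
    public static int duplicateCount(boolean safe, int keysPerServer) {
        KeyTable table = new KeyTable();
        LongSupplier serverA = newServer(table, safe);
        LongSupplier serverB = newServer(table, safe);
        Set<Long> seen = new HashSet<>();
        int duplicates = 0;
        for (int i = 0; i < keysPerServer; i++) {
            if (!seen.add(serverA.getAsLong())) duplicates++;
            if (!seen.add(serverB.getAsLong())) duplicates++;
        }
        return duplicates;
    }
}
```

In this model, every key the second server issues with CachingGenerator collides with one the first server already issued, while the block-reserving version issues disjoint ranges.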

Not Everything Can or Should Take Advantage of Clustering
In a clustered environment, most application servers will share some objects, so it doesn't matter which instance of the application server handles a request. However, some objects are not going to be able to take advantage of clustering. File services or timed tasks, for example, will not take advantage of being shared between instances of the application server, and probably shouldn't.

If you have objects that use timed tasks, such as classes that implement java.util.TimerTask, those tasks will not be shared across the cluster, even if they are associated with objects that are shared across the cluster. The result is that each server will be running its own tasks. So if you have timed tasks and you don't want multiple instances of the task to be running, you need to come up with a way around it. Clark Richey's article, "Clustered Timers" (JDJ, Vol. 9, issue 3, www.sys-con.com/story/?storyid=43944), provides a good discussion on timers in a clustered environment. In our case, we knew that the timers were running in each server, but we implemented a mechanism where they had to check a database record to see if they should continue with their process. Only one instance of the timer would be allowed to perform its process.
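A minimal sketch of that idea follows, assuming a shared table with a unique key per task run: every server's TimerTask fires, but only the server that successfully claims the run does the work. The names (TaskRunTable, NightlyReportTask, runOnAllServers) are hypothetical, and a ConcurrentHashMap stands in for the database so the example runs on its own. Against a real database, the claim might be an INSERT that fails on a duplicate key; this is a variant of the database-record check described above, not necessarily the exact mechanism we used.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ClusterTimerDemo {

    /** Hypothetical stand-in for a database table with a unique key per task run. */
    static class TaskRunTable {
        private final Map<String, String> owners = new ConcurrentHashMap<>();

        /** True only for the first server to claim this run, like an INSERT rejected on a duplicate key. */
        boolean tryClaim(String runKey, String serverId) {
            return owners.putIfAbsent(runKey, serverId) == null;
        }
    }

    /** Scheduled on every server; only the server that wins the claim does the work. */
    static class NightlyReportTask extends TimerTask {
        private final TaskRunTable table;
        private final String serverId;
        private final AtomicInteger reportsWritten; // lets the demo observe how often the work ran

        NightlyReportTask(TaskRunTable table, String serverId, AtomicInteger reportsWritten) {
            this.table = table;
            this.serverId = serverId;
            this.reportsWritten = reportsWritten;
        }

        @Override
        public void run() {
            // Assumes all servers use the same time zone, so they compute the same run key.
            runFor(LocalDate.now().toString());
        }

        void runFor(String day) {
            if (!table.tryClaim("nightly-report:" + day, serverId)) {
                return; // another server already owns this run
            }
            reportsWritten.incrementAndGet(); // the real report-writing work would go here
        }
    }

    /** Fires the task once on each simulated server for one day; returns how many did the work. */
    public static int runOnAllServers(int servers, String day) {
        TaskRunTable table = new TaskRunTable();
        AtomicInteger written = new AtomicInteger();
        for (int i = 0; i < servers; i++) {
            new NightlyReportTask(table, "server-" + i, written).runFor(day);
        }
        return written.get();
    }
}
```

In production, each server would still schedule its own copy, for example with java.util.Timer's scheduleAtFixedRate; the claim is what keeps the work from running more than once per period.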

Watch Out for File Services
File services also need to be managed carefully if you have a clustered environment. If you have a process that is writing or reading files from a file server, you need to make sure that each instance of the application server can access that same file server location. That means you need to have identical mount points established on each machine. For Windows machines, it's best to use UNC (universal naming convention) when defining a location on a file server, rather than a Windows mapped drive, as the mapped drive only exists when a user is logged on to the machine, which means that it might not always be available.
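A startup check on each node can catch a missing mount or a mapped drive before the application starts writing files. The sketch below is hypothetical: the class name and the idea of reading the location from configuration are assumptions, and the UNC test is only a string check, not a probe of the file server.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SharedFileLocation {

    /** True for strings in UNC form such as \\fileserver\share\dir; does not contact the server. */
    public static boolean isUncPath(String location) {
        return location.startsWith("\\\\") && location.length() > 2;
    }

    /** Checks the configured location on this node, detecting Windows from the os.name property. */
    public static Path verify(String location) {
        boolean windows = System.getProperty("os.name", "").startsWith("Windows");
        return verify(location, windows);
    }

    /** Fails fast if, on Windows, the location isn't a UNC path, or if it isn't a writable directory here. */
    public static Path verify(String location, boolean windows) {
        if (windows && !isUncPath(location)) {
            throw new IllegalStateException("Use a UNC path, not a mapped drive: " + location);
        }
        Path dir = Paths.get(location);
        if (!Files.isDirectory(dir) || !Files.isWritable(dir)) {
            throw new IllegalStateException("Shared directory is not usable from this node: " + dir);
        }
        return dir;
    }
}
```

Calling verify at startup on every node turns a node that can't reach the share into an immediate, obvious failure instead of an intermittent one that depends on which server handled the request.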

If many writes go to a single file, you might want to consider having each application server maintain its own copy of the file and then setting up a nightly cron job to merge the files. If you don't want to deal with that kind of maintenance, or a merge won't meet your business needs, consider changing that function to use a database instead of a file.
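The per-server-file approach might look like the sketch below. It is hypothetical: the file naming scheme is invented, and it assumes every line begins with an ISO-8601 timestamp written in the same format and time zone on every server, followed by a space, so that timestamps sort correctly as plain strings. A nightly cron job would call mergeFiles, for example from a small main method, once the day's writes are finished.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PerServerLogMerge {

    /** Each server appends only to its own file, e.g. audit-server1.log, so no two servers share a file. */
    public static String perServerFileName(String baseName, String serverId) {
        return baseName + "-" + serverId + ".log";
    }

    /** Leading timestamp of a line (everything before the first space). */
    static String timestampOf(String line) {
        int space = line.indexOf(' ');
        return space < 0 ? line : line.substring(0, space);
    }

    /** Combines per-server lines into one list ordered by timestamp; List.sort is stable, so ties keep input order. */
    public static List<String> merge(List<List<String>> perServerLines) {
        List<String> all = new ArrayList<>();
        for (List<String> lines : perServerLines) {
            all.addAll(lines);
        }
        all.sort(Comparator.comparing(PerServerLogMerge::timestampOf));
        return all;
    }

    /** File-level wrapper for the nightly job: reads each server's file and writes the merged result. */
    public static void mergeFiles(List<Path> inputs, Path output) throws IOException {
        List<List<String>> perServer = new ArrayList<>();
        for (Path input : inputs) {
            perServer.add(Files.readAllLines(input, StandardCharsets.UTF_8));
        }
        Files.write(output, merge(perServer), StandardCharsets.UTF_8);
    }
}
```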

JSP/Servlet Clustering and HttpSession State Failover
One of the primary features of clustered application servers is the ability to cluster servlets and JSPs by sharing HttpSession objects across the cluster. The advantage of this is that if one server in the cluster goes down, the user's HttpSession remains, and the user doesn't notice the difference.

I've already mentioned how some objects can't be clustered, so those objects are certainly ones that you wouldn't want to put into the HttpSession. However, keep in mind that since the HttpSession objects are going to be kept on other servers, the memory requirements on each server might not be reduced much by distributing the load. Usually the HttpSession will be kept in memory on the server that the user first visits, and another server hosts a replica of the HttpSession. If you only have two servers, each server will hold the entire set of all HttpSession state objects, not half of them.
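The memory point can be made concrete with a rough estimate. Assume S active sessions spread evenly across N servers (N ≥ 2), with each session held by one primary server and exactly one replica. That gives 2S session copies in total, so each server holds about:

```latex
\begin{aligned}
\text{session copies per server} &\approx \frac{2S}{N} \qquad (N \ge 2) \\
N = 2:&\quad \frac{2S}{2} = S \\
N = 4:&\quad \frac{2S}{4} = \frac{S}{2}
\end{aligned}
```

Without replication, each server would hold about S/N sessions, so under these assumptions replication doubles per-server session memory compared with simply splitting the load, and in a two-server cluster each server carries every session.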

When designing applications for a clustered environment, keep in mind that your session objects should be serializable, you should keep the session objects small, and you shouldn't overuse the HttpSession by keeping everything in it. When your project is being tested, make sure you test the failover capabilities of the servers to verify that the HttpSession is always available and users don't notice when a server is taken offline.
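Serializability and size can be checked in ordinary unit tests, well before failover testing. The sketch below uses standard Java serialization as a rough proxy for what session replication has to do; how a particular server actually replicates sessions is product-specific, and CartSummary and the byte budget are hypothetical examples.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SessionAttributeCheck {

    /** Example session attribute: Serializable, and holds IDs rather than whole object graphs. */
    public static class CartSummary implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String customerId;
        private final ArrayList<String> itemIds; // ArrayList is Serializable; keep only what the UI needs

        public CartSummary(String customerId, List<String> itemIds) {
            this.customerId = customerId;
            this.itemIds = new ArrayList<>(itemIds);
        }

        public String getCustomerId() {
            return customerId;
        }
    }

    /** Serializes the attribute with standard Java serialization and returns its size in bytes. */
    public static int serializedSize(Object attribute) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(attribute);
        } catch (NotSerializableException e) {
            throw new IllegalArgumentException("Not serializable: " + e.getMessage(), e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Unit-test guard: fails if the attribute can't be serialized or exceeds a size budget. */
    public static void assertReplicable(Object attribute, int maxBytes) {
        int size = serializedSize(attribute);
        if (size > maxBytes) {
            throw new IllegalStateException("Session attribute is " + size + " bytes; budget is " + maxBytes);
        }
    }
}
```

Running assertReplicable against each type you put into the HttpSession catches a stray non-serializable field, or an attribute that has quietly grown large, before it shows up as a failover problem.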

EJB Handles and Distributed Applications
When your application was deployed in a nonclustered environment, you might have taken advantage of knowing where your applications lived. For instance, in a distributed environment, if one application had to call an EJB on another server, the calling application could cache the EJBHome handles in a ServiceLocator rather than performing a lookup each time a handle was needed, which gave a slight performance improvement. However, once the EJBs are distributed across a cluster, that same cache means an application instance will always call the same server to get the EJB, rather than distributing requests among the different instances. The caching nullifies some of the advantages of the cluster. To complicate matters, the application will appear to work fine, so you might not see the problem until one instance of your cluster has to shut down.

EJB containers on many of the major application servers provide mechanisms to handle distributing the load among clusters. BEA's WebLogic server, for instance, allows you to specify in the deployment descriptors that an EJB is clusterable. In that event, the EJBHome stub, and possibly the EJBObject remote stub, are aware of the cluster (replica-aware stubs), and will try to find a different server if the initial call fails. Whether or not an EJB can use a replica-aware EJBObject remote stub depends on the type of EJB and, for entity EJBs, the concurrency strategy (read only or read-write) selected at deployment time.

IBM's WebSphere provides a concept called EJB Workload Management. No specific deployment configuration changes are required for the EJBs to be clusterable, as long as the EJB client makes requests through the WLM Plug-In in the client application server.

WebSphere supports clustering of stateless session beans and clustering of stateful session bean home objects among multiple application servers. However, it does not provide clustering of a specific instance of a stateful session bean: each instance can exist in only one application server, so once the bean instance is created, requests to it must be directed to that particular application server. For entity beans, the WebSphere EJB container supports three different entity bean caching options that define where the bean can be accessed (which servers) and when it is reloaded and passivated.

Be sure to take the time to understand the clustering capabilities of your container with respect to EJBs. If such clustering capabilities aren't available, don't cache the EJBHome handles in your client. Again, testing the failover capabilities of your server is important. You will want to make the changes that you think are needed, and then test failover to make sure there aren't any dependencies between servers.
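If your container's stubs aren't cluster-aware, one option is to make the ServiceLocator's caching switchable per deployment. The sketch below is hypothetical: Lookup is an invented seam standing in for a JNDI lookup (in a real client it might wrap new InitialContext().lookup(name)), and whether a fresh lookup actually reaches a different server depends on your container's naming service.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ServiceLocator {

    /** Hypothetical seam for the JNDI lookup, e.g. name -> new InitialContext().lookup(name). */
    @FunctionalInterface
    public interface Lookup {
        Object lookup(String jndiName) throws Exception;
    }

    private final Lookup lookup;
    private final boolean cacheHomes; // enable only if the container's home stubs are cluster-aware
    private final Map<String, Object> homeCache = new ConcurrentHashMap<>();

    public ServiceLocator(Lookup lookup, boolean cacheHomes) {
        this.lookup = lookup;
        this.cacheHomes = cacheHomes;
    }

    public Object getHome(String jndiName) throws Exception {
        if (!cacheHomes) {
            // A fresh lookup each time gives the naming service a chance to pick another server.
            return lookup.lookup(jndiName);
        }
        Object home = homeCache.get(jndiName);
        if (home == null) {
            home = lookup.lookup(jndiName);
            homeCache.put(jndiName, home);
        }
        return home;
    }

    /** Drops a cached home, e.g. after a remote call fails because its server went away. */
    public void evict(String jndiName) {
        homeCache.remove(jndiName);
    }
}
```

Keeping the switch in one class means the decision to cache can follow your failover test results rather than being scattered through client code.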

Conclusion
Migrating from a single-server environment to a clustered environment may sound straightforward, but you need to enter such an endeavor with the appropriate expectations. Certainly, you need to understand the technical implications of using a cluster, including the configuration, administration, and software design changes that you need to make. However, you also need to make sure the migration project team has the right expectations, takes the time to test the clustering capabilities thoroughly, and doesn't complicate the project with changes to your application's functionality. If you prepare yourself with some of these steps, your migration effort will be much smoother.

More Stories By David Purcell

David Purcell has been a senior applications architect in the IT industry for nearly a decade, providing technical leadership on Web application projects of all sizes. Mr. Purcell is currently Web Applications Supervisor for the Minnesota State Colleges and Universities system.

Most Recent Comments
Cameron 12/09/04 08:27:38 PM EST

This is exactly why we originally introduced Coherence -- to handle the difficult parts of moving an application to a cluster, and having it provide (a) reliability and (b) scalability and (c) performance.