Wednesday, February 13, 2013

Tomcat 7 - Clustering and Session Replication

Motivation

In today's world of cloud computing it becomes more and more important to abstract away the infrastructure. Much like the definition says for distributed systems:

The system shall appear to the user as if the user was communicating with a single system, never aware that they are in fact communicating with a distributed system.

Distribution is a decision made based on various factors, depending on which part of the system or enterprise is being looked at. I'll examine that topic in a separate entry, as I feel it would overshadow the core information presented here.

Regardless of what reasons led to the need for a cluster and to choosing Tomcat as the application server, this post describes the process and some pitfalls I encountered while getting the system up and running.

The Beginning

Before I dive into the details I want to describe the environment in which I solved the problem, so that technical readers can quickly decide whether this article will be helpful for them.

All software used is freely available, I added links to all listed products for convenience.

Environment

The environment chosen was:
  • Red Hat Enterprise Server 6.3 (32-bit)
  • Apache HTTPD 2.2 with mod_proxy and mod_proxy_balancer
  • Apache Tomcat 7 (two instances)
  • AJP/1.3 as the connector protocol between httpd and Tomcat
Notes: 
  • Red Hat Enterprise Server 6.3 was chosen because it is the actual target environment, 32-bit because it was a VM running locally on my dev machine. In general I would recommend skipping Red Hat ES in favor of CentOS 6.3 (or a free alternative like Fedora or openSUSE). Or, to make everything a lot easier, you can also just choose Ubuntu Server - whatever gets you from A to B faster or is more beneficial to your business environment.
  • RHES 6.3 will not have access to the enterprise yum repository unless registered, much like SuSE ES. However, if it is just for a POC (Proof of Concept) it is sufficient to install the RPMs directly - dependencies then have to be resolved manually.
  • HTTPD 2.2 came with the system and contains all that's needed: out of the box it shipped with mod_proxy and mod_proxy_balancer, which we will be using for this example. mod_jk is the module most commonly used to demonstrate load balancing, but I'll stick to AJP/1.3 with mod_proxy_balancer.

Preparation

Before you dive into the configuration part take a second and contemplate how you would like to deploy your instances. In this example I chose 2 instances which are both deployed in /srv/{service_name}#

So for example: /srv/simpleapp1 and /srv/simpleapp2

These will be the root folders for tomcat.

Ensure that:
  1. Java is installed and all symlinks point to the right JDK - if not, install it using yum, yast, apt-get or whatever package manager is native to your OS.
  2. HTTPD 2.2 is installed and running - same installation advice as above.
    1. Ensure that mod_proxy and mod_proxy_balancer are loaded (check /etc/httpd/conf/httpd.conf and locate the block of LoadModule directives) - if not, check whether they are available in /usr/lib/httpd/modules, or simply use "sudo locate mod_proxy.so" (run "sudo updatedb" beforehand if need be) or "sudo find / -name mod_proxy.so" (searches the whole system)
  3. Download Tomcat (*.tar.gz)
  4. Extract Tomcat to the two directories previously setup
    1. Open $tomcat_dir/conf/server.xml and locate the Connector elements for HTTP/1.1 and AJP/1.3 - ensure that they are different for both instances
  5. Your network interface is multicast enabled
    1. Execute "ifconfig -a" and locate your network interface, usually you want to look for eth#
    2. Locate the line that reads "UP BROADCAST RUNNING MULTICAST" - different Linux/Unix flavors show some slight differences.
    3. If not configured please consult the following links (hope one works for you):
      1. http://blogs.agilefaqs.com/2009/11/08/enabling-multicast-on-your-macos-unix/
      2. http://www.ibiblio.org/pub/Linux/docs/HOWTO/other-formats/html_single/Multicast-HOWTO.html
  6. Ensure that SELinux boolean "httpd_can_network_connect" is set to true. (This will be needed when httpd connects to tomcat workers)
    1. Execute "getsebool -a | grep httpd" and locate the setting (the grep pattern can be more specific of course, but this shows you all booleans that pertain to httpd)
    2. If it is set to (off|0|false) execute the following command: "sudo setsebool -P httpd_can_network_connect on" (in order to use sudo you must be a sudoer, or you can execute this as root). The -P flag persists the setting to ensure a reboot will not reset it.
Some useful commands:
tar -xzf $name.tgz  # Extract contents of tar into current directory
rm -rf $dir # remove non empty directory
mv $source $target  # move source to target (equivalent to rename)
cp $source $target  # copy source to target
chown -R $uid $dir # make user $uid the owner of directory $dir and everything beneath it
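Putting the useful commands together, the two instance roots from the Preparation section can be populated like so. This is only a sketch: the tarball name is hypothetical and a scratch root is used so the snippet is runnable as-is; on the real host you would use /srv and the Tomcat archive you actually downloaded.

```shell
set -e
SRV=/tmp/srv-demo                  # use /srv on the real host
TARBALL=/tmp/apache-tomcat.tar.gz  # hypothetical name; use the real download

# Build a tiny stand-in archive so this sketch runs anywhere.
mkdir -p /tmp/dist/apache-tomcat/bin
printf '#!/bin/sh\n' > /tmp/dist/apache-tomcat/bin/startup.sh
tar -czf "$TARBALL" -C /tmp/dist apache-tomcat

# Extract the distribution into both instance roots, dropping the
# top-level apache-tomcat/ directory from the archive.
for app in simpleapp1 simpleapp2; do
  mkdir -p "$SRV/$app"
  tar -xzf "$TARBALL" -C "$SRV/$app" --strip-components=1
done

ls "$SRV/simpleapp1/bin" "$SRV/simpleapp2/bin"
```

Note that --strip-components is a GNU tar option; on other tar implementations you may have to extract and then mv the contents up one level.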

Alright, that was a lot to double-check - but it only has to be done once. I hope you didn't run into the multicast issue; that one can be a bit more time consuming.

Configuration

At this point in time the assumption is that httpd is up and running, that both tomcat instances can be started via the "/srv/{service_name}#/bin/startup.sh" scripts and run independently (meaning: they are not part of a cluster and httpd doesn't act as a load balancer yet).

The Load Balancer (httpd)

In order to set up Apache HTTPD to act as a load balancer the modules mod_proxy and mod_proxy_balancer must be loaded. It is assumed that this has been done.

To configure the load balancer, first find httpd.conf, which on my system lives at /etc/httpd/conf (depending on flavor it may be /etc/apache2/conf or something else). As mentioned before, the commands:
locate httpd.conf
- OR - 
find / -name httpd.conf

will prove helpful if you are unable to find the file in the well-known location.

Once you found the file, navigate to the section that starts with: <IfModule mod_proxy.c>

The following configuration was chosen by me:
<IfModule mod_proxy.c>
  ProxyRequests Off
  ProxyVia Off
  ProxyPreserveHost On
  ProxyPass / balancer://cluster/ stickysession=JSESSIONID|jsessionid nofailover=Off scolonpathdelim=On
  <Proxy balancer://cluster>
    BalancerMember ajp://127.0.0.1:8009 route=jvm1
    BalancerMember ajp://127.0.0.1:8010 route=jvm2
  </Proxy>
</IfModule>
For further details on the ProxyRequests, ProxyVia and ProxyPreserveHost directives please consult the Apache mod_proxy documentation.

The ProxyPass directive instructs httpd to forward all requests to the cluster - notice how this directive is not scoped, so httpd becomes a pass through. It is further defined that JSESSIONID or jsessionid will be used for maintaining a sticky session. A sticky session means that while the app server is alive all requests will be routed to it. This is a requirement for clustering with Tomcat, see also the Official Documentation.

  • The attribute nofailover is set to Off, which instructs httpd to redirect to a different node if the original node is unavailable (crashed). Since session replication is active we don't have to worry about the user noticing the switch.
  • The attribute scolonpathdelim indicates that a semicolon is the path delimiter; Tomcat uses a semicolon when it appends the session id to a URL path.
The Proxy element is scoped to the cluster - no worries, "cluster" doesn't have to exist as an actual host; it is a placeholder/target.

Each BalancerMember is communicated with via AJP/1.3 - take note of the route key; this will make more sense once we look at what changes need to be made to Tomcat's server.xml. Accept for the moment that we point to workers jvm1 and jvm2 and that ports 8009 and 8010 will be the configured ports for the AJP/1.3 connector in Tomcat.

To add more Tomcat instances to your balancer you just repeat the pattern of course:
<Proxy balancer://cluster>
BalancerMember ajp://127.0.0.1:8009 route=jvm1
BalancerMember ajp://127.0.0.1:8010 route=jvm2
...
BalancerMember ajp://127.0.0.1:#### route=jvm#
</Proxy>
And we're done - that was easy, right? Again, please keep in mind that this can also be done via mod_jk, which is the more widespread setup.
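Before moving on, it is worth sanity-checking the balancer from a shell. This is a hedged sketch: command names differ per distro, and the fallbacks below just print a note if httpd or curl is not available. With jvmRoute configured on the Tomcat side, the JSESSIONID cookie issued through the balancer carries a .jvm1/.jvm2 suffix, which is an easy way to see which worker you landed on.

```shell
# Syntax-check the httpd configuration (command name varies by distro).
apachectl configtest 2>/dev/null || httpd -t 2>/dev/null \
  || echo "httpd not available on this box"

# Look for the worker suffix on the session cookie (assumes the
# balancer answers on localhost:80 and a webapp creates a session).
curl -sI http://localhost/examples/ 2>/dev/null | grep -i JSESSIONID \
  || echo "no session cookie seen - is the cluster up?"
```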

The App Server (Tomcat Instance)

Instead of going into too much detail here let me be up front: the configuration as outlined in the Official Documentation (For the impatient) is perfect.

You can follow that and it should work just fine, here are a few notes for you on the configuration:

1) VERY IMPORTANT: In order for Tomcat to consider session replication for a web app, web.xml has to contain the <distributable/> element as a child of the web-app element.
I used a Maven archetype to generate web.xml and the XSD version was 2.4 - in order for Tomcat to pick up on the <distributable/> element at all, ensure the correct schema is referenced (3.0+):
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
version="3.0"
metadata-complete="true">
  <distributable/>
  ...
</web-app>
You will notice when session replication is working, in that the DeltaManager will log that it is listening in catalina.out. Here is something you'll have to look for:
Feb 13, 2013 12:38:47 PM org.apache.catalina.ha.session.DeltaManager startInternal
INFO: Starting clustering manager at /zk
Feb 13, 2013 12:38:47 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
INFO: Manager [/zk], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://{127, 0, 0, 1}:4000,{127, 0, 0, 1},4000, alive=78673, securePort=-1, UDP Port=-1, id={-105 5 114 88 -6 112 71 -4 -73 21 -109 -68 -32 -119 -52 1 }, payload={}, command={}, domain={}, ]. This operation will timeout if no session state has been received within 60 seconds.
Feb 13, 2013 12:38:48 PM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions


2) IMPORTANT: Make sure that the instances have the correct AJP/1.3 ports and worker names assigned!

The Engine element contains the worker name or jvmRoute:
<Engine name="Catalina" defaultHost="localhost" jvmRoute="jvm2"> 
The Connector for the AJP/1.3 protocol contains the port:
<Connector port="8010" protocol="AJP/1.3" redirectPort="9443" />
3) It is imperative to use org.apache.catalina.ha.session.DeltaManager - it is the most efficient manager for replication, as it only replicates session deltas and thus lowers the payload.

4) For the Channel a receiver of type org.apache.catalina.tribes.transport.nio.NioReceiver is defined; the attribute address is set to "auto", which gave me some issues - I tied it to 127.0.0.1 instead, since both my instances are on the same machine.

5) I read that there are some issues with org.apache.catalina.ha.deploy.FarmWarDeployer: the webapps directory must be configured to point to the webapp directory of the instance, for example /srv/simpleapp1/webapps (if this is the server.xml for simpleapp1).
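Pulling notes 3) through 5) together, the Cluster element inside server.xml ends up very close to the example in the official documentation. The sketch below uses the documented defaults for the multicast address and ports, with the Receiver address pinned to 127.0.0.1 as described above - treat it as a starting point, not a drop-in:

```xml
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="8">
  <Manager className="org.apache.catalina.ha.session.DeltaManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"/>
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="228.0.0.4" port="45564"
                frequency="500" dropTime="3000"/>
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="127.0.0.1" port="4000" autoBind="100"
              selectorTimeout="5000" maxThreads="6"/>
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
  </Channel>
  <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=""/>
  <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>
```

The autoBind="100" attribute lets the second instance on the same machine bind the next free receiver port (4001 and up), so the same fragment works for both instances.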

Once both instances are configured make sure to start them simply via their respective start scripts:
/srv/simpleapp#/bin/startup.sh
You can shut them down with the following commands (the first one is recommended):
/srv/simpleapp#/bin/shutdown.sh
kill $PID
kill -9 $PID
In order to obtain the $PID you can simply type
ps -eaf | grep java
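A small refinement of that grep (my habit, not from the official docs): the bracket trick below stops grep from matching its own process, and awk extracts just the PID column for use with kill:

```shell
# '[j]ava' matches "java" but not the grep command line itself.
ps -eaf | grep '[j]ava' | awk '{print $2}'
```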

Testing it all

Now that everything is up and running you can test the configuration. I would suggest opening some shells to get insight into what is happening behind the scenes, and using tailf to follow the updates (and to ensure that the DeltaManager is initialized).

For example:
tailf /srv/simpleapp1/logs/catalina.out
I would also suggest monitoring httpd's error_log, which is usually located in /var/log/httpd/

Now navigate to the httpd server (on port 80), you should be greeted with the Tomcat index page:

[Screenshot: Tomcat Start Page]

Now in order to test that session replication works simply use the example JSPs provided with Tomcat, for example: 

http://{host}/examples/jsp/sessions/carts.html

Then shut down the node that you got "stuck" to and keep adding items to the cart - you'll notice that you can continue your work without interruption.

As an additional test, bring the node you shut down back up, then shut down the node you got redirected to.
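The manual failover exercise above can also be sketched from the command line. Assumptions: the balancer answers on localhost:80, the examples webapp is deployed, and curl is installed - the guards let the snippet run harmlessly even without a live cluster.

```shell
JAR=$(mktemp)  # cookie jar so curl re-sends the sticky JSESSIONID

# First hit: the session cookie ends in .jvm1 or .jvm2, telling you
# which node you are stuck to.
curl -s -c "$JAR" http://localhost/examples/jsp/sessions/carts.html \
  >/dev/null 2>&1 || true
grep JSESSIONID "$JAR" || echo "no session yet - is the cluster up?"

# Shut down that node, then repeat the request with the same jar;
# with replication working the session survives on the other node.
curl -s -b "$JAR" -c "$JAR" http://localhost/examples/jsp/sessions/carts.html \
  >/dev/null 2>&1 || true
```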

Credits And Acknowledgements

I consulted the following sources while working out the solution outlined in this post; I want to ensure that credit is given to all who put themselves out there and share solutions to help us all.

Tuesday, July 5, 2011

Clean Software

When I say clean software I'm of course talking about software that is:
  • Loosely coupled
  • Highly cohesive
  • Following standard software patterns
  • Easy to read
  • Easy to maintain
  • Dead code free
To be more specific about the points made above:

Loose coupling means that classes are nicely encapsulated: they don't expose their interior to the world, and objects communicate through interfaces with implementations they don't know about - all they know is the interface.

The advantage is that an object only needs to know another object's interface, not its internals; the internals can change as much as they want, as long as the class sticks to the interface contract.

Highly cohesive means that every object in the system has a specific purpose to fulfill. An object is responsible for a certain task, knows how to do that specific task, and uses other objects to fulfill other tasks.

Following standard software patterns is good practice and often takes a while to become engrained in one's mind. Patterns are well-proven solutions to common problems; most software systems benefit from them, and it's always worth having a tool set of patterns in the back of one's mind to fall back on. Patterns are also a decent way to communicate the system to other developers, since they are well known: in the case of MVC (Model View Controller), for example, it is easy to identify the various parts that make up the application. They allow for abstraction of the problem and for a clean, to-the-point solution.

In my mind software is considered easy to read if it follows standard patterns, is highly cohesive and follows standard programming guidelines. This is a view on the code itself: placing comments to walk a reader through the code is advantageous, but instead of many comments it is preferable to write readable code by using standard variable and class naming conventions, following the object-oriented paradigm and avoiding dead code.
Methods and classes should carry javadoc comments: an interface should document each method it declares, and the implementation should document specifically how it implements the method.
Classes should have a comment that outlines their purpose.

Easy to maintain and easy to read go hand in hand; however, there are some other factors to take into consideration, for example the deployment/development strategy. Here are some questions you may want to ask:

  • Is it possible to deploy a sub system of classes without having to reinstall the software?
  • Is it possible to send out a JAR that patches the problem?
  • Is the code kept in a code versioning repository and is it used properly?
  • Is it easy to get the development environment up and running?
  • Is there a strategy to debug a live system?
  • Does the software contain proper logging?
  • Can we deploy software that analyses the system?
There are probably a lot more questions, but having been in the situation where a customer faces a problem with a piece of software I never worked on, the above is pretty much what goes through my mind.
Having a customer in a production-stop situation means that a patch is required quickly - time is money, and a production halt means the system is in an unusable state.
Also, one should not assume that it is always possible to access a customer's system; many customers have data that is restricted and can't be shown to the support engineer.
An analysis script is a good solution here, as the engineer can extract information that might be useful while the results are provided by the customer (the customer actually runs it based on instructions given by support). This gives them peace of mind, as they see what is being passed back.

The best solutions are hassle-free for the customer: if a problem can be solved by the customer in fewer than 4 steps and less than 5 minutes, it is a solution that makes their life easier and thus yours.

Back to the point of maintainable code: decent logging within the code, as well as traceability through it (highly cohesive, loosely coupled), makes it easier to determine what the potential problem could be.

Another important aspect of making debugging and customer support more efficient is to throw the proper exceptions - at least in languages that have them. An exception will often carry information that the implementer felt would be important at the point the problem occurs, and logging it allows the customer to send a log file.

The source maintenance as well as the development environment mentioned before help the engineer tend to a customer's problem faster, rather than wasting hours getting all the source code and more hours setting up an environment that allows for remote debugging.

Software should never contain dead code. A common practice for code that will no longer be supported is deprecation: in Java a method can be marked @Deprecated, which means that all API consumers should change their approach as the method will be removed in future releases. Note that deprecated code is NOT dead code! I list it as an example of code that might still be used by parts of the system but is on its way out.
Dead code refers to methods that have no caller or variables that are never used. These things take up space; an unused field of a certain type takes up that type's default space for each object created - that might not be a lot, but it always helps to start saving memory on a small scale.

Considering the points mentioned above and keeping them in mind will help you to develop your own guidelines, coding standards and deployment strategies that will ultimately produce a clean software system. This system will be easy to maintain which is a big advantage for production systems that are deployed for multiple customers. The faster an engineer understands and can develop a solution for a system the better.

Note: A clean system is not equivalent to a bug-free system. Clean code can have bugs - there are various types of bugs; for example, fully functioning code might still represent a requirements miss: the software behaves as implemented and intended, but product management miscommunicated a requirement and the code should be doing something else.

Sunday, October 3, 2010

Welcome

This is my very first post to this blog. I plan on using it to share ideas, solutions and philosophy related to software engineering, software design and specific frameworks. It will be Java-based, as Java is my programming language of choice - that said, design is neutral; it can be implemented in a variety of languages.

If you are a developer yourself and happen to work with related technologies that I might find interesting don't hesitate to invite me to follow you.

That's it for now, looking forward to knowledge sharing.