Wednesday, February 13, 2013

Tomcat 7 - Clustering and Session Replication

Motivation

In today's world of cloud computing it is becoming more and more important to abstract away the infrastructure, much as the definition of a distributed system demands:

The system shall appear to the user as if the user was communicating with a single system, never aware that they are in fact communicating with a distributed system.

Whether to distribute is a decision based on various factors, depending on which part of the (enterprise) system is being looked at. I'll examine that topic in a separate entry, as it would otherwise crowd out the core information presented here.

Regardless of what led to the need for a cluster and to choosing Tomcat as the application server, this post describes the process and some pitfalls that I encountered while getting the system up and running.

The Beginning

Before I dive into the details, I want to describe the environment in which I solved the problem, so that technical readers can quickly tell whether this article will be helpful for them.

All software used is freely available; I added links to all listed products for convenience.

Environment

The environment chosen was:
Notes: 
  • Red Hat Enterprise Server 6.3 was chosen because it is the actual target environment; it is 32-bit because it ran as a VM locally on my dev machine. In general I would recommend skipping Red Hat ES in favor of CentOS 6.3 (much like Fedora and openSUSE). Or, to make everything a lot easier, you can just choose Ubuntu Server: pick whatever you feel will get you from A to B faster or will be more beneficial to your business environment.
  • RHES 6.3 will not have access to the enterprise yum repository unless it is registered, much like SuSE ES. However, for a POC (Proof Of Concept) it is sufficient to install individual RPMs via yum; dependencies then have to be resolved manually.
  • HTTPD 2.2 came with the system and contains all that's needed: out of the box it was delivered with mod_proxy and mod_proxy_balancer, which we will be using for this example. mod_jk is the module most commonly used to demonstrate load balancing; here I'll stick to AJP 1.3 with mod_proxy_balancer.

Preparation

Before you dive into the configuration part, take a second and contemplate how you would like to deploy your instances. In this example I chose 2 instances, both deployed in /srv/{service_name}#, where # is the instance number.

So for example: /srv/simpleapp1 and /srv/simpleapp2

These will be the root folders for tomcat.

Ensure that:
  1. Java is installed and that all symlinks point to the right JDK - if not, I would recommend using yum, yast, apt-get or whatever package manager is native to your OS.
  2. HTTPD 2.2 is installed and running - if not, I would recommend using yum, yast, apt-get or whatever package manager is native to your OS.
    1. Ensure that mod_proxy and mod_proxy_balancer are loaded (check /etc/httpd/conf/httpd.conf and locate the block of LoadModule directives); the ajp:// URLs used later also require mod_proxy_ajp. If a module is not loaded, check whether it is available in /usr/lib/httpd/modules, or simply use "sudo locate mod_proxy.so" (run "sudo updatedb" beforehand if need be) or "sudo find / -name mod_proxy.so" (searches the whole system).
  3. Download Tomcat (*.tar.gz)
  4. Extract Tomcat to the two directories set up previously
    1. Open $tomcat_dir/conf/server.xml and locate the Connector elements for HTTP/1.1 and AJP/1.3 - ensure that their ports are different for both instances. The shutdown port on the Server element must differ as well when both instances run on the same machine.
  5. Your network interface is multicast enabled
    1. Execute "ifconfig -a" and locate your network interface; usually you want to look for eth#
    2. Locate the line that reads "UP BROADCAST RUNNING MULTICAST" - different Linux/Unix flavors show some slight differences.
    3. If not configured please consult the following links (hope one works for you):
      1. http://blogs.agilefaqs.com/2009/11/08/enabling-multicast-on-your-macos-unix/
      2. http://www.ibiblio.org/pub/Linux/docs/HOWTO/other-formats/html_single/Multicast-HOWTO.html
  6. Ensure that the SELinux boolean "httpd_can_network_connect" is set to true. (This is needed when httpd connects to the Tomcat workers.)
    1. Execute "getsebool -a | grep httpd" and locate the setting (the grep pattern can be more specific of course, but this shows all booleans that pertain to httpd)
    2. If it is set to (off|0|false), execute the following command: "sudo setsebool -P httpd_can_network_connect on" (in order to use sudo you must be a sudoer, or you can execute this as root). The -P flag persists the setting so that a reboot will not reset it.
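The port check from step 4 can be automated. What follows is a minimal Python sketch, not part of my original setup. It assumes both server.xml files follow the stock Tomcat layout (a Server root element whose port attribute is the shutdown port, plus Connector elements with a port attribute). The helper names listening_ports and port_conflicts are made up for illustration.

```python
import xml.etree.ElementTree as ET


def listening_ports(server_root):
    """Return (description, port) pairs for one parsed server.xml root."""
    ports = []
    # The <Server> root element carries the shutdown port.
    shutdown = server_root.get("port")
    if shutdown is not None:
        ports.append(("shutdown", int(shutdown)))
    # Every <Connector> (HTTP/1.1, AJP/1.3, ...) binds its own port.
    for connector in server_root.iter("Connector"):
        port = connector.get("port")
        if port is not None:
            ports.append((connector.get("protocol", "HTTP/1.1"), int(port)))
    return ports


def port_conflicts(root_a, root_b):
    """Return the sorted ports that both instances would try to bind."""
    # Ports <= 0 (for example a disabled shutdown port) are ignored.
    ports_a = {p for _, p in listening_ports(root_a) if p > 0}
    ports_b = {p for _, p in listening_ports(root_b) if p > 0}
    return sorted(ports_a & ports_b)
```

With the directory layout used here, the roots could be loaded via ET.parse("/srv/simpleapp1/conf/server.xml").getroot() and the same call for simpleapp2. An empty result from port_conflicts means the two instances should be able to start side by side.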
Some useful commands:
tar -xzf $name.tgz  # Extract contents of tar into current directory
rm -rf $dir # remove non empty directory
mv $source $target  # move source to target (equivalent to rename)
cp $source $target  # copy source to target
chown -R $uid $dir # assign user $uid to be owner of directory $dir (and everything below it)

Alright, that was a lot to double check - but it only has to be done once. I hope that you didn't run into the multicast issue, as this can potentially be a little more time consuming.

Configuration

At this point in time the assumption is that httpd is up and running, that both tomcat instances can be started via the "/srv/{service_name}#/bin/startup.sh" scripts and run independently (meaning: they are not part of a cluster and httpd doesn't act as a load balancer yet).

The Load Balancer (httpd)

In order to set up Apache HTTPD to act as a load balancer, the modules mod_proxy and mod_proxy_balancer must be loaded. It is assumed that this has been done.

To configure the load balancer simply locate httpd.conf, which on my system was located at /etc/httpd/conf (depending on flavor it may be /etc/apache2/conf or something else). If you are unable to find the file in the well-known location, the commands mentioned before will prove helpful:
locate httpd.conf
- OR -
find / -name httpd.conf

Once you have found the file, navigate to the section that starts with: <IfModule mod_proxy.c>

I chose the following configuration:
<IfModule mod_proxy.c>
  ProxyRequests Off
  ProxyVia Off
  ProxyPreserveHost On
  ProxyPass / balancer://cluster/ stickysession=JSESSIONID|jsessionid nofailover=Off scolonpathdelim=On
  <Proxy balancer://cluster>
    BalancerMember ajp://127.0.0.1:8009 route=jvm1
    BalancerMember ajp://127.0.0.1:8010 route=jvm2
  </Proxy>
</IfModule>
For a further description of the parameters ProxyRequests, ProxyVia and ProxyPreserveHost, please follow the links.

The ProxyPass directive instructs httpd to forward all requests to the cluster - notice how this directive is not scoped to a sub-path, so httpd becomes a pure pass-through. It further defines that JSESSIONID or jsessionid will be used for maintaining a sticky session. A sticky session means that all requests belonging to one session are routed to the same app server for as long as it is alive. This is a requirement for clustering with Tomcat, see also the Official Documentation.

  • The attribute nofailover is set to Off, which instructs httpd to fail over to a different node if the original node is unavailable (crash). Because session replication is active, the user should not notice the switch.
  • The attribute scolonpathdelim indicates that a semicolon is also accepted as the path delimiter for the sticky session; Tomcat uses a semicolon (as in ;jsessionid=...) when it encodes the session ID into a URL.
The Proxy element is scoped to the cluster - no worries, "cluster" doesn't have to exist as an actual host; it is a placeholder/target.

Each BalancerMember is communicated with via AJP/1.3 - take note of the route key, which will make more sense once we look at the changes that need to be made to the Tomcat server.xml. Accept for the moment that we point to workers jvm1 and jvm2 and that ports 8009 and 8010 will be the configured ports for the AJP/1.3 connector in Tomcat.
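To make the role of the route key more concrete: Tomcat appends its jvmRoute to every session ID it issues (for example ABC123.jvm1), and the balancer uses the part after the dot to pick the member with the matching route. The Python sketch below mimics that lookup. It is a simplified illustration of the idea, not how mod_proxy_balancer is actually implemented; the function pick_member and the MEMBERS mapping are hypothetical names.

```python
def pick_member(session_id, members, fallback):
    """Return the backend URL for a session ID such as 'ABC123.jvm1'.

    members maps route names to backend URLs. If the ID carries no route,
    or the route is unknown, the fallback is returned (a stand-in for the
    balancer's normal load-balancing decision).
    """
    # Everything after the first dot is treated as the route name.
    _, dot, route = session_id.partition(".")
    if dot and route in members:
        return members[route]
    return fallback


# Mirrors the two BalancerMember lines from the configuration above.
MEMBERS = {
    "jvm1": "ajp://127.0.0.1:8009",
    "jvm2": "ajp://127.0.0.1:8010",
}
```

A session ID ending in .jvm2 therefore always lands on the instance behind port 8010, which is why the route values in httpd.conf and the jvmRoute values in Tomcat have to match exactly.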

To add more Tomcat instances to your balancer you just repeat the pattern of course:
<Proxy balancer://cluster>
BalancerMember ajp://127.0.0.1:8009 route=jvm1
BalancerMember ajp://127.0.0.1:8010 route=jvm2
...
BalancerMember ajp://127.0.0.1:#### route=jvm#
</Proxy>
And we're done - that was easy, right? Again, please keep in mind that this can also be done via mod_jk, which is the more widespread setup.

The App Server (Tomcat Instance)

Instead of going into too much detail here, let me be up front: the configuration as outlined in the Official Documentation (For the impatient) is perfect.

You can follow that and it should work just fine. Here are a few notes on the configuration:

1) VERY IMPORTANT: In order for Tomcat to consider session replication for a web app, the web.xml has to contain the <distributable/> element as a child of the web-app element.
I used a Maven archetype to generate web.xml and the XSD version was 2.4 - in my case, for Tomcat to pick up on the <distributable/> element at all, I had to ensure that the correct schema is referenced (3.0+):
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
version="3.0"
metadata-complete="true">
  <distributable/>
  ...
</web-app>
You will notice when session replication is working, in that the DeltaManager announces itself in catalina.out. Here is what to look for:
Feb 13, 2013 12:38:47 PM org.apache.catalina.ha.session.DeltaManager startInternal
INFO: Starting clustering manager at /zk
Feb 13, 2013 12:38:47 PM org.apache.catalina.ha.session.DeltaManager getAllClusterSessions
INFO: Manager [/zk], requesting session state from org.apache.catalina.tribes.membership.MemberImpl[tcp://{127, 0, 0, 1}:4000,{127, 0, 0, 1},4000, alive=78673, securePort=-1, UDP Port=-1, id={-105 5 114 88 -6 112 71 -4 -73 21 -109 -68 -32 -119 -52 1 }, payload={}, command={}, domain={}, ]. This operation will timeout if no session state has been received within 60 seconds.
Feb 13, 2013 12:38:48 PM org.apache.catalina.ha.session.DeltaManager waitForSendAllSessions
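If these log lines do not show up, a missing <distributable/> element is one thing to rule out. The following Python sketch checks a web.xml for it. It is only an illustration under the assumption that the element must be a direct child of the root, as described above; the function name is_distributable is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_distributable(web_xml):
    """True if <distributable/> is a direct child of the <web-app> root.

    web_xml may be a str or bytes object holding the file contents.
    """
    root = ET.fromstring(web_xml)
    for child in root:
        # ElementTree renders namespaced tags as '{namespace}localname',
        # so compare only the local part of the tag.
        local_name = child.tag.rsplit("}", 1)[-1]
        if local_name == "distributable":
            return True
    return False
```

Reading the file in binary mode, for example open(path, "rb").read(), lets the parser honor the encoding declared in the XML header.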


2) IMPORTANT: Make sure that the instances have the correct AJP/1.3 ports and worker names assigned!

The Engine element contains the worker name or jvmRoute:
<Engine name="Catalina" defaultHost="localhost" jvmRoute="jvm2"> 
The Connector for the AJP/1.3 protocol contains the port:
<Connector port="8010" protocol="AJP/1.3" redirectPort="9443" />
3) It is imperative to use org.apache.catalina.ha.session.DeltaManager - this is the most efficient manager for replication, as it lowers the payload by replicating only the changes (deltas) made to a session

4) For the Channel, a receiver of type org.apache.catalina.tribes.transport.nio.NioReceiver is defined. Its address attribute is set to "auto", which gave me some issues, so I tied it to 127.0.0.1 instead, since both my instances are on the same machine.

5) I read that there are some issues with org.apache.catalina.ha.deploy.FarmWarDeployer, in that the webapps directory must be configured to point to the webapps directory of the instance, for example: /srv/simpleapp1/webapps (if this is the server.xml for simpleapp1)
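Before starting the instances, the pairing of worker name and AJP port from note 2 can be cross-checked against the BalancerMember lines in httpd.conf. The Python sketch below is one way to do that. It assumes the BalancerMember lines look like the ones shown earlier (ajp:// URL followed by route=...), and the function names are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Matches lines such as: BalancerMember ajp://127.0.0.1:8009 route=jvm1
# Lines commented out with '#' are not matched.
MEMBER_RE = re.compile(
    r"^[ \t]*BalancerMember[ \t]+ajp://[^:\s]+:(\d+)[ \t]+route=(\S+)",
    re.MULTILINE,
)


def balancer_routes(httpd_conf_text):
    """Map route name -> AJP port for every BalancerMember line."""
    return {route: int(port) for port, route in MEMBER_RE.findall(httpd_conf_text)}


def tomcat_route(server_root):
    """Return (jvmRoute, AJP port) from one parsed server.xml root."""
    engine = server_root.find(".//Engine")
    route = engine.get("jvmRoute") if engine is not None else None
    ajp_port = None
    for connector in server_root.iter("Connector"):
        if connector.get("protocol", "").startswith("AJP"):
            ajp_port = int(connector.get("port"))
    return route, ajp_port


def matches_balancer(httpd_conf_text, server_root):
    """True if the instance's jvmRoute and AJP port form one BalancerMember."""
    route, ajp_port = tomcat_route(server_root)
    routes = balancer_routes(httpd_conf_text)
    return route in routes and routes[route] == ajp_port
```

Running matches_balancer once per instance, with the contents of httpd.conf and the root of that instance's server.xml, should return True for both before the sticky routing can work as intended.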

Once both instances are configured, start them via their respective start scripts:
/srv/simpleapp#/bin/startup.sh
You can shut them down with the following commands (the first one is recommended):
/srv/simpleapp#/bin/shutdown.sh
kill $PID
kill -9 $PID
In order to obtain the $PID you can simply type
ps -eaf | grep java

Testing it all

Now that everything is up and running you can test the configuration. I would suggest opening some shells to get insight into what is happening behind the scenes and using tailf to follow the updates (and to ensure that DeltaManager is initialized).

For example:
tailf /srv/simpleapp1/logs/catalina.out
I would also suggest monitoring the httpd error log (error_log on Red Hat systems), which is usually located in /var/log/httpd/

Now navigate to the httpd server (on port 80); you should be greeted with the Tomcat index page:


Tomcat Start Page

Now, in order to test that session replication works, simply use the example JSPs provided with Tomcat, for example:

http://{host}/examples/jsp/sessions/carts.html

Then shut down the node that you got "stuck" to and keep adding items to the cart - you'll notice that you can just continue your work without interruption.

As an additional test, bring the node you shut down back up, then shut down the node you got redirected to.

Credits And Acknowledgements

I consulted the following sources to arrive at the solution outlined in this post. I want to ensure that credit is given to all who have been putting themselves out there to spread solutions in order to help us all.
