Tuesday, April 18, 2006

"Micro-partitioning" pooled sessions in a RAC environment

One of the problems we've come across while testing an OC4J-based system against an Oracle 9.2.0.5 RAC database is that as a given application session borrows database sessions from the session pool, it finds itself skipping at high speed around all the nodes in the RAC. If you're not careful, it's possible to insert or update some piece of data (on one database node) and then find that it's not quite there when you take a look a millisecond later via a different node.


You can work around this by setting MAX_COMMIT_PROPAGATION_DELAY to zero, so that commit SCNs are propagated to all instances immediately. However, that does impose extra load on the cluster.
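For reference, that is an instance initialization parameter, and (if I remember rightly) it has to be set to the same value on every instance; the spfile/init.ora entry would look something like this:

# value is in hundredths of a second; the default is 700 (i.e. 7 seconds)
*.max_commit_propagation_delay = 0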


The problem arises because your connect string (in the JDBC URL) specifies a RAC service, something like this:

url="jdbc:oracle:thin:@(description=(LOAD_BALANCE=on)(address=(protocol=tcp)(host=rac1)(port=1521))(address=(protocol=tcp)(host=rac2)(port=1521))(address=(protocol=tcp)(host=rac3)(port=1521))(connect_data=(service_name=RACDB)))"


This typically load-balances across all nodes in an annoyingly random and yet not entirely effective way - especially when there is a sudden increase in connections. Oracle’s default load balancing mechanism is particularly badly suited to a connection pool.
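For context, a url value like this typically lives in the url attribute of a data source entry in OC4J's data-sources.xml; a sketch of such an entry (the class, names and credentials here are illustrative only, and attribute details vary between OC4J versions):

<data-source
    class="com.evermind.sql.DriverManagerDataSource"
    name="RacDS"
    location="jdbc/RacDS"
    connection-driver="oracle.jdbc.driver.OracleDriver"
    username="scott"
    password="tiger"
    url="jdbc:oracle:thin:@(description=(LOAD_BALANCE=on)(address=(protocol=tcp)(host=rac1)(port=1521))(address=(protocol=tcp)(host=rac2)(port=1521))(address=(protocol=tcp)(host=rac3)(port=1521))(connect_data=(service_name=RACDB)))"
/>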


What you'd really like to do is to pin application sessions to database nodes - so that you can be (reasonably) sure that a given user, once logged on, will keep on using the same RAC node (node failure excepted, of course). Just have each app server connect to its 'own' database server.


Unfortunately, the JDBC URL configuration is clustered: whatever you set it to has to work on every app server node. So we came up with a simple workaround:
  • Turn load balancing off

  • Add some new IP address mappings to each app server node's /etc/hosts file

  • Use the 'virtual' hostnames in the JDBC URL


Now you have this URL:


url="jdbc:oracle:thin:@(description=(LOAD_BALANCE=off)(address=(protocol=tcp)(host=db1)(port=1521))(address=(protocol=tcp)(host=db2)(port=1521))(address=(protocol=tcp)(host=db3)(port=1521))(connect_data=(service_name=RACDB)))"


and in each /etc/hosts you have something like:


10.20.30.1 rac1.prt.stl.int rac1 db1

10.20.30.2 rac2.prt.stl.int rac2 db2

10.20.30.3 rac3.prt.stl.int rac3 db3



On each app server, the mapping from racN to dbN is modified so that each app server "prefers" its own RAC node, forming a stovepipe.
If the app server tier runs on nodes app1, app2 and app3, the mapping is:





RAC node   app1 mapping   app2 mapping   app3 mapping
rac1       db1            db2            db3
rac2       db2            db3            db1
rac3       db3            db1            db2
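To make that concrete, reading off the app2 column, app2's /etc/hosts would carry the aliases like this (same IP addresses as before):

10.20.30.1 rac1.prt.stl.int rac1 db2

10.20.30.2 rac2.prt.stl.int rac2 db3

10.20.30.3 rac3.prt.stl.int rac3 db1

So each app server resolves db1 to a different physical node, and the remaining aliases give it a failover order.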

So in normal operation, we have a stovepipe all the way down from the web tier through the app tier to the database tier. Ideally we would have the web load balancer intelligently cluster like users together (e.g. by organisation or region) to further reduce cluster overhead, but that's currently politically unacceptable. Another time, perhaps!
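In the meantime, a quick way to confirm where a pooled session has actually landed is to ask the database for its instance name. A minimal sketch using the thin driver directly (the credentials are placeholders, and the account needs access to v$instance):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import oracle.jdbc.pool.OracleDataSource;

public class WhichInstance {
    public static void main(String[] args) throws Exception {
        // Same descriptor as the data source: load balancing off,
        // so db1 (as resolved by this host's /etc/hosts) is tried first.
        OracleDataSource ds = new OracleDataSource();
        ds.setURL("jdbc:oracle:thin:@(description=(LOAD_BALANCE=off)"
                + "(address=(protocol=tcp)(host=db1)(port=1521))"
                + "(address=(protocol=tcp)(host=db2)(port=1521))"
                + "(address=(protocol=tcp)(host=db3)(port=1521))"
                + "(connect_data=(service_name=RACDB)))");
        ds.setUser("scott");        // placeholder credentials
        ds.setPassword("tiger");

        Connection conn = ds.getConnection();
        try {
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery(
                    "select instance_name, host_name from v$instance");
            while (rs.next()) {
                System.out.println("Connected to instance " + rs.getString(1)
                        + " on host " + rs.getString(2));
            }
            rs.close();
            stmt.close();
        } finally {
            conn.close();
        }
    }
}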


Cheers Nigel

4 comments:

Kumar Madduri said...

Hi Nigel
Is this a required step?
"Add some new IP address mappings to each app server node's /etc/hosts file"

Can't I use rac1, rac2 and rac3 the way they are and just set LOAD_BALANCE=off in the JDBC connection URL? Would that not work?

Nigel said...

Kumar

The /etc/hosts trick makes sure that when there are M app servers, each one (i.e. appN) normally has sessions on the corresponding racN. So the load balancer divides the load, and then each appN/racN pair is closely coupled (unless racN dies, in which case connections will be made to the next node in the list, rac((N mod M)+1)).

Otherwise all app servers would send all their traffic to rac1 - so in normal operation rac2, rac3 etc. would be idle.

NB: if the client configuration is not clustered, then you could simply have different JDBC (or tnsnames) configuration on each client, and you wouldn't need this trick.
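For example, a non-clustered client that should prefer rac2 could just use a tnsnames.ora entry along these lines (a sketch, simply reordering the same addresses):

RACDB =
  (DESCRIPTION =
    (LOAD_BALANCE = off)
    (ADDRESS = (PROTOCOL = tcp)(HOST = rac2)(PORT = 1521))
    (ADDRESS = (PROTOCOL = tcp)(HOST = rac3)(PORT = 1521))
    (ADDRESS = (PROTOCOL = tcp)(HOST = rac1)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = RACDB))
  )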

I hope that's clearer.

BTW I assume you came here via today's post on Oracle-L - are you working with Hrishy?


Regards Nigel

Kumar Madduri said...

OK, got it. So you are defining a (virtual host, app tier) combination and binding the connections within that combination under normal circumstances. But if a node fails, it is stable enough to fail over to the next node, although one node may then be overloaded.

I don't know Hrishy. I got this lead from Oracle-L.

Nigel said...

The thread I mentioned can be found here on the Oracle-L mailing list, for anyone interested in the future.

Regards Nigel