Wednesday, April 26, 2006
Sunday, April 23, 2006
SQL ordered by Executions for DB: XXX
|192,827||192,826||select 'x' from dual|
|54,644||54,644||INSERT INTO AUDIT ...|
We seem to get
select 'x' from dualissued exactly once as each connection is opened. This can represent a significant fraction of the executions. I raised this as an SR and was told:
This is a standard feature in the later JDBC libraries (10.1.0.x onwards) and is designed to ensure that the connection remains up and running - this is documented in the JDBC documentation ... This can be altered but not stopped by setting the min connections in the datasource to a minumum needed.
See BUG:4601037 - PINGDATABASE() PERFORMANCE CAN BE IMPROVED IN BOTH DRIVERS (against JDBC 10.1.x) however this is not published, and not fixed until JDBC 11.1.
Is it just me, or does it seem mad to have to test that your connection is up by sending an SQL statement? That's the first thing the app is going to do itself anyway; why not detect the failure then? In a well tuned app, the sessions should spend very little time 'on the shelf' when the app is busy; the database is just as likely to disappear while the session is being actively used as while it is resting. So this logic simply provides extra protection to the very first JDBC operation in a transaction; why not at least be able to tell it not to send this 'heartbeat' query unless the pooled session has been idle for longer than some (configurable) threshold.
Thursday, April 20, 2006
- 4280096 - HIGH BUFFER GETS FOR INSERT INTO LOB (fixed 188.8.131.52, 10.1.0.5 and 10.2)
- 3315549 - POOR PERFORMANCE WITH USING DBMS_LOB.WRITEAPPEND IN PLSQL LOOP
- 2602264 - DBMS_LOB.WRITEAPPEND() POOR PERFORMANCE WITH LARGE LOB AND SMALL CHUNK SIZE
- 2420525 - DBMS_LOB.WRITEAPPEND() PERFORMANCE ISSUES
- 2595083 - (description not published, although it's a base bug for some of the previous
But I'd never sat down and actually measured how much difference it can make - until today. My test appended a short string (around 35 characters) 10,000 times, using each of DBMS_LOB.append, DBMS_LOB.WriteAppend, and TEST.LOB_APPEND - a procedure that buffers the appended strings, and only flushes to the CLOB when the buffer is full:
for i in 1..:loopcount loop
Here are the results from tkprof, for 10000 iterations, 30 chars of padding:
Repeating 1,000,000 times, timing from SQL*Plus, and trying to help by wrapping the whole lot in
dbms_lob.append without open/close: 00:03:38.07
dbms_lob.append with open/close: 00:03:40.07
Simple dbms_lob.writeappend: 00:01:12.02
Append with buffering: 00:00:10.08
APPENDfor lazy reasons (no need to keep working out the length of the buffer, which
WRITEAPPENDneeds) - but the subsequent cast from VARCHAR2 to CLOB clearly contributes substantially to the problem.
In our case these LOBs are also being manipulated by many concurrent sessions on a RAC environment - so we've seen evidence of (space management) contention for the LOB segments. Reducing lio's seems to help more than a little... at least while we wait for our client to upgrade to the latest patch - which doesn't look likely any time soon!
Tuesday, April 18, 2006
You can work around this by setting
MAX_COMMIT_PROPAGATION_DELAYto zero. However that does impose extra load on the cluster.
The problem arises because your connect string (in the JDBC URL) specifies a RAC service, something like this:
This typically load-balances across all nodes in an annoyingly random and yet not entirely effective way - especially when there is a sudden increase in connections. Oracle’s default load balancing mechanism is particularly badly suited to a connection pool.
What you'd really like to do is to pin application sessions to database nodes - so that you can be (reasonably) sure that a given user, once logged on, will keep on using the same RAC node (node failure excepted, of course). Just have each app server connect to its 'own' database server.
Unfortunately, the JDBC URL configuration is clustered. Whatever you set it to has to work for all nodes. So we came up with a simple workaround:
- Turn load balancing off
- Add some new IP address mappings to each app server node's
- use the 'virtual' hostnames in the JDBC URL
Now you have this URL:
and in each
/etc/hostsyou have something like:
10.20.30.1 rac1.prt.stl.int rac1 db1
10.20.30.2 rac2.prt.stl.int rac2 db2
10.20.30.3 rac3.prt.stl.int rac3 db3
On each app server, the mappings from racN to dbN are modified, so that each app server "prefers" its own RAC server, forming a stovepipe.
If the app server tier is on nodes app1, app2 and app3, the mapping is:
|RAC node||app1 mapping||app2 mapping||app3 mapping|
So in normal operation, we have a stovepipe all the way down from web to app to database tiers. Ideally we would have the web load balancer intelligently cluster like users together (eg by organisation or region) to further reduce cluster overhead, but that's currently politically unacceptable. Another time, perhaps!
Friday, April 07, 2006
I especially like the O'Reilly title Fragile Web Development with SQL on Rails. Funny, I think that's what we must have been doing all along...
Sunday, April 02, 2006
We're building an application that - for the first year or two of operation - will include a legacy system component. Originally based on a kind of multiuser ISAM file system, the legacy data now lives in a series of over 100 Oracle schemas (one per site). The legacy app uses a legacy-to-relational bridge to access the data.
The bridge generates single table SQL to perform file-based actions like find-by-key, next record, previous record and so on. The tables are suitably indexed. However... the bridge (designed to support many different database platforms) issues SQL with no bind variables. So although there is a relatively small number of distinct access paths, each time you try one, it's got a new set of literal values in the predicates, and a hard parse is required.
The quick and easy solution is to have every bridge session start with:
ALTER SESSION SET CURSOR_SHARING='SIMILAR'
'Similar' statements share a parse tree, and the number of hard parses drops. In low volume everything works great. But as you crank up the workrate (we're using LoadRunner) the shared pool eventually gives up under the strain. Although there are relatively few 'families' of similar statements (ie statements which are exactly the same once all literals have been replaced by system generated bind variables), each single statement family may contain 100s of versions and use up to 150M of shared pool.
So the shared pool becomes fragmented and sooner or later it iss ORA-04031 time - unable to allocate memory. Even with 800M allocated to the shared pool, and 10% of it still available, the largest free chunk may be too small to be useful. We got down to a biggest chunk size of 23 bytes at one point.
A nasty side effect is that these errors result in everything slowing up. The main web application runs on clustered OC4J, using JDBC with session pooling to connect to a RAC back end. When queries start to slow down, the pool is exhausted so there's a sudden connection storm - which leads the entire system into a spiral of doom.
Now, the system architects aren't complete idiots. They had Oracle consultants on hand to remind them that bind variables are important. They even had access to the code base for the legacy bridge. So an attempt was made to add bind variable support.
It so nearly succeeded - but there was an itsy-bitsy bug in it.
The original bridge developer was asked to help solve the problem. And this is where he dug himself a big hole and jumped right in. A scientist proposes a theory, and conducts experiments that are designed to disprove it. This person simply stated what he believed:
Bind variables are used to keep certain info constant between calls. This
requires keeping and maintaining bind variable arrays. [...] To do it properly one must maintain a lot of stuff in separate arrays and relate them to the tables and types of calls being used. Rather than altering the session and letting Oracle bind them properly.
He (correctly, if sarcastically) describes what is happening:
I looked at your code briefly and your programmer brilliantly solved all the above problems. He creates a bind array, uses it in whatever the DB call is made, and then DESTROYS the array at the end of the call. It is not being used between calls AT ALL! Each DB call (in other words for every single record you read) creates, uses, then destroys the bind arrays.
And now he leaps to the wrong conclusion:
So technically you are binding, actually you have ADDED overhead and destroyed performance. [...] I am surprised you are not getting unexplained core dumps in a heavy locking situation where it accesses multiple files.
He assumed that:
- using binds in the legacy bridge would be expensive
- closing the OCI statement immediately after use would waste any savings
- using CURSOR_SHARING=SIMILAR in Oracle would (a) work and (b) be more efficient
The team lost heart in the face of this clear statement, and the attempt to use bind variables was abandoned. Attempts were made over many months to resolve the 4031 problem by throwing memory at it, or even ignoring it.
But in fact, his advice was 100% wrong. Here's why:
The cost to the legacy bridge of setting up and using bind variables is barely detectable because:
- Bind variables can easily be added into the statement as it is constructed
- the memory management is simple (malloc and free)
- few or no additional network roundtrips for bind data are needed - OCI piggy backs most or all of the information it needs onto the execution request
The Oracle server, on the other hand, is flat out. For it to avoid a hard parse it has to:
- lexically analyse every incoming SQL to identify whether and where literals have been used
- set up bind variables and extract the literals
- now do a soft or hard parse using the transformed SQL
- retain both forms of the SQL at least until the cursor is closed
- manage all the memory structures in a pool shared by hundreds or thousands of sessions
No contest. Although a small overhead is indeed added at the bridge, a major bottleneck is cleared on the RAC server.
Now - nearly a year later - a more experienced team has had another go. Using bind variables solved the 4031 problem pretty much at the first attempt - just as any old Oracle hand would expect. And that bug? It turned out to be a simple one line coding error, which took a couple of hours code review to find and fix.
The moral of the story is: don't believe dogma; actually try and prove or disprove a hypothesis properly.
And the answer to the question in the title: if you are continually issuing 'similar' SQL statements, it is worth using bind variables even if you discard the prepared statement after a single use.