Wednesday, December 13, 2006

Would you pay $99,000 for a starter pack?

WebMethods has launched a starter pack for Infravio X-Registry, which combines a UDDI registry, a JAXR repository, and what they coyly describe as "integrated governance [which] ensures that these policies can be effectively applied across design-time, run-time and change-time environments".

For the bargain-basement price of just $99,000 you get to "minimize the upfront cost of initial adoption". Or to put it another way, you get 2 CPUs, 5 named users and a UDDI registry that will allow you to define just 25 web services.

Why on earth would anyone want to pay that much to adopt a vendor-ized version of open standard technologies? Now if they had said this was a free community edition, they might be in with a chance of luring unsuspecting punters in and hitting them with the upgrade later. But this way round? Seems daft to me...

Tuesday, December 12, 2006

Informatica buys Itemfield, gets a sprinkle of Gartner's magic dust

Informatica has agreed to buy Itemfield - an Israeli outfit specialising in the integration of unstructured and semi-structured data. That adds a useful extra string to their bow.

Coincidentally, Gartner's latest Magic Quadrant for Data Integration Tools has recently materialised. Informatica gets a positive write-up - it stands alone with IBM in the Leaders quadrant, head and shoulders above the pack (including Oracle).

Sadly my former Constellar colleagues at DataMirror don't even get on the map - DM just gets a rather dismissive one-line mention for Transformation Server in the also-rans section. Is DataMirror under-marketing its product range - including Constellar Hub - or is Gartner just not too impressed with it?

Friday, December 08, 2006

Problems with Statspack

Everyone's talking about it - Statspack isn't perfect. Having followed the discussion from Jonathan Lewis to Doug Burns to Daniel Fink's OptimalDBA, I thought it was worth pointing out that with any snapshot technique, your knowledge of the value of the increment between snapshots is uncertain.

Suppose snapshot N of a cumulative metric has value s(N),
and let's call the true increment from snapshot N to N+1 x(N).

All we can say for sure is:
A) if s(N+1) < s(N) then the counter has been reset, and x(N) >= s(N+1)
B) if s(N+1) >= s(N) then x(N) >= s(N+1) - s(N)

In normal use (when the shared pool is stable), we only see case B, and we replace the >= with =. But if your shared pool is being trashed (see earlier posts) then remember the uncertainty, and consider building case A into your reports.
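
Here's what that looks like in SQL against the standard Statspack STATS$SYSSTAT table (assuming the usual SNAP_ID, DBID, INSTANCE_NUMBER, STATISTIC#, NAME and VALUE columns; the statistic name is just an example):

-- increment between consecutive snapshots of a cumulative statistic;
-- a drop in the counter (case A) means it has been reset, so all we
-- know is that the true increment is at least the new value
select snap_id, name,
       case
         when value >= prev_value then value - prev_value   -- case B
         else value                                         -- case A
       end as min_increment
from ( select snap_id, name, value,
              lag(value) over (partition by dbid, instance_number, statistic#
                               order by snap_id) as prev_value
       from   stats$sysstat
       where  name = 'parse count (hard)'
     )
where prev_value is not null
order by snap_id;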

Sunday, October 15, 2006

Vitria goes private

Once one of the great hopes of the EAI boom of the late 90s, Vitria has struggled over the last five years. I've just noticed that they recently announced they are going private.

The purchasers are Vitria founders Dale Skeen (current CEO) and his wife Jomei Chang (founding president and CEO). When Reuters bought Teknekron (which soon morphed into Tibco), the pair used their $10m to start up Vitria. As this 2001 story explains, Vitria's 1999 float was one of the more ebullient - priced at $16, hitting $273 just three months later.

Vitria was perhaps the first EAI company really to 'get' the idea of process level (rather than data) integration. Perhaps their biggest problem has been that their proprietary approach had hardly started to gain acceptance by the time industry standards - web services, service oriented architecture, BPEL and the rest - pulled the rug from under their feet.

Adjusted for stock splits, a peak price of $400 compares to a sale price of $2.75 per share. It will be very interesting to see what Chang and Skeen manage to make of the rump of their creation.

Monday, September 25, 2006

Loading iostat output into Oracle

Alex Gorbachev posted an item on Pythian's blog last week about Basic I/O monitoring on Linux.

That's reminded me to post a couple of scripts here that can be used to load cpu and disk stats from iostat into Oracle tables.

I assume your iostat command line looks something like this:


iostat -x -t [interval] > iostat.log


You can then process iostat.log using this script to make it more SQL*Loader friendly:


cat iostat.log | processiostat > iostat.dat


And here's the processiostat script:

#!/bin/sh
#
# read and process an iostat.out file
# assumption is that it has been collected using -x -t
#
# Format looks like this (excluding leading comment sign)
#=======================
#Linux 2.4.21-27.ELsmp (myserver.preferisco.com) 02/05/06
#
#Time: 13:01:58
#avg-cpu: %user %nice %sys %iowait %idle
# 0.96 0.06 0.49 1.93 96.55
#
#Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
#sda 80.85 0.00 22.62 29.90 388.10 522.72 194.05 261.36 17.34 0.00 0.08 0.02 0.11
#sde 44.04 0.00 12.19 1.78 203.58 14.28 101.79 7.14 15.59 0.01 0.61 0.33 0.46
#sdi 44.41 0.00 11.68 1.87 195.50 15.71 97.75 7.86 15.59 0.01 0.79 0.45 0.61
#
#Time: ...
#=======================
#
# Read the file from stdin
read os version longhost date
# extract hostname - works for rac2-4 at least
# discard empty line
read discard

while read label time
do
    # discard cpu headings
    read discard
    # read cpu figures
    read usercpu nicecpu syscpu iowait idle
    echo CPU ${date}-$time $longhost $usercpu $nicecpu $syscpu $iowait $idle
    # discard blank line
    read discard
    # discard headings
    read discard
    # now read in device stats
    device="START"
    until [ "$device" = "" ]
    do
        read device rrqms wrqms rps wps rsps wsps rkbs wkbs rqsz qusz await svctm util
        if [ "$device" != "" ]
        then
            echo DISK ${date}-$time $longhost $device $rrqms $wrqms $rps $wps $rsps $wsps $rkbs $wkbs $rqsz $qusz $await $svctm $util
        fi
    done
done


That .dat output can then be loaded using this control file:

-- general ctl format for iostat tables
load data
infile *
into table IOSTAT_CPU REPLACE
WHEN (1:3)="CPU"
fields terminated by whitespace
( TSTAMP position (5) DATE "mm/dd/yy-hh24:mi:ss"
, HOST
, USERCPU
, NICECPU
, SYSCPU
, IOWAIT
, IDLE
)
into table IOSTAT_DISK REPLACE
WHEN (1:4)="DISK"
fields terminated by whitespace
( TSTAMP position(6) DATE "mm/dd/yy-hh24:mi:ss"
, HOST
, DEVICE
, RRQMS
, WRQMS
, RPS
, WPS
, RSPS
, WSPS
, RKBS
, WKBS
, RQSZ
, QUSZ
, AWAIT
, SVCTM
, UTIL
)
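
The control file assumes the two target tables already exist; here's a minimal sketch of what they might look like (the datatypes are my guesses - size to taste). Since the control file specifies infile *, I'd supply the data file on the sqlldr command line, e.g. sqlldr control=iostat.ctl data=iostat.dat.

create table IOSTAT_CPU
( TSTAMP    date
, HOST      varchar2(64)
, USERCPU   number
, NICECPU   number
, SYSCPU    number
, IOWAIT    number
, IDLE      number
);

create table IOSTAT_DISK
( TSTAMP    date
, HOST      varchar2(64)
, DEVICE    varchar2(64)
, RRQMS     number
, WRQMS     number
, RPS       number
, WPS       number
, RSPS      number
, WSPS      number
, RKBS      number
, WKBS      number
, RQSZ      number
, QUSZ      number
, AWAIT     number
, SVCTM     number
, UTIL      number
);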


Then you can do comparisons, trend analysis, even draw pretty graphs using your favourite GUI (SQL*Plus in my case...)
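
For example, a quick hourly summary of the busiest devices might look something like this:

select   host
,        device
,        trunc(tstamp,'HH24')   as sample_hour
,        round(avg(await),2)    as avg_await
,        round(avg(util),2)     as avg_util
,        round(max(util),2)     as max_util
from     iostat_disk
group by host, device, trunc(tstamp,'HH24')
order by host, device, sample_hour;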

Enjoy...

Tuesday, August 08, 2006

Back in the saddle

After five weeks off (two of them in Italy) I'm finally starting a new contract tomorrow.

Good news
- it's not on an industrial estate behind the sugar factory in Bury St Edmunds, with a visit once/day from a sandwich lady
- the office is on a modern business park with four Starbucks, a restaurant and (not coincidentally) a mobile phone shop

Less good news
- it's 100 miles away, so I'll be spending a deal of time in the car, and/or in B&Bs for the next few months.

At least it should be interesting work with an enthusiastic team (from those I've met so far) - and I'll be well placed for training into London to meet friends and colleagues. And in case it all gets too much, I've got another two weeks in Italy starting a week today with a couple of days in Venice, then back to the house in Le Marche for a rest. It's a hard life...

Thursday, July 06, 2006

Using global context variables

I recently stumbled on another 'new' feature - though in fact it was introduced way back in Oracle 9i. A questioner on the OTN database forum wanted to know how to invalidate cached package state derived from a table that might be updated from another session.

My first suggestion was to implement a time-to-live cache - treat the data as stale if it has been cached for longer than X seconds (using DBMS_UTILITY.get_time to record the load time). All sessions manage their own private caches; a cache can be inconsistent for up to X seconds (if the underlying data is updated immediately after the cache is loaded).
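
In outline, that looks something like this (names invented for illustration, and ignoring the fact that GET_TIME eventually wraps around):

-- package state
g_loaded_at  number;                       -- DBMS_UTILITY.GET_TIME at cache load
g_ttl_hsecs  constant number := 30 * 100;  -- treat cache as stale after 30 seconds

function cache_is_stale return boolean
is
begin
    return g_loaded_at is null
        or (DBMS_UTILITY.GET_TIME - g_loaded_at) > g_ttl_hsecs;
end cache_is_stale;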

Veteran OTN poster Jens Petersen came up with what could be a much neater solution, which is to use a global context. You can define your own namespace for use with the standard SYS_CONTEXT function, and arrange to get/set attribute name/value pairs - on a per-session or instance-global basis. Like a DIRECTORY, a CONTEXT isn't owned by a specific schema.

Creating the context

To be able to create a context the user must be granted CREATE ANY CONTEXT privilege.

create or replace context my_context
using testuser.my_context_pkg
accessed globally -- including this makes attribute values visible to all sessions
/

A package is linked to the context; this is the only interface that can be used to set attribute values in the context. This coupling is a little inelegant, but it does mean that only users with EXECUTE permission on the package will be able to set values.

create or replace package my_context_pkg
as
    procedure set_context
    ( av_attribute in VARCHAR2
    , av_value     in VARCHAR2
    );
end my_context_pkg;
/

create or replace package body my_context_pkg
as
    procedure set_context
    ( av_attribute in VARCHAR2
    , av_value     in VARCHAR2
    )
    is
    begin
        DBMS_SESSION.SET_CONTEXT
        ( namespace => 'MY_CONTEXT'
        , attribute => av_attribute
        , value     => av_value
        );
    end set_context;

end my_context_pkg;
/

You can define any API you like; this example is the simplest, allowing completely unconstrained setting of attribute name/value pairs. Here's how you set an attribute:

my_context_pkg.set_context('Attribute Name', 'Attribute Value');
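
And reading it back is just a SYS_CONTEXT call, from SQL or PL/SQL:

select SYS_CONTEXT('MY_CONTEXT', 'Attribute Name') from dual;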

Now, what about that original problem - how to notify active sessions that data in their package state might be out of date?

Well, there are two approaches: replace the cache directly with (one or more) CONTEXTs, or construct a dirty cache notification scheme:

Replace cache with CONTEXT


Simply maintain the cache as attribute/values in a CONTEXT. This could be appropriate if the cache was relatively simple (eg a set of lookup codes and descriptions). The PL/SQL caching can be completely discarded; instead use SYS_CONTEXT to get the descriptions, and set up triggers on the lookup table that use MY_CONTEXT_PKG to update the context. The only minor snag is that the context values need to be set (at least once) on instance startup; this could be a scheduled job, or be incorporated into a logon trigger.
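
As a rough sketch (the lookup table and its columns are invented here for illustration), the trigger might look like this:

create or replace trigger lookup_codes_ctx_trg
after insert or update on lookup_codes
for each row
begin
    -- push the new/changed description straight into the global context
    my_context_pkg.set_context(:new.code, :new.description);
end;
/

Deletes would need similar treatment (setting the attribute back to NULL), and lookups then become simple calls to SYS_CONTEXT('MY_CONTEXT', some_code) instead of a SELECT.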

Dirty cache notification


Define a 'cache update sequence' context attribute; let's suppose it uses MY_CONTEXT with an attribute name of 'cacheseq'.

When the package initialises its cache, it should read and retain the latest value for 'cacheseq' (treating null as 0). Later, before using values from the cache, it should re-read the latest value for 'cacheseq'. If the numeric value is greater than the retained value, the cache must be dirty. Applications that update the underlying tables (and so invalidate any caches) must increment 'cacheseq' - normally by using a statement level trigger on insert/update/delete.

my_context_pkg.set_context
( 'cacheseq'
, nvl(to_number(SYS_CONTEXT('MY_CONTEXT','cacheseq')),0)+1
);
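
Putting the two halves together - again just a sketch, with the table name, g_cached_seq and reload_cache invented for illustration:

-- producer side: bump the sequence whenever the underlying table changes
create or replace trigger my_lookup_cacheseq_trg
after insert or update or delete on my_lookup_table
begin
    my_context_pkg.set_context
    ( 'cacheseq'
    , nvl(to_number(SYS_CONTEXT('MY_CONTEXT','cacheseq')),0)+1
    );
end;
/

-- consumer side: before using the cache, check whether it has gone stale
procedure check_cache
is
    l_seq number;
begin
    l_seq := nvl(to_number(SYS_CONTEXT('MY_CONTEXT','cacheseq')),0);
    if l_seq > g_cached_seq then
        reload_cache;             -- re-read the lookup data into package state
        g_cached_seq := l_seq;
    end if;
end check_cache;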

Updating a CONTEXT attribute is not transactional, and almost certainly not read consistent; if an update is rolled back, we may find our cache being refreshed unnecessarily. If two updates to the table are made concurrently, we might see 'cacheseq' increase by one only. But our dependent sessions should all see that there has been some update activity.

Performance


Before jumping in and using CONTEXTs, I wanted to be sure that the performance wouldn't be a limiting factor. I didn't want to find SYS_CONTEXT calls producing a hidden SELECT ... FROM DUAL, as used to happen with SYSDATE, USER and the like. So here's a quick test:

declare
    li_cached_integer integer := 0;
begin
    for i in 1..100000 loop
        li_cached_integer := nvl(to_number(sys_context('MY_CONTEXT','cacheseq')),0);
        my_context_pkg.set_context('cacheseq',to_char(li_cached_integer+1));
    end loop;
end;

Fetching the value 100,000 times (including the char to number conversion) takes around 1 second; saving the modified value 100,000 times takes an additional 2 seconds (on an Intel Pentium 4 3.4GHz dual core running XP SP2 and Oracle XE).

Exporting Context Definitions


IMP/EXP won't import/export these objects, and I presume Data Pump won't either; so the easiest thing to do is create a SQL script using the DBMS_METADATA package:

select DBMS_METADATA.GET_DDL(object_type, object_name)
from DBA_OBJECTS WHERE OBJECT_TYPE='CONTEXT';

Don't specify a schema in the call to GET_DDL; if you do, you get:

ORA-31604: invalid NAME parameter "SCHEMA" for object type CONTEXT in function SET_FILTER

Dictionary Views


There are some dictionary views that can be useful:

ALL_CONTEXT - lists contexts that have been activated (ie a value has been set since startup)
DBA_OBJECTS (where object_type = 'CONTEXT') - lists the context definitions themselves
V$CONTEXT - lists attributes set in the current session (appears to ignore globally accessible attributes)

I'm planning to come back to this topic once I've had a longer play with multiple sessions setting and reading locally and globally accessible context variables.

Friday, June 30, 2006

Parting is such sweet sorrow

Shakespeare knew a thing or two about emotions, that's for sure.

Today my contract finally ends after two and a half years of repeated 3 month and 6 month extensions. There's so much I can quarrel with about how the project was architected and how it has been managed. But today I just feel glad to have been able to help, and sorry that I'll miss the next few steps.

All in all it has been a lot of fun. Thanks to everyone involved; here's hoping that the next job turns up soon (after a break at the farmhouse in Italy of course) and that we can learn as much from each other as I have here.

Thursday, June 29, 2006

Anything 2.0

Doesn't it just make your head hurt when you see yet another news item, press release, blog item about XYZ 2.0?

Publisher O'Reilly claims authorship of the term Web 2.0, and for sure it has more than succeeded in its original purpose as a call to action. But (as the article mentions) the Web 2.0 (and anything 2.0) moniker has been hijacked by just about every tech PR company still up and running since the dot com shakeout (Wikipedia even defines Marketing 2.0). I started to foam at the mouth when I saw the April 2006 cover of UK venture capitalist 3i's iSIGHT magazine, which managed to cram in Trade 2.0, Business Web 2.0, Consumer 2.0 and Economy 2.0 (to round it off, they had yet another one of those hagiographic interviews with Tim Berners-Lee in the back. Semantic, shemantic...).

When I see 2.0 I see a product at a point in time - and a pretty early point. Who remembers Oracle 2.0 (well, I do, but only just)? Windows 2.0? We all know that it's only from version 3 or above that a product really starts to earn its keep. To the extent that it is useful at all, "Web 2.0" is just a shorthand for certain aspects of recent developments on the web (interactivity? community? Flickr? AJAX? mash-ups? folksonomies? As Humpty Dumpty says, 'it means whatever I want it to mean').

Now (27 June 2006) CMP Media has actually managed to register Web 2.0 as a service mark in the US. Is there anything you can't sneak past the US Patent and Trademark Office? Even though they issued a 'Final Refusal' in November 2004, over the following 18 months they evidently caved in to appeals from CMP.

I'd like to see a boycott of this lazy habit of shoving 2.0 after anything and everything (except your competitors - they're all 1.0 of course). Laughter is best; there's nothing succeeds in deflating the pretentious journo-marketeer as well as public ridicule - so please join me by posting any particularly ludicrous 2.0 claims here!

Monday, May 15, 2006

Peter Gyenes joins WebMethods

Interesting move; from 1996 Peter Gyenes ran Vmark, which became Ardent; he sold that to Informix in 2000, and a year later helped sell the Informix database business to IBM - leaving him to run the ETL rump, which was renamed Ascential.

Last year Ascential itself was sold to IBM (see "IBM Purchases Ascential Software").

And now finally he has joined the board at WebMethods. Well, he should be able to teach them a trick or two about M&A. WEBM in play perhaps?

Saturday, May 06, 2006

The real cost of denormalisation

Perhaps the most common performance problem I used to come across in my Oracle consultancy days was inappropriate, excessive or just plain dumb denormalisation. Any number of inexperienced (or wrongly experienced) developers would proudly show me how they had 'optimised' their database design, because joins are expensive, aren't they?

Ironically, this design decision would often be the direct cause of my visit...

Anyway, while browsing the always entertaining Oracle-L mail list yesterday, I tripped over Jared Still's Normalize for Performance paper - given at Hotsos 2006 a few weeks ago. He's explained some of the typical problems (and also ways around some of them with materialised views). I'll enjoy going through the examples next time I've got a couple of free hours...

Wednesday, April 26, 2006

Informatica keeps it up

Good to see Informatica reporting a good quarter. Some of you will remember Sohaib Abbasi, Informatica's CEO, from the good ole days (late '80s and certainly well into the 90s) when he was in charge of the Tools (ie Forms and Reports) group at Oracle. Looks like he's doing a fair job of shepherding the company - one of the very few 20th century ETL/EAI outfits still surviving as an independent.

Sunday, April 23, 2006

Is this an Oracle WTF?

Using JDBC with connection pooling, we see this in our Statspack report (over a 15 minute period, on one node of a 3-node RAC setup):

SQL ordered by Executions for DB: XXX

Executions    Rows Processed   Statement
   192,827           192,826   select 'x' from dual
    54,644            54,644   INSERT INTO AUDIT ...


We seem to get select 'x' from dual issued exactly once as each connection is opened. This can represent a significant fraction of the executions. I raised this as an SR and was told:

This is a standard feature in the later JDBC libraries (10.1.0.x onwards) and is designed to ensure that the connection remains up and running - this is documented in the JDBC documentation ... This can be altered but not stopped by setting the min connections in the datasource to a minimum needed.

See BUG:4601037 - PINGDATABASE() PERFORMANCE CAN BE IMPROVED IN BOTH DRIVERS (against JDBC 10.1.x); however this is not published, and not fixed until JDBC 11.1.

Is it just me, or does it seem mad to have to test that your connection is up by sending an SQL statement? That's the first thing the app is going to do itself anyway; why not detect the failure then? In a well-tuned app, the sessions should spend very little time 'on the shelf' when the app is busy; the database is just as likely to disappear while the session is being actively used as while it is resting. So this logic simply provides extra protection for the very first JDBC operation in a transaction; why not at least be able to tell it not to send this 'heartbeat' query unless the pooled session has been idle for longer than some (configurable) threshold?

Cheers Nigel

Thursday, April 20, 2006

DBMS_LOB.APPEND eats resources on 9.2.0.5

We're generating XML data and storing it in CLOB columns as part of our reporting solution. We've been aware for a while that there are some performance issues with LOBs in 9.2.0.5, eg:

  • 4280096 - HIGH BUFFER GETS FOR INSERT INTO LOB (fixed 9.2.0.8, 10.1.0.5 and 10.2)
  • 3315549 - POOR PERFORMANCE WITH USING DBMS_LOB.WRITEAPPEND IN PLSQL LOOP
  • 2602264 - DBMS_LOB.WRITEAPPEND() POOR PERFORMANCE WITH LARGE LOB AND SMALL CHUNK SIZE
  • 2420525 - DBMS_LOB.WRITEAPPEND() PERFORMANCE ISSUES
  • 2595083 - (description not published, although it's a base bug for some of the previous ones)

But I'd never sat down and actually measured how much difference it can make - until today. My test appended a short string (around 35 characters) 10,000 times, using each of DBMS_LOB.append, DBMS_LOB.WriteAppend, and TEST.LOB_APPEND - a procedure that buffers the appended strings, and only flushes to the CLOB when the buffer is full:

for i in 1..:loopcount loop
    dbms_lob.append(lt_doc_clob, '('||i||')'||lv_padding);
end loop;
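
For reference, the buffering approach behind TEST.LOB_APPEND looks something like the sketch below (made-up names, not the production code; the caller must remember a final flush before using the CLOB):

create or replace package lob_buffer_pkg
as
    procedure buffered_append (p_clob in out nocopy clob, p_text in varchar2);
    procedure flush           (p_clob in out nocopy clob);
end lob_buffer_pkg;
/

create or replace package body lob_buffer_pkg
as
    g_buffer varchar2(32767);

    procedure flush (p_clob in out nocopy clob)
    is
    begin
        if g_buffer is not null then
            dbms_lob.writeappend(p_clob, length(g_buffer), g_buffer);
            g_buffer := null;
        end if;
    end flush;

    procedure buffered_append (p_clob in out nocopy clob, p_text in varchar2)
    is
    begin
        -- only touch the CLOB when the VARCHAR2 buffer is about to overflow
        if nvl(length(g_buffer),0) + nvl(length(p_text),0) > 32767 then
            flush(p_clob);
        end if;
        g_buffer := g_buffer || p_text;
    end buffered_append;

end lob_buffer_pkg;
/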

Here are the results from tkprof, for 10000 iterations, 30 chars of padding:

call          count     cpu  elapsed    disk   query  current    rows
Append            1    1.57     1.96       0   20087   194567       1
WriteAppend       1    0.45     0.48       0    9999    74343       1
MyAppend          1    0.10     0.10       0    1822      485       1

Repeating 1,000,000 times, timing from SQL*Plus, and trying to help by wrapping the whole lot in DBMS_LOB.OPEN and DBMS_LOB.CLOSE:



dbms_lob.append without open/close: 00:03:38.07
dbms_lob.append with open/close: 00:03:40.07
Simple dbms_lob.writeappend: 00:01:12.02
Append with buffering: 00:00:10.08


We used APPEND for lazy reasons (no need to keep working out the length of the buffer, which WRITEAPPEND needs) - but the implicit conversion from VARCHAR2 to CLOB that APPEND then requires clearly contributes substantially to the problem.

In our case these LOBs are also being manipulated by many concurrent sessions in a RAC environment - so we've seen evidence of (space management) contention for the LOB segments. Reducing LIOs seems to help more than a little... at least while we wait for our client to upgrade to the latest patch - which doesn't look likely any time soon!

Tuesday, April 18, 2006

"Micro-partitioning" pooled sessions in a RAC environment

One of the problems we've come across testing an OC4J based system against an Oracle 9.2.0.5 RAC database is that as a given application session borrows database sessions from the session pool, it finds itself skipping at high speed around all the nodes in the RAC. If you're not careful, it's possible to insert or update some piece of data (on one database node) and then find that it's not quite there when you take a look a millisecond later via a different node.


You can work around this by setting MAX_COMMIT_PROPAGATION_DELAY to zero. However that does impose extra load on the cluster.


The problem arises because your connect string (in the JDBC URL) specifies a RAC service, something like this:

url="jdbc:oracle:thin:@(description=(LOAD_BALANCE=on)(address=(protocol=tcp)(host=rac1)(port=1521))(address=(protocol=tcp)(host=rac2)(port=1521))(address=(protocol=tcp)(host=rac3)(port=1521))(connect_data=(service_name=RACDB)))"


This typically load-balances across all nodes in an annoyingly random and yet not entirely effective way - especially when there is a sudden increase in connections. Oracle’s default load balancing mechanism is particularly badly suited to a connection pool.


What you'd really like to do is to pin application sessions to database nodes - so that you can be (reasonably) sure that a given user, once logged on, will keep on using the same RAC node (node failure excepted, of course). Just have each app server connect to its 'own' database server.


Unfortunately, the JDBC URL configuration is clustered. Whatever you set it to has to work for all nodes. So we came up with a simple workaround:
  • Turn load balancing off

  • Add some new IP address mappings to each app server node's /etc/hosts file

  • use the 'virtual' hostnames in the JDBC URL


Now you have this URL:


url="jdbc:oracle:thin:@(description=(LOAD_BALANCE=off)(address=(protocol=tcp)(host=db1)(port=1521))(address=(protocol=tcp)(host=db2)(port=1521))(address=(protocol=tcp)(host=db3)(port=1521))(connect_data=(service_name=RACDB)))"


and in each /etc/hosts you have something like:


10.20.30.1 rac1.prt.stl.int rac1 db1

10.20.30.2 rac2.prt.stl.int rac2 db2

10.20.30.3 rac3.prt.stl.int rac3 db3



On each app server, the mappings from racN to dbN are modified, so that each app server "prefers" its own RAC server, forming a stovepipe.
If the app server tier is on nodes app1, app2 and app3, the mapping is:

RAC node    app1 mapping    app2 mapping    app3 mapping
rac1        db1             db2             db3
rac2        db2             db3             db1
rac3        db3             db1             db2

So in normal operation, we have a stovepipe all the way down from web to app to database tiers. Ideally we would have the web load balancer intelligently cluster like users together (eg by organisation or region) to further reduce cluster overhead, but that's currently politically unacceptable. Another time, perhaps!
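
A quick way to check that the pinning is behaving is to look at where the pooled sessions have actually landed - something like this (the username is just a placeholder for whatever account the pool connects as):

select   inst_id
,        machine
,        count(*)  as sessions
from     gv$session
where    username = 'APP_POOL_USER'
group by inst_id, machine
order by inst_id, machine;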


Cheers Nigel

Friday, April 07, 2006

SQL on Rails

My former colleague James Strachan has spotted an exciting new single tier web framework that seems absolutely perfect for us SQL hands.

I especially like the O'Reilly title Fragile Web Development with SQL on Rails. Funny, I think that's what we must have been doing all along...

Enjoy.

Sunday, April 02, 2006

How many binds does it take to save time?

Here's a tale about how something that seems intuitively correct can turn out to be 100% wrong - even though it has been stated as gospel by someone who seems like he should know.


We're building an application that - for the first year or two of operation - will include a legacy system component. Originally based on a kind of multiuser ISAM file system, the legacy data now lives in a series of over 100 Oracle schemas (one per site). The legacy app uses a legacy-to-relational bridge to access the data.

The bridge generates single table SQL to perform file-based actions like find-by-key, next record, previous record and so on. The tables are suitably indexed. However... the bridge (designed to support many different database platforms) issues SQL with no bind variables. So although there is a relatively small number of distinct access paths, each time you try one, it's got a new set of literal values in the predicates, and a hard parse is required.


The quick and easy solution is to have every bridge session start with:

ALTER SESSION SET CURSOR_SHARING='SIMILAR'
'Similar' statements share a parse tree, and the number of hard parses drops. In low volume everything works great. But as you crank up the workrate (we're using LoadRunner) the shared pool eventually gives up under the strain. Although there are relatively few 'families' of similar statements (ie statements which are exactly the same once all literals have been replaced by system generated bind variables), each single statement family may contain 100s of versions and use up to 150M of shared pool.
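
You can watch the damage building up by looking for statement families with silly version counts - something like this (the threshold is arbitrary):

select   sql_text
,        version_count
,        round(sharable_mem/1024/1024,1)  as sharable_mb
from     v$sqlarea
where    version_count > 100
order by version_count desc;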


So the shared pool becomes fragmented and sooner or later it's ORA-04031 time - unable to allocate memory. Even with 800M allocated to the shared pool, and 10% of it still available, the largest free chunk may be too small to be useful. We got down to a biggest free chunk size of 23 bytes at one point.


A nasty side effect is that these errors result in everything slowing up. The main web application runs on clustered OC4J, using JDBC with session pooling to connect to a RAC back end. When queries start to slow down, the pool is exhausted so there's a sudden connection storm - which leads the entire system into a spiral of doom.


Now, the system architects aren't complete idiots. They had Oracle consultants on hand to remind them that bind variables are important. They even had access to the code base for the legacy bridge. So an attempt was made to add bind variable support.


It so nearly succeeded - but there was an itsy-bitsy bug in it.


The original bridge developer was asked to help solve the problem. And this is where he dug himself a big hole and jumped right in. A scientist proposes a theory, and conducts experiments that are designed to disprove it. This person simply stated what he believed:

Bind variables are used to keep certain info constant between calls. This
requires keeping and maintaining bind variable arrays. [...] To do it properly one must maintain a lot of stuff in separate arrays and relate them to the tables and types of calls being used. Rather than altering the session and letting Oracle bind them properly.


He (correctly, if sarcastically) describes what is happening:

I looked at your code briefly and your programmer brilliantly solved all the above problems. He creates a bind array, uses it in whatever the DB call is made, and then DESTROYS the array at the end of the call. It is not being used between calls AT ALL! Each DB call (in other words for every single record you read) creates, uses, then destroys the bind arrays.



And now he leaps to the wrong conclusion:

So technically you are binding, actually you have ADDED overhead and destroyed performance. [...] I am surprised you are not getting unexplained core dumps in a heavy locking situation where it accesses multiple files.


He assumed that:
  • using binds in the legacy bridge would be expensive

  • closing the OCI statement immediately after use would waste any savings

  • using CURSOR_SHARING=SIMILAR in Oracle would (a) work and (b) be more efficient

The team lost heart in the face of this clear statement, and the attempt to use bind variables was abandoned. Attempts were made over many months to resolve the 4031 problem by throwing memory at it, or even ignoring it.


But in fact, his advice was 100% wrong. Here's why:

The cost to the legacy bridge of setting up and using bind variables is barely detectable because:
  • Bind variables can easily be added into the statement as it is constructed

  • the memory management is simple (malloc and free)

  • few or no additional network roundtrips for bind data are needed - OCI piggy backs most or all of the information it needs onto the execution request

The Oracle server, on the other hand, is flat out. For it to avoid a hard parse it has to:
  • lexically analyse every incoming SQL to identify whether and where literals have been used

  • set up bind variables and extract the literals

  • now do a soft or hard parse using the transformed SQL

  • retain both forms of the SQL at least until the cursor is closed

  • manage all the memory structures in a pool shared by hundreds or thousands of sessions

No contest. Although a small overhead is indeed added at the bridge, a major bottleneck is cleared on the RAC server.


Now - nearly a year later - a more experienced team has had another go. Using bind variables solved the 4031 problem pretty much at the first attempt - just as any old Oracle hand would expect. And that bug? It turned out to be a simple one-line coding error, which took a couple of hours' code review to find and fix.


The moral of the story is: don't believe dogma; actually try and prove or disprove a hypothesis properly.


And the answer to the question in the title: if you are continually issuing 'similar' SQL statements, it is worth using bind variables even if you discard the prepared statement after a single use.
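
If you want to convince yourself, here's a throwaway PL/SQL sketch (not from the project in question) - compare your session's 'parse count (hard)' statistic before and after each loop:

declare
    l_cnt integer;
begin
    -- literals: every statement is textually different, so each execution
    -- is a brand new cursor and a hard parse
    for i in 1 .. 10000 loop
        execute immediate
            'select count(*) from dual where dummy = ''' || i || ''''
            into l_cnt;
    end loop;

    -- binds: one shared cursor and soft parses only, even though the
    -- statement handle is discarded after every single execution
    for i in 1 .. 10000 loop
        execute immediate
            'select count(*) from dual where dummy = :b1'
            into l_cnt using to_char(i);
    end loop;
end;
/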

Sunday, March 26, 2006

Late Developer

I've been enjoying various Oracle and other industry blogs for some time (thank you Mark Rittman, Oracle WTF and many more), and I realise it's about time for me to contribute as well as simply lurking in the background.

While I was at SpiritSoft I started up a regular internal weekly newsletter about what was going on in the EAI industry. This spread out of the company to some of our partners and suppliers. I covered product launches, some financial news, gossip from the Java community, and any cheap scandal I could find. I managed to keep it up even after leaving SpiritSoft in 2003 - but only for a year. After two years and a hundred issues or so it petered out. It was just too much trouble trying to find enough interesting things to say each week, formatted into a dinky newsletter.

I'm hoping that more informal blogging will work better and last a bit longer. I don't have to pad it with fluff, thank the Lord.

All of your comments and feedback will be most welcome - I'm looking forward to re-establishing contact with former colleagues, and making some new friends on the way.


Cheers!