Sunday, February 22, 2009

Doubly dynamic SQL

It is great to see a new post from Oracle WTF last week, after a quiet period. Which reminded me to post this example of a dynamic search.

I won't post the whole thing, and I have disguised the column names to protect the guilty. The basic problem is that the developer didn't quite understand that if you are going to generate a dynamic query, you don't have to include all the possibilities into the final SQL.

Let's say the example is based on books published in a given year. First, to decide whether to do a LIKE or an equality, he did this:

' WHERE' ||
' (('||p_exact||' = 1 AND pdc.title = '||chr(39)||p_title||chr(39)||')' ||
' OR ('||p_exact||' = 0 AND pdc.title LIKE '||chr(39)||l_title||'%'||chr(39)||')) ' ||
' AND ('||p_year||' = 0 OR pdc.year = '||p_year||')' ||

So at runtime you get both predicates coming through. Suppose you wanted an exact search (p_exact=1), for p_title='GOLDFINGER'. We don't know the year of publication so we supply 0. The generated predicates are:

WHERE ((1 = 1 AND pdc.title = 'GOLDFINGER')
OR (1 = 0 AND pdc.title LIKE 'GOLDFINGER%'))
AND (0 = 0 or pdc.year = 0)

Wouldn't the logically equivalent:

WHERE (pdc.title = 'GOLDFINGER')

have been much easier? Add a few of these together and a nice indexed query plan soon descends into a pile of full table scans and humungous hash joins. Oh, and no use of bind variables, so in a busy OLTP application this could sabotage the SQL cache quite quickly.

My favourite part though is with the sort order. The user can choose to order by a number of different columns, either ascending or descending:

l_order := ' ORDER BY ' ||
' case '||chr(39)||p_sort||chr(39)||' when ''publisher asc'' then publisher end asc, ' ||
' case '||chr(39)||p_sort||chr(39)||' when ''publisher desc'' then publisher end desc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''book_type asc'' then book_type end asc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''book_type desc'' then book_type end desc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''title asc'' then title end asc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''title desc'' then title end desc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''year asc'' then year end asc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''year desc'' then year end desc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''book_id asc'' then book_id end asc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''book_id desc'' then book_id end desc';

Yes, you got it; given that the variable p_sort has been picked from an LOV, the whole piece of PL/SQL can be replaced by:

l_order := ' ORDER BY ' ||p_sort;

That looks better, doesn't it?

Thursday, February 05, 2009

Tibco RV in a box - would appliances help streaming SQL?

Several others have posted on the new Tibco Messaging Appliance - apparently it's Tibco Rendezvous (RV) in a box OEM'd from Solace Systems. Paul Vincent ponders at the Tibco CEP blog:
    It’s quite feasible that the same approach could be used for “basic” complex event processing operations, especially those that don’t require history (or much persistence)
Paul is certainly right that CEP and event/data stream processing engines could benefit from the appliance approach. However I think that it is precisely those applications that require history that could most benefit. A high-speed streaming engine with a co-located database (running with much of the data in cache) gives the perfect environment for answering a wider range of "right-time" BI requirements; the engine can control exactly what needs to be retained and for how long to provide the appropriate time-series data to support whatever aggregated outputs are desired. The database acts as a memory extender (allowing infrequently accessed data to be paged out), as a source of replay / recovery data, and as a history for newly defined streaming queries. The latter is most important - if you want a 1 hour moving average, most streaming queries won't give you the right answer until the end of the first hour.

Monday, February 02, 2009

Analytics as a Service

With all this talk of SQLstream's recent v2.0 launch, I was interested to read Tim Bass's CEP blog posting on Analytics-as-a-Service. He calls it A3S - and rightly avoids calling it AaaS; apart from the fact the X-as-a-Service is almost as cliched as XYZ 2.0 (and equally meaningless), careless use of that sequence of As and Ss could cause spam filters round the world to get over-excited.

If we must have the as-a-service tag, I'd like to trademark BI-as-a-service: BIAS. Apart from being a proper acronym, it also gets across that BI often gives you the answers you think you want - not necessarily the ones you really need.

Sunday, February 01, 2009

Shared Feeds

My posting rate has been quite low recently, but I have been enjoying using Google Reader to keep up with what everyone else is saying.

I've been sharing items with some friends / colleagues, but it seems harmless to open this up to the world. The topics are eclectic, but mainly around product development, event stream processing, RDBMS and data warehouse. Maybe the very occasional Dilbert or xkcd. You're all very welcome to see what I've shared - the links are now on the right hand side of this blog. And if you too are using a reader, here is a direct link to my shared items and here is the atom feed for them. If Google could just pack up my daily shares as a single blog post, wouldn't that be great.

I'm still successfully avoiding Twitter... I've no idea how I would find the time, and having seen Philip Schofield mentioned as a user (on a low rent BBC Sunday afternoon show) - and apparently now he is London's #5 tw*t(terer), so it must be hopelessly uncool anyway.

SQLstream launches v2.0 of its Event Stream Processing engine

As well as working as a freelance Oracle consultant, I have spent most of the last year working with SQLstream Inc on their Event Stream Processing engine, version 2.0 of which has now been launched.

My initial point of contact was Julian Hyde, with whom I worked in the Oracle CASE / Oracle Designer team in the early 90s. He went out to Oracle HQ and worked on bit-mapped indexes, then spent time at Broadbase before becoming best known as the founder-architect for the Mondrian OLAP server.

Event Stream Processing (ESP) is a development of what we might have called Active Database a few years ago. Rather than running queries against stored data as in an RDBMS, we register active queries which can be used to filter data arriving from any kind of stream. These active queries behave just like topic subscriptions in a message bus - except that unlike typical JMS implementations, the SQLstream query engine can do more than just filter. SQL queries against the data-in-flight can also:
  • join data between multiple streams, or between streams and tables
  • aggregate data within a stream to give rolling or periodic aggregates based on time or row counts
  • apply built-in and user-defined functions to columns within the stream's rows
  • build a network of streams and views to support complex transformation and routing requirements
Queries are defined using SQL:2003 with minimal extensions (there are some SQL examples on the web site); so developers used to working with an RDBMS will find it easy to grok the relational messaging approach. Unlike a typical message bus, most transformation and routing takes place entirely inside the SQLstream environment, rather than being delegated to external applications.

My experience working at JMS vendor SpiritSoft convinced me of the value of asynchronous message-based techniques, which in the last 10 years have spread out from high-end financial systems to ESB implementations all over the place. Since the early 80s we have seen how the RDBMS swept all other forms of structured data storage away. Now SQLstream's relational messaging approach removes the impedance mismatch between how we store data, and how we process it on the wire. In principle, this architecture can subsume both message-oriented (ESB style) and data-oriented (ETL style) integration architectures.

It should be said that "other ESP engines are available". Oracle itself has two projects on the go: its own CQL which I believe is still internal, and Oracle CEP (the rebranded BEA WebLogic Event Server - which itself is (or was) based on the open source Esper project). These two development threads will no doubt combine at some point (perhaps they already have?). IBM also has two or three independent CEP (complex event processing) projects on the go.

I think the same thing will happen to ESP / CEP as happened to ETL/EAI tools in the last ten or fifteen years. For sure, the database/application server vendors (especially Oracle and IBM) will sell plenty of this software within their respective client bases. An Oracle CEP system that was (or could be) tightly integrated with the RDBMS itself - maybe executing PL/SQL as well as Java functions - would be an easy sell. However multi-vendor sites will be interested in an agnostic / vendor-independent tool as a basis for their integration efforts. Just as Informatica has carved out a place for itself in competition with Oracle's ODI and OWB and IBM's DataStage, so SQLstream and other ESP vendors can fight for the common ground. It will be very interesting to see how it all turns out.

See the SQLstream product page for background, plus posts from Julian Hyde, CTO of SQLstream and Nicholas Goodman, Director of BI Solutions at Pentaho.

PS: here's another post from David Raab.