Monday, July 06, 2009

ETL patent case: Constellar and DataMirror let off off the hook; DataStage still in dock

Once again Vincent McBurney delivers a fantastic summary of the latest state of the Juxtacomm ETL patent case: SQL Server, DB2 and DataStage will fight out Data Integration Patent Infringement.

I'm most interested from the Constellar point of view - I first came across Constellar (then Information Junction) as a product on sale in late 1993 / early 1994 (before joining the company from Oracle in 1995), so it always seemed clear to me that it would qualify as prior art to Juxtacomm's 1998 patent. Oddly, it seems that the parties to the trial have agreed that Constellar Hub (and DataMirror Transformation Server) can be dropped from consideration; they won't be subject to damages - but equally they won't be considered as prior art. I don't understand that, but I guess the IBM lawyers must know what they are doing.

So, the case rumbles on, serving (if nothing else) to show how broken the US software patent system is.

Wednesday, July 01, 2009

SQLstream delivers instant data stream analysis of Mozilla 3.5 downloads

Here are a couple of posts that describe the download monitor/dashboard which is giving up-to-the-second statistics for downloads by country of the latest Mozilla release 3.5 (just about to top 5.5 million downloads since yesterday's launch). The dashboard has been put together with the help of my friends at SQLstream. Just don't try looking at this with Internet Explorer, as it doesn't support HTML5.

Julian Hyde on Open Source OLAP. And stuff.: SQLstream powers ...
By Julian Hyde
SQLstream gathers data from Mozilla's download centers around the world, assigns each record a latitude and longitude, and summarizes the information in a continuously executing SQL query. Data is read with sub-second latencies, ...
Julian Hyde on Open Source OLAP.... - http://julianhyde.blogspot.com/
SQLstream the Sequel - RealTime Intelligence for Mozilla BI in Action
From ebizQ Presents BI in Action Virtual Conference ...

Monday, June 15, 2009

Be Alert!

Here's a tale of woe from an organisation I know - anonymised to protect the guilty.

A couple of weeks after a major hardware and operating system upgrade, there was a major foul-up during a weekend batch process. What went wrong? What got missed in the (quite extensive) testing?

The symptom was that batch jobs run under concurrent manager were running late. Very late. In fact, they hadn't run. The external scheduling software had attempted to launch them, but failed. Worse than that, there had been no alerting over the weekend. Operators should have been notified of the failure of critical processes by an Enterprise Management (EM) tool.

Cut to the explanation:

As part of the O/S upgrade, user accounts on the server are now set to be locked out if three or more failed attempts to login are made. Someone in operations-land triggered a lockout on a unix account used to run the concurrent manager. And he didn't report it to anyone to reset it. So that explained the concurrent manager failures.

The EM software that should have woken up the operators also failed. Surprise, surprise: it was using the same (locked-out) unix account.

And finally, the alerting rules recognised all kinds of warnings and errors, but noone had considered the possibility that the EM system itself would fail.

Well, it's only a business system; though a couple of C-level execs got their end of month reports a couple of days late, and there were plenty of red faces, nobody actually died...

Just keep an eye out for those nasty corner cases!

Sunday, June 07, 2009

Oracle Exadata posts #1 TCP-H result

Grag Rahn's Structured Data blog provides the data that Kevin Closson had to remove from his own blog. From an HP/Oracle point of view, a very good performance, reducing cost/QphH by a factor of 4.

However, it is interesting to see that the HP/Oracle solution is still more than 4 times the cost/QphH of the #2 placed Exasol solution (running on Fujitsu Primergy, and reported a year ago) - while the absolute performance improvement is relatively slight (1.16M queries/hr against 1.02M).

Tuesday, April 21, 2009

Flutter is the new Twitter

Worth a look, for those (like me) who find 140 characters too much to hande. Hat-tip to the BCS Oddit blog
.

Sunday, February 22, 2009

Doubly dynamic SQL

It is great to see a new post from Oracle WTF last week, after a quiet period. Which reminded me to post this example of a dynamic search.

I won't post the whole thing, and I have disguised the column names to protect the guilty. The basic problem is that the developer didn't quite understand that if you are going to generate a dynamic query, you don't have to include all the possibilities into the final SQL.

Let's say the example is based on books published in a given year. First, to decide whether to do a LIKE or an equality, he did this:

' WHERE' ||
' (('||p_exact||' = 1 AND pdc.title = '||chr(39)||p_title||chr(39)||')' ||
' OR ('||p_exact||' = 0 AND pdc.title LIKE '||chr(39)||l_title||'%'||chr(39)||')) ' ||
' AND ('||p_year||' = 0 OR pdc.year = '||p_year||')' ||

So at runtime you get both predicates coming through. Suppose you wanted an exact search (p_exact=1), for p_title='GOLDFINGER'. We don't know the year of publication so we supply 0. The generated predicates are:

WHERE ((1 = 1 AND pdc.title = 'GOLDFINGER')
OR (1 = 0 AND pdc.title LIKE 'GOLDFINGER%'))
AND (0 = 0 or pdc.year = 0)

Wouldn't the logically equivalent:

WHERE (pdc.title = 'GOLDFINGER')

have been much easier? Add a few of these together and a nice indexed query plan soon descends into a pile of full table scans and humungous hash joins. Oh, and no use of bind variables, so in a busy OLTP application this could sabotage the SQL cache quite quickly.

My favourite part though is with the sort order. The user can choose to order by a number of different columns, either ascending or descending:

l_order := ' ORDER BY ' ||
' case '||chr(39)||p_sort||chr(39)||' when ''publisher asc'' then publisher end asc, ' ||
' case '||chr(39)||p_sort||chr(39)||' when ''publisher desc'' then publisher end desc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''book_type asc'' then book_type end asc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''book_type desc'' then book_type end desc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''title asc'' then title end asc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''title desc'' then title end desc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''year asc'' then year end asc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''year desc'' then year end desc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''book_id asc'' then book_id end asc,' ||
' case '||chr(39)||p_sort||chr(39)||' when ''book_id desc'' then book_id end desc';

Yes, you got it; given that the variable p_sort has been picked from an LOV, the whole piece of PL/SQL can be replaced by:

l_order := ' ORDER BY ' ||p_sort;

That looks better, doesn't it?

Thursday, February 05, 2009

Tibco RV in a box - would appliances help streaming SQL?

Several others have posted on the new Tibco Messaging Appliance - apparently it's Tibco Rendezvous (RV) in a box OEM'd from Solace Systems. Paul Vincent ponders at the Tibco CEP blog:
    It’s quite feasible that the same approach could be used for “basic” complex event processing operations, especially those that don’t require history (or much persistence)
Paul is certainly right that CEP and event/data stream processing engines could benefit from the appliance approach. However I think that it is precisely those applications that require history that could most benefit. A high-speed streaming engine with a co-located database (running with much of the data in cache) gives the perfect environment for answering a wider range of "right-time" BI requirements; the engine can control exactly what needs to be retained and for how long to provide the appropriate time-series data to support whatever aggregated outputs are desired. The database acts as a memory extender (allowing infrequently accessed data to be paged out), as a source of replay / recovery data, and as a history for newly defined streaming queries. The latter is most important - if you want a 1 hour moving average, most streaming queries won't give you the right answer until the end of the first hour.