Monday, February 26, 2007

Unique IDs for multi-master replication - Sequence or SYS_GUID?

Oracle-L is proving to be a good source of inspiration at the moment. Oracle ACE Syed Jaffar Hussain asked the question Is it possible to share a database sequence between multiple databases?

Using remote sequences
A couple of the replies took the question a bit too literally, and said yes, you can define a sequence on one database, and use it from another:

Select SequenceOwner.RemoteSeqName.NextVal@DBLinkName From dual


This works - and it can be made 'location transparent' by creating a local synonym for the remote sequence. But it is asymmetric (one database has to own the sequence) and it introduces a point of failure. If database 1 owns the sequence, database 2 can only insert rows if database 1 is actually available. If you could guarantee availability, why would you bother with replication?

Using a sequence local to each database
Several replies (including mine) suggested using carefully defined sequences which will deliver discrete values on two or more master databases. There are two basic patterns:

Partitioned: Suggested by several posters, the number space is divided up in blocks:

on db 1: create sequence myseq start with 1000000 max 1999999
on db 2: create sequence myseq start with 2000000 max 2999999


A variant on this is to use positive numbers for database 1, and negative numbers for database 2.

Interleaved: The more popular option is to use the old trick of assigning odd numbers to database 1, and even numbers to database 2:

on db 1: create sequence myseq start with 1 increment by 2
on db 2: create sequence myseq start with 2 increment by 2

This mechanism is much easier to manage (essentially, it doesn't need any further management). It is also easy to extend to 3, 4, 27 or 127 masters - just set the "start with" to the database number, and "increment by" to the maximum anticipated number of databases required.

A third option was also proposed by Mark D Powell: use SYS_GUID(). That has some disadvantages:
  • SYS_GUID() is bigger (16 bytes on 9iR2) than a NUMBER (eg a 13 digit integer needs around 8 bytes). Obviously the extra space follows through to indexes, foreign keys etc.

  • Another is simply that it is a RAW, which has some possibly undesirable implications; for example several common tools (including SQL*Developer 1.0) can't directly display RAW values; you have to explicitly select RAWTOHEX(id_column).



On the other hand, SYS_GUID can be defined as the default value for a column - unlike NEXTVAL which is normally set in a trigger or directly as part of an insert into/select from statement. Worse, it is very common to see a separate SELECT seq.NEXTVAL FROM DUAL for every ID generated. I've never investigated the relative performance of SYS_GUID() against getting a sequence number - anyone else like to share that? That may well be the most important consideration of all.

So I think I'll be coming back to this subject in future.

Update - later the same day:

I've quickly timed a million iterations of select sys_guid() from dual, and a million iterations of select sequence.nextval from dual (Oracle XE 10.2.0.1, HP dv1665 Centrino Duo running Windows XP)

select sys_guid() into variable from dual: 89 seconds
select seq.nextval into variable from dual: 40 seconds

For good measure, I've added a variant - one million calls of sys_guid() without a select:

variable := sys_guid() : 95 seconds

One of the potential advantages of sys_guid() I had anticipated is that it could be called directly from PL/SQL - but it looks like the implementation is less efficient than it might be; tracing shows that the PL/SQL function recursively selects SYS_GUID() from dual.

So I think I'll stick to traditional and more convenient sequence.nextval for now.

Chin chin!

3 comments:

The Human Fly said...

Hi,

In our case SYS_GUID is not acceptable as wee only need number not the type of value generated by the SYS_GUID function.
This function would useful when going for replication where the column doesn't have primary or unique constraints and rather than assuming, simply SYS_GUID can be used.

Jaffar

Nigel said...

Syed

I understand you have an ID column already defined as a NUMBER, and it would be a big pain to change everything late in an application development from NUMBER to RAW(16). But in a new application, given that a sequence generated number is (usually) completely hidden from the end user, is there any reason to prefer a NUMBER to a RAW - other than those I gave above?

Of course, if the number was visible (eg used as a generated account number) that would be a reason to retain NUMBER - it's much easier to remember than 32 hex digits!

Regards Nigel

BradW said...

I am looking into an alternative using Java. I will see what I come up with. In Java, I generate a million GUIDs in under 200ms. I am just waiting for the DBAs to load java for me... :) I will then be also adding encoding / compression so I can store the value in a RAW(11) instead of RAW(16). I will post some metrics for you.