Monday, February 26, 2007

Unique IDs for multi-master replication - Sequence or SYS_GUID?

Oracle-L is proving to be a good source of inspiration at the moment. Oracle ACE Syed Jaffar Hussain asked the question Is it possible to share a database sequence between multiple databases?

Using remote sequences
A couple of the replies took the question a bit too literally, and said yes, you can define a sequence on one database, and use it from another:

Select SequenceOwner.RemoteSeqName.NextVal@DBLinkName From dual


This works - and it can be made 'location transparent' by creating a local synonym for the remote sequence. But it is asymmetric (one database has to own the sequence) and it introduces a point of failure. If database 1 owns the sequence, database 2 can only insert rows if database 1 is actually available. If you could guarantee availability, why would you bother with replication?

Using a sequence local to each database
Several replies (including mine) suggested using carefully defined sequences which will deliver discrete values on two or more master databases. There are two basic patterns:

Partitioned: Suggested by several posters, the number space is divided up in blocks:

on db 1: create sequence myseq start with 1000000 max 1999999
on db 2: create sequence myseq start with 2000000 max 2999999


A variant on this is to use positive numbers for database 1, and negative numbers for database 2.

Interleaved: The more popular option is to use the old trick of assigning odd numbers to database 1, and even numbers to database 2:

on db 1: create sequence myseq start with 1 increment by 2
on db 2: create sequence myseq start with 2 increment by 2

This mechanism is much easier to manage (essentially, it doesn't need any further management). It is also easy to extend to 3, 4, 27 or 127 masters - just set the "start with" to the database number, and "increment by" to the maximum anticipated number of databases required.

A third option was also proposed by Mark D Powell: use SYS_GUID(). That has some disadvantages:
  • SYS_GUID() is bigger (16 bytes on 9iR2) than a NUMBER (eg a 13 digit integer needs around 8 bytes). Obviously the extra space follows through to indexes, foreign keys etc.

  • Another is simply that it is a RAW, which has some possibly undesirable implications; for example several common tools (including SQL*Developer 1.0) can't directly display RAW values; you have to explicitly select RAWTOHEX(id_column).



On the other hand, SYS_GUID can be defined as the default value for a column - unlike NEXTVAL which is normally set in a trigger or directly as part of an insert into/select from statement. Worse, it is very common to see a separate SELECT seq.NEXTVAL FROM DUAL for every ID generated. I've never investigated the relative performance of SYS_GUID() against getting a sequence number - anyone else like to share that? That may well be the most important consideration of all.

So I think I'll be coming back to this subject in future.

Update - later the same day:

I've quickly timed a million iterations of select sys_guid() from dual, and a million iterations of select sequence.nextval from dual (Oracle XE 10.2.0.1, HP dv1665 Centrino Duo running Windows XP)

select sys_guid() into variable from dual: 89 seconds
select seq.nextval into variable from dual: 40 seconds

For good measure, I've added a variant - one million calls of sys_guid() without a select:

variable := sys_guid() : 95 seconds

One of the potential advantages of sys_guid() I had anticipated is that it could be called directly from PL/SQL - but it looks like the implementation is less efficient than it might be; tracing shows that the PL/SQL function recursively selects SYS_GUID() from dual.

So I think I'll stick to traditional and more convenient sequence.nextval for now.

Chin chin!

4 comments:

The Human Fly said...

Hi,

In our case SYS_GUID is not acceptable as wee only need number not the type of value generated by the SYS_GUID function.
This function would useful when going for replication where the column doesn't have primary or unique constraints and rather than assuming, simply SYS_GUID can be used.

Jaffar

Nigel said...

Syed

I understand you have an ID column already defined as a NUMBER, and it would be a big pain to change everything late in an application development from NUMBER to RAW(16). But in a new application, given that a sequence generated number is (usually) completely hidden from the end user, is there any reason to prefer a NUMBER to a RAW - other than those I gave above?

Of course, if the number was visible (eg used as a generated account number) that would be a reason to retain NUMBER - it's much easier to remember than 32 hex digits!

Regards Nigel

BradW said...

I am looking into an alternative using Java. I will see what I come up with. In Java, I generate a million GUIDs in under 200ms. I am just waiting for the DBAs to load java for me... :) I will then be also adding encoding / compression so I can store the value in a RAW(11) instead of RAW(16). I will post some metrics for you.

Unknown said...

Hi,

We have an database with Sequence Number for PKs. The application should work in both connected and disconnected mode. So we are planning to install local database instance in the clinet machine. Also we are planning to add MVs to replicated data which are to be created in the server alone. Now we need to synchronize the data in some table which are allowed to manipulated in disconencted mode. My suggestion is to have a different range of sequecnce numbers server(1-2 million in client and 3-10 million in server db). A manula sync alogorithm can be written to syn the local data to the server (use server seq while uploading the data) and remove the local data once the sync is over. But my project managet try the GUID insted of Sequnce number. Do you think which is better for my situation. Please note that the data which is allowed to created in the local can be added by 2000 users(1/4th of them may be online at a time).