HP 3000 Manuals

Synchronization Checkpoint Records (SCR) [ ALLBASE/Replicate User's Guide ] MPE/iX 5.0 Documentation


ALLBASE/Replicate User's Guide

Synchronization Checkpoint Records (SCR) 

To replicate the transactions from a master DBEnvironment to a slave
successfully, the process must be synchronized.  All appropriate records
from the master must be applied to the slave in proper sequence, without
losing any transactions in the sequence.

The master must keep a record of the transaction identifier for the most
recently committed master transaction for each partition creating audit
log records.  (Every transaction that is committed against the
DBEnvironment is given a unique transaction identifier.  The
global_commit_id, the log file timestamp, and the audit_name are the
elements of the transaction identifier that are essential for uniquely
identifying a transaction.)

The slave must keep a record of the master transaction identifier for the
master transaction most recently committed (replicated) on the slave for
each partition being replicated to that slave.

ALLBASE/Replicate uses the information in these records to determine if
the master and the slave are synchronized.  This information is also used
during soft resynchronization to tell the master the transaction
identifier of the first transaction that needs to be sent to the slave.
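
To illustrate why all three elements of the transaction identifier
matter, the identifier can be modeled as a tuple.  This is a minimal
Python sketch, not ALLBASE/Replicate code: the TransactionId type, the
integer stand-in for the timestamp, and the audit names are all
hypothetical.

```python
from typing import NamedTuple


class TransactionId(NamedTuple):
    """Hypothetical model of the three identifying elements."""
    audit_name: str          # identifies the DBEnvironment that wrote the record
    log_file_timestamp: int  # stand-in for the real timestamp format
    global_commit_id: int    # commit number assigned by the master


# Same global_commit_id, but a different DBEnvironment or log file.
a = TransactionId("SALESDBE", 1000, 7)
b = TransactionId("PARTSDBE", 1000, 7)
c = TransactionId("SALESDBE", 2000, 7)

# Putting them in a set shows how many distinct identifiers there are.
distinct = len({a, b, c})
```

Under these assumptions, the three values identify three different
transactions even though the global_commit_id is the same in each.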

SCR Array 

The data structure used to retain the synchronization information is
called the Synchronization Checkpoint Record Array (SCR Array).  Each
element of this array is called a Synchronization Checkpoint Record 
(SCR).

   *   On the master DBEnvironment there is one SCR for each master
       partition in the DBEnvironment being replicated to another
       DBEnvironment.  If that DBEnvironment also has some tables that
       are acting in a slave role, there will also be one additional SCR
       for each master partition that is sending transactions to the
       slave tables for replication.  In addition, there will be SCRs for
       the COMMENT and DEFAULT partitions.

   *   If any of the audit elements DEFINITION, STORAGE, AUTHORIZATION,
       or SECTION are specified in the START DBE statements for the
       master, an additional partition is created for each of those audit
       elements on the master, and each additional partition has an SCR
       associated with it.

   *   On the slave, there is one SCR for each master partition that
       sends transactions to the slave tables.  In addition, there may be
       SCRs for the COMMENT and DEFAULT partitions.

   *   If any of the audit elements DEFINITION, STORAGE, AUTHORIZATION,
       or SECTION are specified in the START DBE statements for the
       master, and if these partitions are specified when starting the
       ALLBASE/Replicate application on the slave, an additional
       partition with an associated SCR is created on the slave for each
       audit element.

You specify the maximum number of elements in the SCR array that supports
the partitions discussed above by entering a mandatory value for the
MAXPARTITIONS parameter in the START DBE NEW or START DBE NEWLOG
statement.  The MAXPARTITIONS value may be different for each master and
slave.  Allow some extra partitions for future growth when specifying
MAXPARTITIONS.

If MAXPARTITIONS on the slave is greater than the number of partitions
that the DBEnvironment tracks, you can increase the number of partitions
replicated without changing the MAXPARTITIONS parameter.  Do not raise
the total number of partitions the slave tracks above the MAXPARTITIONS
value.  If the number of partitions would exceed the value for
MAXPARTITIONS, increase MAXPARTITIONS first.
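
The counting rules above can be turned into a rough sizing check.  The
following Python sketch is only one reading of those rules for a master
DBEnvironment; the function names and the example values are
hypothetical, and the actual accounting done by ALLBASE may differ.

```python
# Audit elements that, per the list above, each add a partition when named
# in START DBE.
AUDIT_ELEMENTS = {"DEFINITION", "STORAGE", "AUTHORIZATION", "SECTION"}


def master_scr_count(replicated, inbound=(), audit_elements=()):
    """Estimate the SCRs a master DBEnvironment needs (assumed rule).

    replicated      -- partition ids this DBEnvironment replicates to others
    inbound         -- (audit_name, partition_id) pairs for master partitions
                       that feed slave tables in this DBEnvironment
    audit_elements  -- audit elements named in START DBE
    """
    unknown = set(audit_elements) - AUDIT_ELEMENTS
    if unknown:
        raise ValueError(f"unknown audit elements: {sorted(unknown)}")
    comment_and_default = 2  # one SCR each for COMMENT and DEFAULT
    return (len(set(replicated)) + len(set(inbound))
            + comment_and_default + len(set(audit_elements)))


def check_maxpartitions(needed, maxpartitions):
    """Return the spare SCR slots, or raise if the array is too small."""
    if needed > maxpartitions:
        raise ValueError(
            f"need {needed} SCRs but MAXPARTITIONS is {maxpartitions}")
    return maxpartitions - needed


# Two replicated partitions plus the DEFINITION audit element.
needed = master_scr_count([6, 7], audit_elements=["DEFINITION"])
spare = check_maxpartitions(needed, 8)
```

With these example values the estimate is five SCRs, which leaves three
spare slots when MAXPARTITIONS is 8.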

Detailed Structure of SCR 

For each partition, a unique SCR in the array contains information in
the following fields:  audit_name, partition_id, global_commit_id,
log_file_timestamp, and local_commit_id.  See Figure 2-2, in which the
log_file_timestamp is omitted for clarity.

   *   The audit_name is specified in the START DBE NEW or START DBE
       NEWLOG statements and uniquely identifies the DBEnvironment that
       generates the audit log record.  Each audit name must be unique
       throughout the network in which replication is taking place.

   *   The partition_id identifies the particular partition with which
       the specific SCR is associated.  The same master partition_id
       value may be used in several different DBEnvironments.
       Because the audit_name for each is unique, the combined
       audit_name/partition_id pair always uniquely identifies a
       particular SCR.  This is why audit_names must be unique throughout
       a replicate network.  If they were not, it would be impossible to
       identify each SCR in the network uniquely, and important SCR
       information could be overwritten.

   *   The global_commit_id for the SCR of a master partition shows the
       unique identification number assigned by the master DBEnvironment
       to the most recently committed transaction on that partition.
       (Although the figure shows the global_commit_id as a small
       integer for simplicity, SQLAudit shows it as a large hexadecimal
       number.)

       On a slave, the global_commit_id in the SCR of a partition being
       replicated shows the global_commit_id assigned by the master
       DBEnvironment to the transaction most recently committed on the
       slave partition.

   *   The log_file_timestamp on the master reflects the time the last
       START DBE NEW or START DBE NEWLOG was done on the DBEnvironment.
       This value, along with the audit_name and the global_commit_id,
       uniquely identifies a transaction.

   *   The local_commit_id on the master always has the same value as the
       global_commit_id because the global commit and the local commit
       always have the same transaction number on the master.

       On the slave, the local_commit_id is the unique identification
       number assigned by the slave DBEnvironment as the global
       transaction commits on the slave.  It is used to determine
       the most recently committed transaction on the slave.  The
       local_commit_id will often differ from the global_commit_id
       on the slave because a different number of transactions may have
       executed on the slave than on the master, so the local_commit_id
       number stream advances at a different rate than the
       global_commit_id number stream.
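
The fields described above can be pictured as a small record type held
in an array keyed by the audit_name/partition_id pair.  The following
Python sketch is an illustration only: the class names, the method
names, the integer timestamp, and all sample values are hypothetical and
do not reflect the internal ALLBASE layout.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class SCR:
    audit_name: str
    partition_id: int
    global_commit_id: int
    log_file_timestamp: int
    local_commit_id: int


class SCRArray:
    """SCRs keyed by the (audit_name, partition_id) pair."""

    def __init__(self, maxpartitions: int):
        self.maxpartitions = maxpartitions
        self.records: Dict[Tuple[str, int], SCR] = {}

    def add(self, scr: SCR) -> None:
        key = (scr.audit_name, scr.partition_id)
        if key in self.records:
            raise KeyError(f"duplicate SCR for {key}")
        if len(self.records) >= self.maxpartitions:
            raise OverflowError("SCR array is full; MAXPARTITIONS reached")
        self.records[key] = scr

    def record_slave_commit(self, audit_name, partition_id,
                            master_commit_id, slave_commit_id):
        """Note a master transaction that has just committed on the slave."""
        scr = self.records[(audit_name, partition_id)]
        scr.global_commit_id = master_commit_id  # number the master assigned
        scr.local_commit_id = slave_commit_id    # number the slave assigned


slave = SCRArray(maxpartitions=4)
slave.add(SCR("SALESDBE", 6, global_commit_id=3,
              log_file_timestamp=1000, local_commit_id=3))
slave.record_slave_commit("SALESDBE", 6,
                          master_commit_id=7, slave_commit_id=21)
scr = slave.records[("SALESDBE", 6)]
```

After the call, the slave's SCR carries the master's number in
global_commit_id and the slave's own, unrelated number in
local_commit_id, which is the divergence described above.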

Figure 2-2.  Synchronization Checkpoint Records (SCRs)

How SCRs Maintain Synchronization

Figure 2-2 shows the state of two SCR arrays, one for a master and one
for a slave DBEnvironment, at four different time periods.  The master
has SCR entries for two partitions, 6 and 7, that will be replicated on
the slave.  The slave shows SCR entries for the two master partitions
being replicated on the slave (6 and 7).

The global_commit_id is used to compare the slave with the master.  The
local_commit_id is only for local use and is not used to compare the
slave and the master.

State 1 - Immediately after Hard Resynchronization

At time 1, a hard resynchronization of the slave has just taken place.
At this time, neither the master nor the slave has resumed operation.
The slave is even with the master.  If you compare the
global_commit_ids (contained in the transaction identifier) in the SCR
arrays for both master and slave, you will see that the last transaction
to commit against partition 6 is master transaction 3, and the last to
commit against partition 7 is master transaction 4.

Hard resynchronization makes the slave a mirror image of the master for
replicated partitions.  Therefore, the last transaction committed
against partition 6 and replicated on the slave is master transaction 3,
and the last transaction committed against partition 7 to be replicated
on the slave is transaction 4.

State 2 - Start of Soft Resynchronization

Prior to time 2, the master DBEnvironment resumed database operations,
but has not yet started replicating transactions to the slave.  More
transactions have committed on the master.  Comparing the
global_commit_id fields in the SCR arrays for the master and slave at
time 2 shows that the most recent transaction to commit against
partition 6 is the master DBEnvironment's transaction 7, and the most
recent transaction committed against partition 7 is transaction 9 in the
master DBEnvironment.

At time 2, the slave DBEnvironment is brought back on line and then the
slave and master ALLBASE/Replicate applications are started.  The
slave's first action is to ask the master to send it any transactions
committed on the master that have not yet committed on the slave.  To
help the master determine which transactions need to be sent, the slave
sends the master a copy of the slave's SCRs, showing the most recently
committed transactions for the partitions being replicated.

By comparing the transaction identifiers in the slave SCR array with the
master SCRs, the master can determine that the last master transaction
committed on the slave for partition 6 is master transaction 3, and the
last master transaction committed from partition 7 is master transaction
4.  Therefore, the slave is behind the master.

The master never has to keep track of what has committed on the slave.
It is the slave's responsibility to tell the master the last master
transaction committed in each partition every time the soft
resynchronization process is started.  In this scheme, if a transmitted
transaction fails to replicate successfully on the slave, the slave's
SCR is not updated to a new transaction number.  The next time the
resynchronization applications are started, the slave passes the SCR
showing the last successfully committed master transaction back to the
master again, and the master always knows which transaction to begin
sending.

The master begins sending committed transactions from partitions 6 and 7
in the proper sequential order.  The slave successfully replicates them
all, and as activity on the master slows down, the slave catches up with
the master.  For each of the master's transactions that successfully
commits on the slave, the SCR for the appropriate partition is updated
to reflect the master transaction identifier of the newly committed
transaction.

State 3 - Slave Even with Master

The slave is even with the master at time 3.
Compare the transaction identifiers in the SCRs from the slave with the
transaction identifiers in the master SCRs.  The transaction identifiers
on the slave are now the same as the transaction identifiers on the
master for both partitions.  Thus, the slave has caught up with the
master.

When the slave sends this SCR array to the master with a request for
more transactions, the master sees the transaction identifiers are
identical to those for the transactions last committed on the master.
The master recognizes that there are no more transactions to send, and
it waits until more transactions are generated against the partitions
being replicated before sending any more transactions to the slave.

State 4 - Slave Ahead of Master

Sometime after time 3, the master fails.  A switchover takes place that
enables business to continue, using what had been the slave as a new
master.  Now the user applications do direct updates to the new master
(old slave) while the old master is being repaired.  While the updates
are being applied to the new master, the old master is restored to the
last successfully completed transactions it recorded prior to the
failure.

If the roles are switched back at this point (the old master again
taking over as master, and the new master going back to slave) an
interesting problem arises.  When the slave sends its SCR array to the
master, the master notes that the slave is ahead of the master.

If you compare master and slave transaction identifiers at time 4, the
most recently committed transaction shown in the slave's SCR array for
partition 6 is transaction 10, and the most recently committed
transaction for partition 7 is transaction 13.  In the master's SCR
array, the most recently committed transaction is master transaction 7
for partition 6, and transaction 9 for partition 7.

Because the slave is ahead of the master, the master recognizes there is
a gap in the transaction sequence due to missing transactions on the
master, and the master sends an error message to the slave.  The slave
stops processing.

The way to avoid this problem is to resynchronize the old master from
the new master before placing the old master back in service as the
actual master.  If the old master is not too far behind the new master,
soft resynchronization can be used to bring the old master up to date.
If a very large number of transactions have been applied to the new
master while the old master was being repaired, hard resynchronization
can be done to transfer the current image from the new master to the old
master.  Then the old master can be soft resynchronized (if necessary)
to catch up on the last few new transactions, or transactions can be
stopped on the new master until the old master is back on line (without
losing synchronization).  The roles can then be restored to their
original configuration.
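
The comparison the master makes in states 2 through 4 can be sketched as
a per-partition check of the global_commit_id values.  This Python
sketch uses the values quoted above for times 2 and 4.  It assumes that
commit identifiers can be treated as plain integers that only increase,
and it ignores the audit_name and log_file_timestamp; the names
SyncState, compare, and compare_arrays are hypothetical and are not part
of ALLBASE/Replicate.

```python
from enum import Enum
from typing import Dict


class SyncState(Enum):
    BEHIND = "behind"  # master has transactions to send
    EVEN = "even"      # nothing to send; master waits
    AHEAD = "ahead"    # gap on the master; master reports an error


def compare(master_id: int, slave_id: int) -> SyncState:
    """Classify one partition from the two global_commit_id values."""
    if slave_id < master_id:
        return SyncState.BEHIND
    if slave_id == master_id:
        return SyncState.EVEN
    return SyncState.AHEAD


def compare_arrays(master: Dict[int, int],
                   slave: Dict[int, int]) -> Dict[int, SyncState]:
    """Map partition_id to the state of the slave for that partition."""
    return {pid: compare(master[pid], slave_id)
            for pid, slave_id in slave.items()}


# partition_id -> global_commit_id, as described in the text
time2 = compare_arrays({6: 7, 7: 9}, {6: 3, 7: 4})
time4 = compare_arrays({6: 7, 7: 9}, {6: 10, 7: 13})
```

Under these assumptions, both partitions are classified as behind at
time 2 and as ahead at time 4, which is the case in which the master
sends an error message and the slave stops processing.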

