ClearCase: How to repair a replica which has corrupted oplogs


I have a replica called KENIA_ICWP_ITEC. When I try to apply a sync package from the replica INDRA_ICWP_ITEC, a core dump is generated:

# multitool syncreplica -import sync_INDRA_ICWP_ITEC_2023-06-02T15.00.04+02.00_28369
Segmentation fault (core dumped)

These are the current oplog epoch numbers for the replica:

# multitool lsepoch replica:KENIA_ICWP_ITEC@/itec/icwp
For VOB replica "/itec/icwp":
Oplog IDs for row "KENIA_ICWP_ITEC" (@ icwpkenia):
 oid:e0f2c369.06e211e6.86f4.98:4b:e1:0c:9c:e6=1795234     (ASTURIAS_ICWP_ITEC)
 oid:9ed5bd47.4d8911e5.8730.98:4b:e1:0c:9c:e6=142972      (DFS_ICWP_ITEC)
 oid:c8f29621.e8a611dd.8b1e.00:1c:c4:93:d2:28=59140877    (INDRA_ICWP_ITEC)
 oid:87730776.630911e7.9303.98:4b:e1:0c:9c:e6=0           (ITECPROD_ITEC_ICWP)
 oid:908bc8fe.f21311e6.920c.98:4b:e1:0c:9c:e6=0           (ITOOLS4_ICWP)
 oid:d35003bc.1cc211e1.9311.98:4b:e1:0c:9c:e6=970073      (JLG_ICWP_ITEC)
 oid:0daafabd.dc9211e5.9fe7.98:4b:e1:0c:9c:e6=14669852    (KENIA_ICWP_ITEC)
 oid:16fba05b.802511e2.8978.98:4b:e1:0c:9c:e6=212378      (OBSOLETE_ASTURIAS_ICWP_ITEC.deleted)
 oid:c9d7f0f2.fc6611e3.8d73.98:4b:e1:0c:9c:e6=0           (UK_ICWP_ITEC)

I have detected that there are some corrupted oplog entries in the database at the replica.

# multitool dumpoplog -long -name -invob /itec/icwp -vreplica INDRA_ICWP_ITEC 136082394

136082394:
op= uncheckout
replica_oid= c8f29621.e8a611dd.8b1e.00:1c:c4:93:d2:28 (INDRA_ICWP_ITEC)
oplog_id= 59140863
op_time= 2023-06-02T12:07:53Z create_time= 2023-06-02T12:53:52Z
data size= 20 data= 0x8d509c0
------------
ckout_ver_oid= 1a1a16fc.013f11ee.92ea.98:4b:e1:0c:9c:e6
              (*object not found*)

# multitool dumpoplog -long -name -invob /itec/icwp -vreplica INDRA_ICWP_ITEC 136082395
multitool: Error: Operation "xdr_vob_oplog_data_t (decode)" failed: error detected by ClearCase subsystem.

# multitool dumpoplog -long -name -invob /itec/icwp -vreplica INDRA_ICWP_ITEC 136082396
Segmentation fault (core dumped)

# multitool dumpoplog -long -name -invob /itec/icwp -vreplica INDRA_ICWP_ITEC 136082397

136082397:
op= uncheckout
replica_oid= c8f29621.e8a611dd.8b1e.00:1c:c4:93:d2:28 (INDRA_ICWP_ITEC)
oplog_id= 59140866
op_time= 2023-06-02T12:07:55Z create_time= 2023-06-02T12:53:54Z
data size= 20 data= 0x820c9c0
------------
ckout_ver_oid= 1b338011.013f11ee.92f9.98:4b:e1:0c:9c:e6
              (*object not found*)

It seems that only the remote oplogs are corrupted, because at the master replica INDRA_ICWP_ITEC the same oplogs are shown correctly.
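For reference, the check at the master site looks something like this (the entry number is illustrative; it falls in INDRA's own oplog_id range 59140863-59140866 shown in the dumps above):

# multitool dumpoplog -long -name -invob /itec/icwp -vreplica INDRA_ICWP_ITEC 59140864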

Is there any way to repair the replica without replacing it from another healthy replica?

Any help appreciated, Emilio.

Answer from VonC:

Check first the technote "Operation "xdr_vob_oplog_data_t (decode)" failed"

This error indicates one or more corrupted oplog entries in the database at the replica running the syncreplica command.

Diagnosing The Problem

If the oplog is not listed in the error, dumpoplog can be used to attempt to narrow down the problem oplogs.

Determine when the last packet was successfully sent from the problem replica to the remote replica associated with the synchronization that is currently failing:

multitool lshistory replica:<remote replica>
M:\myview\myvob>cleartool lshistory -minor replica:rep-3
04-Nov.13:21   user export sync from replica "rep-1" to replica "rep-3"
  "Exported synchronization information for replica "rep-3".
  Row at export was: rep-4=0 rep-3=0 rep-2=0"

Use the date from the output with dumpoplog.

multitool dumpoplog -long -name  -since 04-Nov.13:21

If the oplog entry order is known, it is possible to start with the entry order value listed in the error minus a few numbers:

   oplog entry with order:2064989
   multitool dumpoplog -long -name -from 2064900
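Applied to the failing replica in this question, where entry 136082394 still dumps but 136082395 fails, the equivalent call would be along these lines (the starting value is simply a round number a little below the first failing entry; check the exact option combination against the dumpoplog reference page):

multitool dumpoplog -long -name -invob /itec/icwp -vreplica INDRA_ICWP_ITEC -from 136082300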

Next run dumpoplog in reverse to find the failing oplog in the opposite direction. It will most likely be a different oplog than the one found in the previous step:

multitool dumpoplog -long -name -reverse

Both of these commands will most likely fail with a similar error to the syncreplica at or near the oplog producing the problem.
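If the core dumps make it hard to see exactly where dumpoplog stops, a small shell loop can probe a suspect range one entry at a time. This is only a sketch, not part of the technote: it reuses the dumpoplog invocation from the question, and the range of entry numbers is an assumption to adjust to your case:

vob=/itec/icwp
rep=INDRA_ICWP_ITEC
for id in $(seq 136082390 136082400); do
    # a decode error or a core dump both leave dumpoplog with a non-zero exit status
    if multitool dumpoplog -long -name -invob "$vob" -vreplica "$rep" "$id" >/dev/null 2>&1; then
        echo "oplog $id: OK"
    else
        echo "oplog $id: FAILED"
    fi
done

Entries reported as FAILED bracket the corrupted region that IBM support will need to look at.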

That being said, any resolution should be done in coordination with IBM support.
Scrubbing the VOB (using the command /usr/atria/etc/vob_scrubber VOB_PATH) might help; again, this should be done under support supervision. Also note that the process dates from 2008 and, depending on your current ClearCase version, might not be relevant today.
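As a rough outline only, and strictly under support's guidance, a scrub run would look something like the following; /vobstore/icwp.vbs is a placeholder for the actual VOB storage directory, and locking the VOB first is a conservative precaution rather than a documented requirement:

cleartool lock vob:/itec/icwp
/usr/atria/etc/vob_scrubber /vobstore/icwp.vbs
cleartool unlock vob:/itec/icwp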