How can I compare files in two object store buckets using an Oracle Autonomous Database?

504 Views Asked by At

I have two buckets in different regions in OCI object store. I want to compare files in the two buckets using Oracle Autonomous Database, to detect missing files and copy them over and synchronize the two buckets. I use the two buckets in different regions as replicating some configuration metadata between the services running in the two regions.

Eg:

  • Bucket1 = 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/mynamespace/b/mybucket/folder1/'
  • Bucket2 = 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/mynamespace/b/mybucket/folder1/'
1

There are 1 best solutions below

0
On BEST ANSWER

Oracle Autonomous Database provides the DBMS_CLOUD package for accessing object store files.

DBMS_CLOUD package has a function LIST_OBJECTS, which provides a list of files in an object store location (bucket or subfolder). This function provides the object name , size in bytes and checksum, which can be used to compare files in the 2 buckets with the help of Oracle SQL FULL OUTER JOIN and WITH Clause.

SQL Script:

define bucket1_uri='https://objectstorage.us-phoenix-1.oraclecloud.com/n/mynamespace/b/mybucket/o/folder1/'
define bucket2_uri='https://objectstorage.us-ashburn-1.oraclecloud.com/n/mynamespace/b/mybucket/o/folder2/'
define credname1='OCI$RESOURCE_PRINCIPAL'
define credname2='OCI$RESOURCE_PRINCIPAL'

set linesize 120
set pages 1000
col object_name_A format a30
col object_name_B format a30
col object_checksum_A format a10
col object_checksum_B format a10


-- Use SQL FULL OUTER JOIN and compare the checksum exclude matching rows
WITH bucket1 AS 
   (SELECT object_name, bytes, checksum FROM DBMS_CLOUD.LIST_OBJECTS('&credname1', '&bucket1_uri')
    ORDER BY object_name, bytes, checksum),
     bucket2 AS
   (SELECT object_name, bytes, checksum FROM DBMS_CLOUD.LIST_OBJECTS('&credname2', '&bucket2_uri')
    ORDER BY object_name, bytes, checksum)
SELECT rownum id, diff.* FROM
(SELECT a.object_name object_name_A, a.bytes object_size_A, a.checksum object_checksum_A,
        b.object_name object_name_B, b.bytes object_size_B, b.checksum object_checksum_B
  FROM bucket1 a FULL OUTER JOIN bucket2 b
    ON a.object_name = b.object_name) diff
WHERE diff.object_name_A IS NULL OR
      diff.object_name_B IS NULL OR
      (diff.object_name_A = diff.object_name_B AND
       diff.object_checksum_A != diff.object_checksum_B)
/

Explanation:

In the above script:

  1. The assumption is to provide 2 Object store URLs and a credential name. One can use Resource principal for OCI Object store in Autonomous Database to avoid specifying passwords.
  2. Using Oracle SQL WITH Clause, there are 2 inline views to provide the object_name, bytes and checksum columns from DBMS_CLOUD.LIST_OBJECTS in the two buckets.
  3. The script uses file checksum to compare files in the two buckets, which is provided by cloud object store providers as a hash of the file content.
  4. Using SQL Full Outer Join, we can get objects existing in bucket1 and not in bucket2, or vice-versa, as well as objects that have a same name in both buckets but different file checksum.
  5. Finally, the SQL above does not show objects that have a same name and same size in both buckets.

Use Cases:

The DBMS_CLOUD package works for multiple object stores such as Oracle OCI, AWS S3, Azure BLOB, Google Cloud, Wasabi, so the above script can be used to compare buckets in multiple scenarios such as -

  • Compare buckets in same region in OCI
  • Compare buckets in different regions in OCI
  • Compare buckets in different accounts in OCI object store
  • Compare buckets in different cloud providers - OCI bucket vs S3 bucket

References: