Efficient way to move data between Oracle and Vertica

I am working on a data-ingestion application. I have to run queries in Oracle and then insert the result sets into Vertica, but the task takes far more time than it should. This is what I do:

1) Execute the query in Oracle and obtain the result set. Also open a connection to Vertica and create a prepared statement.

2) Read the Oracle result set in batches of 30K records and insert each batch using the Vertica prepared statement.

3) Read the next batch of records from Oracle and repeat step 2 until every record in the Oracle result set has been read. (A sketch of this loop follows below.)
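
For reference, here is a minimal sketch of the loop described above using plain JDBC batching; the connection strings, table, and column names are placeholders. Note that a classic bug in this pattern is forgetting to flush the final, partially filled batch, which silently drops the tail records:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class OracleToVerticaBatch {

        private static final int BATCH_SIZE = 30_000;

        public static void main(String[] args) throws SQLException {
            // Connection strings, tables, and columns below are placeholders.
            try (Connection oracle = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//orahost:1521/ORCL", "scott", "tiger");
                 Connection vertica = DriverManager.getConnection(
                     "jdbc:vertica://verticahost:5433/db", "dbadmin", "dbadmin");
                 Statement src = oracle.createStatement();
                 PreparedStatement ps = vertica.prepareStatement(
                     "INSERT INTO target_table (col1, col2) VALUES (?, ?)")) {

                vertica.setAutoCommit(false);
                src.setFetchSize(BATCH_SIZE);  // pull rows from Oracle in large chunks

                try (ResultSet rs = src.executeQuery(
                         "SELECT col1, col2 FROM source_table")) {
                    int pending = 0;
                    while (rs.next()) {
                        ps.setLong(1, rs.getLong(1));
                        ps.setString(2, rs.getString(2));
                        ps.addBatch();
                        if (++pending == BATCH_SIZE) {
                            ps.executeBatch();  // flush a full batch to Vertica
                            pending = 0;
                        }
                    }
                    if (pending > 0) {
                        ps.executeBatch();      // flush the final partial batch;
                                                // skipping this drops the tail rows
                    }
                }
                vertica.commit();
            }
        }
    }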

I face two problems with this approach:

1) The process is extremely slow.

2) Some records are missed and never make it into Vertica.

Solutions I am considering:

  1. Spool the Oracle result set to a file and load that file into Vertica. This requires adding some file-cleanup logic to the code.
  2. Use the VerticaCopyStream API to insert the rows into the Vertica tables. VerticaCopyStream works much faster than prepared statements; it needs to be passed an input stream over the file. (See the sketch after this list.)
  3. If solution 1 is no better than the current approach, pass an input stream built directly over the result set to the VerticaCopyStream API.
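
For options 2 and 3, here is a minimal sketch using the VerticaCopyStream class shipped with the Vertica JDBC driver. The COPY statement, table name, and file path are placeholders, and the exact method signatures should be checked against your driver version's documentation:

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;

    import com.vertica.jdbc.VerticaConnection;
    import com.vertica.jdbc.VerticaCopyStream;

    public class CopyStreamLoad {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:vertica://verticahost:5433/db", "dbadmin", "dbadmin")) {
                conn.setAutoCommit(false);

                // COPY ... FROM STDIN consumes whatever we push through addStream().
                String copy = "COPY target_table FROM STDIN"
                            + " DELIMITER ',' ENCLOSED BY '\"' DIRECT";
                VerticaCopyStream stream =
                    new VerticaCopyStream((VerticaConnection) conn, copy);
                stream.start();

                try (InputStream csv = new FileInputStream("table.csv")) {
                    stream.addStream(csv);
                    stream.execute();   // push the stream contents through the COPY
                }

                long rowsLoaded = stream.finish();  // number of rows inserted
                System.out.println("Rows loaded: " + rowsLoaded);
                conn.commit();
            }
        }
    }

For option 3, the FileInputStream can be replaced by any InputStream that serializes the Oracle result set into the same delimited format, which avoids touching the filesystem at all.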

I am supposed to do this through Java only. Can you suggest an optimized solution for this situation?

2 Answers

Answer 1:

You can use ProcessBuilder to spawn a process that dumps the data from Oracle to a file, and then spawn a second process that loads that file back into Vertica. In other words, use ProcessBuilder to run the command line that dumps to the file, and then use it again to run the command line that loads the file. A sketch follows.
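
Here is a minimal sketch of that idea, assuming sqlplus and vsql are on the PATH and that export.sql is a spool script (like the one in the other answer) that writes table.csv; all names and credentials are placeholders:

    import java.io.IOException;

    public class DumpAndLoad {
        public static void main(String[] args)
                throws IOException, InterruptedException {
            // Step 1: run a SQL*Plus script that spools the query output to table.csv.
            run("sqlplus", "-S", "scott/tiger@//orahost:1521/ORCL", "@export.sql");

            // Step 2: bulk-load the spooled file into Vertica with vsql's COPY.
            run("vsql", "-U", "dbadmin", "-w", "dbadmin",
                "-c", "COPY target_table FROM LOCAL 'table.csv'"
                    + " DELIMITER ',' ENCLOSED BY '\"' DIRECT");
        }

        private static void run(String... cmd)
                throws IOException, InterruptedException {
            Process p = new ProcessBuilder(cmd)
                    .inheritIO()      // show the tool's output on our console
                    .start();
            int exit = p.waitFor();   // the load only starts once the dump is done
            if (exit != 0) {
                throw new IllegalStateException(
                    cmd[0] + " exited with code " + exit);
            }
        }
    }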

Answer 2:

I would go for the first option (the simpler, the better). The idea is to produce CSV-style output:

    -- Set the linesize large enough to accommodate the longest possible line.
    SET LINESIZE 32767
    SET PAGESIZE 0
    SET FEEDBACK OFF
    SET TRIMSPOOL ON
    SET TERMOUT OFF
    SPOOL table.csv
    -- Set up the headings.
    SELECT '"col1","col2","col3","col4"' FROM dual;
    -- Create the query that extracts the data.
    SELECT    TO_CHAR(col1) || ','
           || TO_CHAR(col2) || ','
           || TO_CHAR(col3) || ','
           || '"' || col4 || '"'
    FROM table;
    SPOOL OFF
    HOST vsql -U dbadmin -w dbadmin -c "COPY table FROM LOCAL 'table.csv' DELIMITER ',' ENCLOSED BY '\"' DIRECT;"

If the process is slow, look at improving the extract query; the formatting itself is not a big resource consumer.

Set up a job in your crontab, or even in the Oracle scheduler, to run this. I would recommend installing the Oracle client on your Vertica host, or on whichever host handles your data load.

I have put the COPY command inside the spool script so I can make sure it only runs once the spool is done; otherwise, since (as you said) the extract takes a long time, the spool could go out of sync with a separate crontab job doing the load.

From here you can implement load monitoring using the script output and a validation shell script, or even send an email with the outcome of the export and load.