Informatica Incremental extract and load process


We are fetching the source data from an Oracle db where the data has a create date and a modify date available. For the initial load we fetch the full data set and load it. For the incremental load we only need to fetch the rows that were inserted or updated since the last run, based on the create and modify dates of the source.

Example :

db,table,address,create_date,update_date
sample1,a,test1,20230627,20230627
sample2,b,test2,20230627,20230627

In the initial load we loaded both of these rows and it loaded fine.

Incremental run: let's say the first record got updated and one new row was inserted into the source db.

Example:

 db,table,address,create_date,update_date
 sample1,a,test1,20230627,20230628
 sample2,b,test2,20230627,20230627
 sample3,c,test3,20230628,20230628

So in this case only 2 rows need to be fetched: the first record as an update and the new record as an insert.

db,table,address,create_date,update_date
sample1,a,test1,20230627,20230628
sample3,c,test3,20230628,20230628

How do we implement this using Informatica? Please note we are not using parameter files, so that option is not available to us.

We are looking at different solutions, like creating a mapping variable and then using that variable inside an expression, but as we are building this for the first time it would help if anyone can share an efficient solution they have used in their project. We need to run the job and fetch the data on a weekly basis.
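
For reference, the mapping-variable approach we are evaluating looks roughly like this (just a sketch; $$LAST_EXTRACT_DATE, the object names and the date handling are placeholders, and the variable would need aggregation type Max):

  -- Source Qualifier SQL override / filter (sketch)
  SELECT db, table_name, address, create_date, update_date
  FROM   src_schema.src_table
  WHERE  create_date > '$$LAST_EXTRACT_DATE'
     OR  update_date > '$$LAST_EXTRACT_DATE'

  -- In an Expression transformation, a variable port would call
  -- SETMAXVARIABLE($$LAST_EXTRACT_DATE, update_date)
  -- so the repository keeps the highest processed date for the next weekly run.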


There are 2 best solutions below


You have two options:

  1. Use parameter files - you can use a session-wise or a global param file (see the parameter file sketch below the list).
    1.1. Set a param $$LAST_UPDATE_DATE in each mapping or globally.
    1.2. Update its value before your main workflow runs, using a shell script.

  2. Use the SYSDATE variable in each mapping's SQL. Update each Source Qualifier query or filter with a condition like this to fetch the last 7 days of data:

where update_date > SYSDATE - 7

But this solution can be a problem in case of an ETL failure, a skipped run, etc., because the window is fixed at 7 days: changes older than that are missed on the next run.
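
For option 1, the parameter file could look something like this (a sketch only; the folder, workflow and session names and the date format are placeholders and must match your environment and the variable's datatype). The shell script that runs before the workflow would simply rewrite the $$LAST_UPDATE_DATE line with the last successful load date:

  [MyFolder.WF:wf_weekly_incremental.ST:s_m_incremental_load]
  $$LAST_UPDATE_DATE=06/28/2023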

Both options will work only if you have a proper upsert method implemented in the mapping.
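
One common way to implement that upsert (a sketch, not necessarily what was meant here) is a Lookup on the target's key columns followed by an Update Strategy transformation; lkp_db and lkp_table_name below are hypothetical lookup return ports:

  -- Update Strategy expression (sketch)
  -- row not found in target -> insert, otherwise update
  IIF(ISNULL(lkp_db) AND ISNULL(lkp_table_name), DD_INSERT, DD_UPDATE)

Alternatively, the session-level 'Treat source rows as: Update' setting combined with the target's 'Update else Insert' property can achieve the same effect without an Update Strategy.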


You can use an initialization session that first gets the max date from your target and uses it to set a variable. Then pass that variable's value to a second session, which performs the actual load, fetching from the source only the rows whose date is later than (or '>=') the date obtained from the target.
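
A minimal sketch of that pattern (table names, variable names and ports below are placeholders; without a parameter file, the value can be handed over via a workflow variable using the sessions' post-session-on-success and pre-session variable assignments):

  -- Session 1 (initialization): read the high-water mark from the target
  SELECT MAX(update_date) AS last_loaded_date
  FROM   tgt_schema.tgt_table

  -- In an Expression transformation, capture it into a mapping variable:
  -- SETVARIABLE($$LAST_LOADED_DATE, last_loaded_date)

  -- Session 2 (main load): Source Qualifier filter against the source
  WHERE create_date > '$$LAST_LOADED_DATE'
     OR update_date > '$$LAST_LOADED_DATE'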