IICS Extracting JSON Data from REST API and then Loading to Relational Database/Datawarehouse (Snowflake)

800 Views Asked by At

I am new to informatica cloud. I am learning to integrate data from REST API right now. my purpose is to build an ETL to extract data from REST API and then directly Load data into snowflake.

I am using IICS data integration to do this. However, I found that, if I extract the data from API and load it to a csv file in my local computer using flat file connection first, and then I extract the data from this csv and load to snowflake, there will be no errors and I can see the table in snowflake is filled up. But, if I extract the data from API and directly load it to snowflake, there will be nothing in snowflake after running the mapping and there will be error in the log file stating:

READER_1_1_1> CCI_1000 [2023-04-24 10:53:52.354] [ERROR] Failed to establish a connection because of the following error: javax.net.ssl.SSLException: Connection reset

READER_1_1_1> GENERIC_READ_40 [2023-04-24 10:53:52.355] [ERROR] Fault occurred while reading data from external system: [<INFA:Lytx_Get_Event_Triggers_Swagger_FAULT xmlns:INFA="http://xml.schemas/infa/procedure/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><Lytx_Get_Event_Triggers_Swagger><RequestXML>{ "x-apikey" : "XXXXXXXXX" }</RequestXML><ErrorCode>608</ErrorCode><ErrorMessage>Failed to establish a connection because of the following error: javax.net.ssl.SSLException: Connection reset</ErrorMessage></Lytx_Get_Event_Triggers_Swagger></INFA:Lytx_Get_Event_Triggers_Swagger_FAULT>].

Then nothing will be loaded from the source

SESSION LOAD SUMMARY

DIRECTOR> TM_6252 [2023-04-24 10:53:53.473] Source Load Summary. DIRECTOR> CMN_1740 [2023-04-24 10:53:53.473] Table: [Source_API] (Instance Name: [Source_API] Instance UI Name: [Source_API]) Output Rows [0], Affected Rows [0], Applied Rows [0], Rejected Rows [0] DIRECTOR> TM_6253 [2023-04-24 10:53:53.473] Target Load Summary. DIRECTOR> CMN_1740 [2023-04-24 10:53:53.473] Table: [TEST_EVENT_TRIGGER_STAGING] (Instance Name: [TEST_EVENT_TRIGGER_STAGING] Instance UI Name: [Target_Snowflake]) Output Rows [0], Affected Rows [0], Applied Rows [0], Rejected Rows [0] DIRECTOR> TM_6023 [2023-04-24 10:53:53.473]

I guess what it is saying is that when the source transformation tries to read data from REST API, the API server connection reset/shutdown? But why this did not happen when I just extract the data from API and load to local csv file? Is it due to an API server issue, or is it because of me not loading it to csv but directly to snowflake?

Later on I also tried to add a hierarchical parser in my mapping to parse the JSON data from the source, hoping this will solve the issue, but I still got this connection reset error again.

hierarchical parser details

hierarchical parser details

hierarchical parser details - why the input contains 4 fields, and output only 1 field, I cannot match them

hierarchical parser detailsenter image description here

the entire mapping

Is it that I did wrong in anywhere of the hierarchical parser setup? Or are there other ways I should use to extract data from REST API and load to database instead of using hierarchical parser?

I want to learn how to directly load the data into snowflake because if everytime I need to first save the output to a local csv, then later on I still need to delete the csv files, and there will be manual work.

Thank you for your help!

1

There are 1 best solutions below

0
On

Briefly speaking this is happening due to the way the Snowflake connector works. I'd rather recommend extracting data from REST API to flatfile (idealy on S3 bucket), and loading the file into Snowflake with e.g. Snowflake Task. Or by using IICS Mass Ingestion.