I have data in csv file. Here is the example
Sr# Inv_ID IdType IdValue MIC
1 F000000EU6 CUSIP 29088TEST null
2 F000000EU6 ISIN US290886TEST null
3 F000000EU6 SEDOL 23TEST1 XNAS
4 F000000EU6 TICKER ETEST XNAS
5 F000000EU6 SEDOL 23TEST2 XNGS
6 F000000EU6 TICKER ETEST XNGS
Investment Vehicle ID + MIC is a unique Key. After processing it should give me below result
Sr# SEDOL MIC ISIN CUSIP TICKER Inv_ID
2 23TEST1 XNAS US290886TEST 29088TEST ETEST F000000EU6
3 23TEST2 XNGS US290886TEST 29088TEST ETEST F000000EU6
Since Investment Vehicle ID + MIC is unique combination it should generate two rows mentioned above. However Sr#1 and #2 same Investment Vehicle ID but MIC is null it should not generate separate row but add ISIN and CUSIP to the existing records
I was thinking of reading file twice #1 for IDTypes = SEDOL and TICKER and skip ISINs and CUSIPs. Generate records with key Investment Vehicle ID + MIC So in above case two records will be generated with two different keys --> F000000EU6XNAS AND F000000EU6XNGS Next read the file with ISIN and CUSIP Update the existing records which with keys start with F000000EU6 with ISIN and CUSIPs values No Match then create separate records with ISIN and CUSIP I am facing lot of performance issues.
You can definitely do it in one run. Your question is not very clear or complete. I had to cut some corners for the following code to work:
csvmodule, but this could be made more efficientSr#column in the output, so I just left it outAnyway, I hope the following is helpful:
The clue is that on scanning the file, you either store data for an investment at a specific (
investments) or more generic (inv_info) level and just before writing collect all relevant data for a line.(I only just now realise that I was assuming that 'inv' stands for 'investment', but probably 'inventory' makes more sense? :-D)