I'm taking a denormalized Access DB and moving it to a normalized table in SQL based on EF 4.
Let's say I have the following in the Access DB
-------------------
| NAME | FRUIT |
-------------------
| John | Apple |
| Jane | Apple |
| Mark | Orange |
| Nancy | Orange |
| Jim | Apple |
-------------------
I'm creating the following two databases in SQL
tblNames:
-------------------------------
| ID | Name | FruitID |
-------------------------------
| 1 | John | 1 |
| 2 | Jane | 1 |
| 3 | Mark | 2 |
| 4 | Nancy | 2 |
| 5 | Jim | 1 |
-------------------------------
tblFruits:
--------------------
| ID | Fruit |
--------------------
| 1 | Apple |
| 2 | Orange |
--------------------
In my dataflow, I can pull the names using Derived Column (the table is actually larger and I'm doing some data type transformations). However, how do I take the output from the Derived Column and for each row match it to the proper ID of the fruits table producing the correct output that needs to be inserted into the tblNames database?
Assuming that tblFruits has already been populated prior to the dataflow to populate tblNames, you'll want to add a Lookup Transformation to the sequence so that you can retrieve the correct tblFruits.ID for a given Fruit.
You don't specify which version of SSIS you are using but something to be mindful of it it expects to always find a match in the target table. In 2005, it was painful as not finding a match was the same error that the table doesn't exist or you lack permissions. 2008+ allows you to identify the rows that didn't find a match in the target table.
The most important thing to get in your mind with using the lookup is that you should only pull back the columns you need. The table selector is handy but don't be lazy and click it. It will consume more resources than you need and all you'll know is "SSIS is freaking slow and a piece of junk."
Other helpful and unasked for advice is to not do too many operations in a derived column. Instead, use multiple derived columns that have a tight focus. Sounds counterintuitive but the SSIS engine can parallelize operations better that way.
I'm not sure how the EF tag applies but perhaps I'm missing some nuance of the problem.