I am at a loss of what to do as I am able to fully read in a column with a Flat File Source, but if certain things are avaiable, I need to split those into a separate column.
Example:
line 1) 2013-08-23 14:03:17 ipaddress:port @client POST /api func_0() result(0)
line 2) 2013-08-23 14:03:18 ipaddress:port @client POST /api/logout LOGOUT
(lm=local,haspid=randomnumbers,feat=0,sess=0000007E,duration=8400) result(0)
line 3) 2013-08-23 14:03:18 ipaddress:port @client POST /api/logout LOGOUT
(lm=local,haspid=randomnumbers,feat=1,sess=0000007D,duration=8408) result(0)
(wrapped for legibility: each of these three is really one long line)
I need to do some sort of derived column to split out the contents of sess= and duration= into their own separate columns, but as you can see, sometimes the function column is different and contains an empty func_0, or some other type of function, so it couldn't be done by delimiting it in the Flat file source read-in.
Any ideas?
I'd use a script transform here. You could use a regEx or simple string.contains to see if you have data in the /api column.
Then use split() or regEx to pull the data out of sess= and duration= & write it into new columns.
You could create the new columns in the script transform but personally I like to use a derived column transform BEFORE the script task to create the new columns - e.g. sess & duration.
You then have the columns created for all rows & just need to use the script task to add data to the sess & duration columns if data exists in the /api column.
I hope this makes sense!