AWK with multiple delimiters

160 Views Asked by At

I'm trying to insert tshark output to MySQL database. Tshark output has a format:

VALUE1:VALUE2_1|VALUE2_2|VALUE2_3:VALUE3_1,VALUE3_2,VALUE3_3:VALUE4:VALUE5

Currently I'm trying to redirect the output through the pipe to sed and awk:

|sed -u "s/^/insert into data (item1, item2, item3, item4, item5) values('/g"|awk -F":" -v OFS=',' '{print $1,$2}'

, but only some values are necessary.

So that the resulting command must look like:

insert into data (item1, item2, item3, item4, item5) values(value1,value2_2,value3_2,value4,value5)

And some example from development environment:

Tshark permanently generates these kind of output:

1418043291.268345000:14567744356|4|8v95489464:DC0289001840,NNN100,DC206286661:03612037:ON1774B01
1418043291.201703000:14567744356|0|6b10992051:DC0289001840,NNN120,DC081276320:04891241:ON891N619 

, where

  1. 1418043291.268345000 - VALUE1
  2. 14567744356|4|8v95489464 - VALUE2
  3. DC0289001840,NNN100,DC206286661 - VALUE3
  4. 03612037 - VALUE4
  5. ON1774B01 - VALUE5

For VALUE2 and VALUE3 is interested only the middle row, i.e "4" for VALUE2 and "NN100" for VALUE3. All of the other rows from these values (VALUE2, VALUE3) are not necessary for the further calculations from out side, that's why they must be discarded and the resulting insert command must looks like:

insert into data (item1, item2, item3, item4, item5) values(1418043291.268345000,4,NNN100,03612037,ON1774B01)

Could you please advise, how to modify the awk command for inserting values:

 VALUE1:VALUE2_2:VALUE3_2:VALUE4:VALUE5 ? 
1

There are 1 best solutions below

2
On BEST ANSWER

based on your sample, explaination and with few simplification and no awk (not needed in this case and heavier for OS)

sed 's/\([^:]*\):[^|]*|\([^|]*\)|[^,]*,\([^,]*\),[^:]*:\([^:]*\):\(.*\)/insert into data (item1, item2, item3, item4, item5) values(\1,\2,\3,\4,\5)/' YourFile

(posix version so --posixon GNU sed)

the real full structure is using full value and sub group:

sed 's/\([^:]*\):\([^|]*|\([^|]*\)|^[^:]*\):\([^,]*,\([^,]*\),[^:]*\):\([^:]*\):\(.*\)/insert into data (item1, item2, item3, item4, item5) values(\1,\3,\5,\6,\7)/' YourFile

now, a awk only is also possible, sed followed by awk is to avoid (often possible) for a performance and ressource point of view