I consider myself pretty good with regular expressions, but this one is turning out to be surprisingly tricky.
I want to replace every run of whitespace, except whitespace inside "" and [] pairs.
I used the regex ("[^"]*"|\S+)\s+ but it split the [06/Jan/2021:17:50:09 +0300] part of my log into two blocks.
Here is my entire log line:
[06/Jan/2021:17:50:09 +0300] "" 10.139.3.194 407 "CONNECT clients5.google.com:443 HTTP/1.1" "" "-" "" 4245 75 "" "" "81" ""
The result I am getting from my regex with the sed command (replacing whitespace with a comma):
[06/Jan/2021:17:50:09,+0300],"",10.139.3.194,407,"CONNECT clients5.google.com:443 HTTP/1.1","","-","",4245,75,"","","81",""
And finally, the result that I want to have:
[06/Jan/2021:17:50:09 +0300],"",10.139.3.194,407,"CONNECT clients5.google.com:443 HTTP/1.1","","-","",4245,75,"","","81",""
Since this sample input looks like log lines, and assuming they will always be in the same format, you could try the following awk code, written and tested on the shown samples in GNU awk.

Explanation:
- This uses GNU awk, which has the FPAT option available.
- OFS (output field separator) is set to , for all lines.
- The main awk program resets the line (by reassigning the 1st field) so that the OFS value is applied to it, as the OP requires. This makes sure that commas appear in the output only where they are needed.

Explanation of regex: