Extended regex "." does not seem to match every character


I have a file that contains this header, FIELD1 FIELD2 : 0x30070040, along with a lot of junk characters that make up half of the file's size. To get rid of them I run these commands:

dos2unix -q -n file
sed -i $'s/[^[:print:]\t]//g' file # Remove every non-printable character except tabs (yes, dos2unix was not enough)
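A possible reason the cleanup leaves some bytes behind, assuming GNU sed running in a UTF-8 locale: bracket expressions such as [^[:print:]\t] (like .) only match valid multibyte characters, so bytes that are not valid UTF-8 pass through untouched. In the C locale every byte counts as one character and bytes above 0x7F are not printable, so they get removed. The sketch below is one way to try that; strip_nonprintable is a hypothetical helper name, and it uses [:blank:] instead of the $'\t' quoting so it also runs under a plain POSIX sh.

```shell
#!/bin/sh
# Hypothetical helper: delete every byte that is neither printable ASCII
# nor a blank (space or tab). LC_ALL=C makes sed treat each byte as one
# character, so stray bytes such as 0xFF 0xFE are matched and removed too.
strip_nonprintable() {
    LC_ALL=C sed 's/[^[:print:][:blank:]]//g' "$@"
}

# Demo on an in-memory sample: the \377\376 pair (0xFF 0xFE) is stripped,
# while the tab between the two fields is kept.
printf 'junk\377\376FIELD1\tFIELD2 : 0x30070040\n' | strip_nonprintable
```

To edit the file in place as in the question, the same expression would be run as LC_ALL=C sed -i 's/[^[:print:][:blank:]]//g' file.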

But I still end up with a file that has this odd header. Copied and pasted from the shell, it looks like this:

PFcount_01032019.txt0000777017777601777760116201541013436157760015052 0ustar  nfsnobodynfsnobody▒▒FIELD1   FIELD2 : 0x30070040

Copied and pasted from a text editor such as Vim, it looks like this:

PFcount_01032019.txt0000777017777601777760116201541013436157760015052 0ustar  nfsnobodynfsnobodyÿþFIELD1   FIELD2 : 0x30070040

Note the two special characters just before FIELD1.
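One way to see exactly which bytes sit before FIELD1 is to dump them in hex. ÿ and þ are the Latin-1 renderings of the bytes 0xFF and 0xFE, and that pair is also the UTF-16 little-endian byte-order mark; the text before it (a file name, runs of octal digits, "ustar", the owner nfsnobody) also resembles a tar member header. Both are hypotheses to check against the dump, not established facts. dump_head is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the first N bytes (default 256) of a file
# as hex, so invisible or invalid bytes can be identified by value.
dump_head() {
    head -c "${2:-256}" "$1" | od -An -tx1
}

# Demo: a sample file whose header is preceded by the bytes 0xFF 0xFE.
sample=$(mktemp)
printf '\377\376FIELD1' > "$sample"
dump_head "$sample" 8    # shows: ff fe 46 49 45 4c 44 31
rm -f "$sample"
```

On the real file, dump_head file 600 would cover the whole junk prefix plus the start of the header.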

Now I would like to end up with a header like this:

FIELD1   FIELD2

It is important to also keep everything between FIELD1 and FIELD2, because that is the file's field separator. I thought about using this:

sed -i -r '1 s/.+(FIELD1.+) : 0x.+/\1/g' file
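Independently of the junk bytes, .+ in front of FIELD1 requires at least one character before it, so the command above would leave an already clean header (one that starts with FIELD1) unchanged; the g flag is also unnecessary, since the pattern can match at most once per line. A variant with .* matches in both cases, although on its own it does not remove a junk prefix that . cannot match. The sketch wraps it in a hypothetical function, header_only, and prints instead of editing in place; -E is the portable spelling of GNU sed's -r.

```shell
#!/bin/sh
# Hypothetical helper: on line 1 only, keep "FIELD1<separator>FIELD2"
# and drop whatever precedes it plus the " : 0x..." suffix.
# .* (zero or more) instead of .+ (one or more) also matches when
# nothing precedes FIELD1.
header_only() {
    sed -E '1 s/.*(FIELD1.*) : 0x.*/\1/' "$@"
}

# Demo: with .+ this clean header would not match; with .* it does.
printf 'FIELD1   FIELD2 : 0x30070040\n' | header_only   # FIELD1   FIELD2
```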

But apparently .+FIELD1 does not match PFcount_01032019.txt0000777017777601777760116201541013436157760015052 0ustar nfsnobodynfsnobody▒▒FIELD1 or PFcount_01032019.txt0000777017777601777760116201541013436157760015052 0ustar nfsnobodynfsnobodyÿþFIELD1 (whichever is the real one), so I cannot extract \1 from the regex.

Shouldn't . match every character? Why does it not match whatever comes before FIELD1?
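A likely explanation, assuming GNU sed running in a UTF-8 locale: there . matches only valid multibyte characters, and the bytes 0xFF and 0xFE (displayed as ▒▒ or ÿþ) are not valid UTF-8. Since .+ needs at least one character directly before FIELD1 and the only thing there is 0xFE, the whole expression fails to match. A common workaround is to run sed in the C locale, where every byte is a character and . matches any of them. The sketch keeps the question's regex unchanged and only sets the locale; extract_header is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: the question's substitution, unchanged, but run
# in the C locale so that "." matches any byte, including 0xFF and 0xFE.
extract_header() {
    LC_ALL=C sed -E '1 s/.+(FIELD1.+) : 0x.+/\1/' "$@"
}

# Demo: junk prefix ending in 0xFF 0xFE, then the header and one data row.
printf 'PFcount_01032019.txt0ustar\377\376FIELD1   FIELD2 : 0x30070040\nrow1\n' |
    extract_header
```

On the real file this would become LC_ALL=C sed -i -E '1 s/.+(FIELD1.+) : 0x.+/\1/' file.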
