sed substitution dropping chunks of text


I want to use sed to find and replace a small string of text within a number of files.

Specifically the substitution I want to perform is:

sed -e '/35=R/s/|131=.*|/|131=$UNIQUE|/g' $f

This runs within a bash script, where $f is the filename.

The sed command selects lines that contain the string 35=R and then uses a very simple expression to replace |131=.*| (anything after the |131=) with |131=$UNIQUE|.

This seems to work perfectly on some files, but not on others.

Working example:

Before:

8=FIX.4.2|9=151|35=R|56=ABC|142=7848|50=STUFF|49=OTHERSTUFF|52=20250905-06:00:10.910|34=107|146=1|55=DE123|22=4|48=DE123|38=1|54=1|207=F|131=12ABC|10=243

After:

8=FIX.4.2|9=151|35=R|56=ABC|142=7848|50=STUFF|49=OTHERSTUFF|52=20250905-06:00:10.910|34=107|146=1|55=DE123|22=4|48=DE123|38=1|54=1|207=F|131=$UNIQUE|10=243

However in other cases it seems to output with large blocks of text missing.

Example not working:

Before:

8=FIX.4.2|9=147|35=R|34=15301|49=STUFF|52=20190905-15:27:54.305|56=OTHERSTUFF|115=STUFFY|131=1234abc|146=1|55=AB123|15=ZYX|22=4|38=1|48=AB123|54=2|207=STUFF|10=253

After:

8=FIX.4.2|9=147|35=R|34=15301|49=STUFF|52=20190905-15:27:54.305|56=OTHERSTUFF|115=STUFFY|131=$UNIQUE|10=253

As you can see, it's missing everything between the pipe after 131=$UNIQUE and the final field. I'm fairly new to regular expressions and sed, so it's possible I'm misunderstanding the substitution part. Any pointers would be hugely appreciated.

Thank you.

There are 3 best solutions below

0
Cyrus On

Replace .* with [^|]* so that the match stops before the next |.
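As a quick check of that suggestion, here is a minimal sketch that runs both patterns against the failing line from the question. The variable names (line, greedy, fixed) are only for illustration, and $UNIQUE stays literal because the sed script is single-quoted, as in the question.

```shell
# The failing input line from the question.
line='8=FIX.4.2|9=147|35=R|34=15301|49=STUFF|52=20190905-15:27:54.305|56=OTHERSTUFF|115=STUFFY|131=1234abc|146=1|55=AB123|15=ZYX|22=4|38=1|48=AB123|54=2|207=STUFF|10=253'

# Original pattern: .* is greedy, so the match runs to the LAST | on the line.
greedy=$(printf '%s\n' "$line" | sed -e '/35=R/s/|131=.*|/|131=$UNIQUE|/g')

# Suggested pattern: [^|]* cannot cross a |, so only the 131 value is replaced.
fixed=$(printf '%s\n' "$line" | sed -e '/35=R/s/|131=[^|]*|/|131=$UNIQUE|/g')

printf '%s\n%s\n' "$greedy" "$fixed"
```

The first output line should reproduce the truncated result from the question; the second keeps every field after 131=.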

5
Spencer On

You were (un)lucky with your first example, because there was only one | character after the 131= field, so the greedy match happened to stop in the right place.

The problem here is that .* matches any sequence of characters, including vertical bar (|) characters, and it matches as much as it can. You therefore need to exclude | from what you're matching: instead of .* use [^|]*

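Also, | can have a special meaning depending on the regex dialect. In sed's default basic regular expressions a bare | is a literal character and should not be escaped (in GNU sed, \| means alternation). Only with extended regular expressions (sed -E) is | special, in which case it must be escaped (\|) when it's not in brackets.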
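To make the point about | concrete, here is a small sketch of the same substitution in both dialects. The sample string and the replacement value NEW are made up for illustration; -E is the extended-regex switch accepted by both GNU and BSD sed.

```shell
# Hypothetical, shortened sample line.
sample='a|131=x|b'

# Basic regular expressions (sed default): a bare | is a literal character.
bre=$(printf '%s\n' "$sample" | sed 's/|131=[^|]*/|131=NEW/')

# Extended regular expressions (-E): | means alternation, so it is escaped
# outside brackets. Inside [^|] and in the replacement it is always literal.
ere=$(printf '%s\n' "$sample" | sed -E 's/\|131=[^|]*/|131=NEW/')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands should print a|131=NEW|b, so the command in the question needs no escaping as long as -E is not used.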

But even then, you're not out of the woods. The 131= field can apparently appear anywhere on the line: it might be first, or it might be last. You can handle it being last by simply dropping the closing |:

sed -e '/35=R/s/|131=[^|]*/|131=$UNIQUE/g' $f

(I tested this with Visual Studio search and replace, because it's handy, and sed isn't. But it did what you wanted.)

To handle the case where the 131= field is the first one on the line, add a second expression anchored at the start of the line:

sed -e '/35=R/s/|131=[^|]*/|131=$UNIQUE/g' -e '/35=R/s/^131=[^|]*/131=$UNIQUE/g' $f
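A minimal sketch exercising the two-expression command above with the 131= field first, in the middle, and last. The three input lines are hypothetical, shortened FIX-style records, not data from the question.

```shell
# Hypothetical sample lines: 131= first, in the middle, and last.
input='131=aaa|35=R|10=001
8=FIX.4.2|35=R|131=bbb|10=002
8=FIX.4.2|35=R|131=ccc'

# First expression handles 131= preceded by a |,
# second expression handles 131= at the very start of the line.
result=$(printf '%s\n' "$input" |
  sed -e '/35=R/s/|131=[^|]*/|131=$UNIQUE/g' \
      -e '/35=R/s/^131=[^|]*/131=$UNIQUE/g')

printf '%s\n' "$result"
```

In all three lines only the value of the 131= field should change; the neighbouring fields are left alone.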

2
oneastok On

The .* expression is “greedy”. That means it matches as many characters as possible. In your examples, the match extends to the rightmost | symbol on the line. You should use this expression instead:

sed -e '/35=R/s/|131=[^|]*|/|131=$UNIQUE|/g' $f
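One related detail, in case $UNIQUE is meant to be a shell variable rather than literal text (the question does not say): inside single quotes the shell does not expand it, so sed inserts the characters $UNIQUE as-is. The sketch below double-quotes the script so the value is substituted. The value ID42 and the shortened sample line are made up, and the approach assumes the value contains no /, & or \ characters, which are special in a sed replacement.

```shell
UNIQUE='ID42'   # hypothetical value, for illustration only
line='8=FIX.4.2|35=R|131=1234abc|146=1|10=253'

# Double quotes let the shell expand ${UNIQUE} before sed sees the script.
expanded=$(printf '%s\n' "$line" | sed -e "/35=R/s/|131=[^|]*|/|131=${UNIQUE}|/g")

printf '%s\n' "$expanded"
```

This should print 8=FIX.4.2|35=R|131=ID42|146=1|10=253.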