Sed replace string on alphanumeric with certain length that must contain one capitalized letter and one number


I want to replace any alphanumeric string that is exactly 14 characters long and delimited by word boundaries. The string must contain at least one uppercase letter and at least one digit. I believe I need positive lookaheads for the uppercase letter and the digit, and I am fairly sure the regex pattern is right. What I don't understand is why sed is not matching. I validated the pattern with online tools such as regexpal, and there it matches the string as I expect.

Here is the regex and sed command I'm using.

\b(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]{14}\b

The sed command I'm testing with is

echo "asdfASDF1234ds" | sed 's/\b(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]{14}\b/NEW_STRING/g'

I would expect this to match on the echoed string.


tripleee On BEST ANSWER

sed doesn't support lookaheads, or many of the other modern regex extensions that originated in Perl. The simple fix is to use Perl.

perl -pe 's/\b(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]{14}\b/NEW_STRING/g' <<< "asdfASDF1234ds"
jhnc On

sed understands a very limited form of regex. It does not have lookahead.
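A small illustration of why the original command fails silently instead of raising an error: in sed's default basic regular expressions, unescaped `(`, `)`, `?`, `{` and `}` are ordinary characters, so the lookahead syntax is searched for literally. The function name `literal_demo` is invented for this sketch.

```shell
# Hypothetical demo: without -E, sed treats "(?=a)" as five literal characters,
# not as a lookahead, so only input containing that exact text is changed.
literal_demo() {
    sed 's/(?=a)/LITERAL/'
}

printf '%s\n' '(?=a)' 'a' | literal_demo
# prints:
# LITERAL
# a
```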

Using a tool with more powerful regex support is the simple solution.

If you must use sed, you could do something like:

$ sed '
    # mark delimiters
    s/[^a-zA-Z0-9]\{1,\}/\n&\n/g
    s/^[^\n]/\n&/
    s/[^\n]$/&\n/

    # mark 14-character candidates
    s/\n[a-zA-Z0-9]\{14\}\n/\n&\n/g

    # mark if candidate contains capital
    s/\n\n[^\n]*[A-Z][^\n]*\n\n/\n&\n/g

    # check for a digit; if found, replace
    s/\n\n\n[^\n]*[0-9][^\n]*\n\n\n/NEW_STRING/g

    # remove marks
    s/\n//g
' <<'EOD'
a234567890123n
,a234567890123n,
xx,a234567890123n,yy
a23456789A123n
XX,a23456789A123n,YY
xx,a23456789A1234n,yy
EOD
a234567890123n
,a234567890123n,
xx,a234567890123n,yy
NEW_STRING
XX,NEW_STRING,YY
xx,a23456789A1234n,yy
$
potong On

This might work for you (GNU sed):

sed -E 's/\<[A-Za-z0-9]{14}\>/\n&\n/
        s/\n.*(([A-Z].*[0-9])|([0-9].*[A-Z])).*\n/NEW_STRING/
        s/\n//g' file    

Isolate a 14-character alphanumeric word by delimiting it with newlines.

If the string between the newlines contains at least one uppercase alpha character and at least one numeric character, replace the string and its delimiters by NEW_STRING.

Remove the delimiters.
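The three steps can be traced on sample input with a sketch like the following. It assumes GNU sed (for `\<`, `\>` and `\n` in the replacement), and the wrapper name `replace_first_token` is hypothetical. Since the first substitution has no `g` flag, only the first 14-character word on each line is considered.

```shell
# Hypothetical wrapper around the three-step GNU sed script described above.
replace_first_token() {
    sed -E '
        # step 1: wrap the first 14-character alphanumeric word in newlines
        s/\<[A-Za-z0-9]{14}\>/\n&\n/
        # step 2: replace it if it holds an uppercase letter and a digit, in either order
        s/\n.*(([A-Z].*[0-9])|([0-9].*[A-Z])).*\n/NEW_STRING/
        # step 3: drop the markers left around a word that did not qualify
        s/\n//g
    '
}

printf '%s\n' 'xx,a23456789A123n,yy' 'xx,a234567890123n,yy' | replace_first_token
# prints:
# xx,NEW_STRING,yy
# xx,a234567890123n,yy
```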

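Or, if a line may contain several such strings, perhaps the following (the e flag executes the pattern space as a shell command, so use it only on trusted input):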

sed -E 's/\b/\n/g
        s#.*#echo "&"|sed -E "/^[a-z0-9]{14}$/I{/[A-Z]/{/[0-9]/s/.*/NEW_STRING/}}"#e
        s/\n//g' file