Removing special characters from a string in a Perl script


I have a string like the one below:

stringinput = Sweééééôden@

I want to get output like this:

stringoutput = Sweden

The special characters ééééô and @ have to be removed.

I am using:

$stringoutput = `echo $stringinput | sed 's/[^a-z  A-Z 0-9]//g'`;

I get a result like Sweééééôden: the @ is removed, but ééééô is not.

Can you please suggest what I have to add?

Best answer:

You need to set LC_ALL=C for the sed command so that the [A-Za-z] bracket expression builds its ranges according to the ASCII table:

stringoutput=$(echo "$stringinput" | LC_ALL=C sed 's/[^A-Za-z0-9]//g')

A quick demo:

stringinput='Sweééééôden@'
stringoutput=$(echo "$stringinput" | LC_ALL=C sed 's/[^A-Za-z0-9]//g')
echo "$stringoutput"
# => Sweden
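To see why the C locale helps, here is a small Perl sketch that mimics what sed does under LC_ALL=C: it works on raw bytes, and each accented letter encoded as UTF-8 is two bytes, neither of which falls inside the ASCII ranges. The variable names are illustrative only, and the input is built from \x{...} escapes so the source file's encoding does not matter.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Build 'Sweééééôden@' from code points: U+00E9 is 'é', U+00F4 is 'ô'.
my $chars = "Swe\x{e9}\x{e9}\x{e9}\x{e9}\x{f4}den@";

# Encode to UTF-8 bytes, which is what sed reads on its input under LC_ALL=C.
my $bytes = encode('UTF-8', $chars);

# Each accented letter becomes two bytes (0xC3 followed by a byte >= 0x80).
printf "%d characters, %d bytes\n", length($chars), length($bytes);

# Byte by byte, none of those bytes is in A-Z, a-z or 0-9, so all are deleted.
(my $stripped = $bytes) =~ s/[^A-Za-z0-9]//g;
print "$stripped\n";
```

Running it should report 12 characters and 17 bytes, then print Sweden.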

See POSIX regex reference:

In the default C locale, the sorting sequence is the native character order; for example, ‘[a-d]’ is equivalent to ‘[abcd]’. In other locales, the sorting sequence is not specified, and ‘[a-d]’ might be equivalent to ‘[abcd]’ or to ‘[aBbCcDd]’, or it might fail to match any character, or the set of characters that it matches might even be erratic. To obtain the traditional interpretation of bracket expressions, you can use the ‘C’ locale by setting the LC_ALL environment variable to the value ‘C’.
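Perl's own regexes are not affected by this: as far as I know, a range such as a-z in a Perl bracketed character class is based on code points, so it covers exactly U+0061 to U+007A and never an accented letter such as U+00E9. A small check (the sample characters are arbitrary):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Samples: 'e', 'é' (U+00E9), 'ô' (U+00F4), 'Z', '@'
my @samples = ('e', "\x{e9}", "\x{f4}", 'Z', '@');
my %verdict;

for my $ch (@samples) {
    # A range compares code points, so only ASCII letters and digits match.
    $verdict{ord($ch)} = $ch =~ /^[A-Za-z0-9]$/ ? 'kept' : 'removed';
    printf "U+%04X %s\n", ord($ch), $verdict{ord($ch)};
}
```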

In Perl, you can simply use:

my $stringinput = 'Sweééééôden@';
# /r returns the modified copy and leaves $stringinput unchanged
my $stringoutput = $stringinput =~ s/[^A-Za-z0-9]+//gr;
print $stringoutput;

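The /r modifier (non-destructive substitution) requires Perl 5.14 or later. On an older Perl, a common equivalent is to copy the string and run the substitution on the copy in a single statement; this sketch reuses the variable names from the snippet above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $stringinput = "Swe\x{e9}\x{e9}\x{e9}\x{e9}\x{f4}den@";   # 'Sweééééôden@'

# Assign a copy to $stringoutput, then apply s///g to that copy only.
(my $stringoutput = $stringinput) =~ s/[^A-Za-z0-9]+//g;

print "$stringoutput\n";
```

This prints Sweden, and $stringinput keeps its original 12 characters.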

Another answer:

There is no need to call sed from Perl; Perl can do the substitution itself. It is also faster, since no new shell or sed process has to be started.

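#!/usr/bin/perl
use warnings;
use strict;
use utf8;    # the source file is UTF-8, so the literal is decoded into characters

my $string = 'Sweééééôden@';
$string =~ s/[^A-Za-z0-9]//g;    # delete everything except ASCII letters and digits
print "$string\n";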
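When the set of characters to keep is fixed, Perl's tr/// operator can do the same job without a regex: the c flag complements the search list (so it acts on every character not listed) and the d flag deletes those characters. A sketch, with the input written as \x{...} escapes instead of relying on use utf8:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Swe\x{e9}\x{e9}\x{e9}\x{e9}\x{f4}den@";   # 'Sweééééôden@'

# c: operate on every character NOT in A-Za-z0-9; d: delete them.
# tr returns the number of characters it deleted.
my $removed = ($string =~ tr/A-Za-z0-9//cd);

print "$string\n";
print "removed: $removed\n";
```

This should print Sweden and report 6 removed characters (four é, one ô and the @).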