In Perl, there's the ucfirst function.
Is it equivalent to this:
sub uppercase {
    my ($W) = @_;
    $$W = uc(substr($$W,0,1)).substr($$W,1);
}
Does it matter across Perl versions?
To give the question some context (https://github.com/moses-smt/mosesdecoder/pull/206/files#diff-876e51db2a1ab71c1ae736182d1e5e04R63), uppercase was previously used like this:
sub process {
    my $line = $_[0];
    chomp($line);
    $line =~ s/^\s+//;
    $line =~ s/\s+$//;
    my @WORD = split(/\s+/,$line);
    # uppercase at sentence start
    my $sentence_start = 1;
    for(my $i=0;$i<scalar(@WORD);$i++) {
        &uppercase(\$WORD[$i]) if $sentence_start;
        if (defined($SENTENCE_END{ $WORD[$i] })) { $sentence_start = 1; }
        elsif (!defined($DELAYED_SENTENCE_START{$WORD[$i] })) { $sentence_start = 0; }
    }
    # uppercase headlines {
    if (defined($SRC) && $HEADLINE[$sentence]) {
        foreach (@WORD) {
            &uppercase(\$_) unless $ALWAYS_LOWER{$_};
        }
    }
But it seems that replacing &uppercase(\$WORD[$i]) and &uppercase(\$_) with ucfirst(\$WORD[$i]) and ucfirst(\$_) behaves differently.
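ucfirst is not equivalent to the uppercase sub in the question. uppercase takes a reference and modifies the referenced string in place; ucfirst takes a string and returns a modified copy, leaving its argument untouched.

ucfirst is mostly[1] equivalent to a version of that sub that takes the string itself rather than a reference and returns uc(substr($W,0,1)).substr($W,1).

If you wanted to rewrite uppercase in terms of ucfirst, its body would become $$W = ucfirst($$W);

That means that if you wanted to eliminate uppercase entirely, you'd replace &uppercase(\$WORD[$i]) with $WORD[$i] = ucfirst($WORD[$i]), and &uppercase(\$_) with $_ = ucfirst($_).

You tried using ucfirst(\$WORD[$i]), which passes a reference instead of a string. ucfirst then operates on the stringified reference (something like SCALAR(0x...)), returns that, and never touches $WORD[$i].

[1] ucfirst actually does a better job of handling more esoteric characters such as U+01F3 LATIN SMALL LETTER DZ ("dz"): it applies the titlecase mapping (U+01F2) rather than the uppercase mapping (U+01F1) that uc produces.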
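The points above can be checked with a short script. This is only a sketch: uppercase is copied from the question, while uppercase_ucfirst and the sample words are names made up for illustration and are not part of the Moses code.

```perl
use strict;
use warnings;

# The question's helper: takes a reference, modifies the string in place.
sub uppercase {
    my ($W) = @_;
    $$W = uc(substr($$W, 0, 1)) . substr($$W, 1);
}

# Hypothetical variant with the same interface, written with ucfirst.
sub uppercase_ucfirst {
    my ($W) = @_;
    $$W = ucfirst($$W);
}

my @WORD = ('hello', 'world', 'again');

# In-place helper, called the way the original code calls it.
uppercase(\$WORD[0]);
uppercase_ucfirst(\$WORD[1]);

# Direct replacement: ucfirst returns a copy, so assign it back.
$WORD[2] = ucfirst($WORD[2]);

# Pitfall: ucfirst on a reference works on the stringified reference
# and leaves the referenced variable unchanged.
my $word   = 'moses';
my $result = ucfirst(\$word);

# uc vs. ucfirst on U+01F3 LATIN SMALL LETTER DZ.
my $dz          = "\x{01F3}";
my $via_uc      = uc(substr($dz, 0, 1)) . substr($dz, 1);
my $via_ucfirst = ucfirst($dz);

print "@WORD\n";                                  # Hello World Again
print "word is still '$word', result is '$result'\n";
printf "uc: U+%04X, ucfirst: U+%04X\n", ord($via_uc), ord($via_ucfirst);
```

The last line should report U+01F1 for the uc-based version and U+01F2 for ucfirst, which is the titlecase difference mentioned in the footnote. For plain ASCII words the two approaches give the same result.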