In Perl, there's the ucfirst function.
Is it equivalent to this:
    sub uppercase {
        my ($W) = @_;
        $$W = uc(substr($$W,0,1)).substr($$W,1);
    }
Does it matter across Perl versions?
For context, see https://github.com/moses-smt/mosesdecoder/pull/206/files#diff-876e51db2a1ab71c1ae736182d1e5e04R63 .
Previously, uppercase was used like this:
    sub process {
        my $line = $_[0];
        chomp($line);
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        my @WORD = split(/\s+/,$line);
        # uppercase at sentence start
        my $sentence_start = 1;
        for(my $i=0;$i<scalar(@WORD);$i++) {
            &uppercase(\$WORD[$i]) if $sentence_start;
            if (defined($SENTENCE_END{ $WORD[$i] })) { $sentence_start = 1; }
            elsif (!defined($DELAYED_SENTENCE_START{$WORD[$i] })) { $sentence_start = 0; }
        }
        # uppercase headlines {
        if (defined($SRC) && $HEADLINE[$sentence]) {
            foreach (@WORD) {
                &uppercase(\$_) unless $ALWAYS_LOWER{$_};
            }
        }
But it seems that replacing &uppercase(\$WORD[$i]) and &uppercase(\$_) with ucfirst(\$WORD[$i]) and ucfirst(\$_) behaves differently.
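ucfirst is not a drop-in replacement for uppercase: uppercase takes a reference and modifies the referenced string in place, whereas ucfirst takes the string itself and returns a modified copy. ucfirst is mostly[1] equivalent to a function that receives the string as $W and returns uc(substr($W,0,1)).substr($W,1).

If you wanted to rewrite uppercase in terms of ucfirst, its body would become $$W = ucfirst($$W);

That means that if you wanted to eliminate uppercase entirely, you'd replace &uppercase(\$WORD[$i]) with $WORD[$i] = ucfirst($WORD[$i]), and &uppercase(\$_) with $_ = ucfirst($_).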
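
As a minimal sketch of the relationship between the two functions: the names my_ucfirst and uppercase_via_ucfirst below are hypothetical, chosen only for illustration, and my_ucfirst only approximates the built-in for simple non-empty text.

```perl
use strict;
use warnings;

# Hypothetical stand-in that approximates the built-in ucfirst:
# it takes a string by value and returns a modified copy.
sub my_ucfirst {
    my ($s) = @_;
    return uc(substr($s, 0, 1)) . substr($s, 1);
}

# The question's uppercase, rewritten on top of the built-in ucfirst.
# It still takes a reference and modifies the referenced string in place.
sub uppercase_via_ucfirst {
    my ($W) = @_;
    $$W = ucfirst($$W);
}

my @WORD = qw(the quick fox);

# Old style: pass a reference, the helper modifies the element in place.
uppercase_via_ucfirst(\$WORD[0]);

# Without the helper: pass the string and assign the return value back.
$WORD[1] = ucfirst($WORD[1]);

print join(' ', @WORD), "\n";       # prints "The Quick fox"
print my_ucfirst('hello'), "\n";    # prints "Hello"
```

The key difference is where the result goes: uppercase writes through the reference, while ucfirst only returns a value, so the caller has to store it.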
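You tried using ucfirst(\$WORD[$i]) and ucfirst(\$_), which pass a reference. ucfirst does not dereference its argument: it stringifies the reference (to something like SCALAR(0x...)), returns that string, and leaves the original word unchanged.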
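
The following sketch shows what happens when a reference is handed to ucfirst. The variable names are made up for the example, and the exact address inside the stringified reference varies from run to run.

```perl
use strict;
use warnings;

my $word = 'hello';

# ucfirst stringifies the reference (something like "SCALAR(0x...)")
# and returns that string; it never looks at the referenced value.
my $result = ucfirst(\$word);

print "$word\n";    # prints "hello": the original is untouched

my $looks_like_ref = $result =~ /^SCALAR\(0x[0-9a-f]+\)$/ ? 'yes' : 'no';
print "stringified reference? $looks_like_ref\n";    # prints "stringified reference? yes"

# Correct usage: pass the string itself and assign the result back.
$word = ucfirst($word);
print "$word\n";    # prints "Hello"
```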
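[1] ucfirst actually does a better job of handling more esoteric characters such as U+01F3 LATIN SMALL LETTER DZ ("ǳ"), because it applies the Unicode titlecase mapping to the first character rather than the uppercase mapping.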
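
A small sketch of that difference, using only core Perl. It prints code points rather than the characters themselves so that terminal encoding does not matter; the variable names are illustrative.

```perl
use strict;
use warnings;

my $dz = "\x{01F3}";    # LATIN SMALL LETTER DZ

# Built-in ucfirst: applies the titlecase mapping to the first character.
my $via_ucfirst = ucfirst($dz);

# The hand-rolled approach from the question: applies the uppercase mapping.
my $via_uc = uc(substr($dz, 0, 1)) . substr($dz, 1);

printf "ucfirst: U+%04X\n", ord($via_ucfirst);    # prints "ucfirst: U+01F2"
printf "uc:      U+%04X\n", ord($via_uc);         # prints "uc:      U+01F1"
```

U+01F2 is LATIN CAPITAL LETTER D WITH SMALL LETTER Z, the form expected at the start of a capitalized word, while U+01F1 is the all-caps LATIN CAPITAL LETTER DZ.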