How do I make the first letter of every sentence in a paragraph uppercase?

I got this function from php.net to convert mixed-case text to sentence case.

function sentence_case($string) {
    $sentences = preg_split('/([.?!]+)/', $string, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
    $new_string = '';
    foreach ($sentences as $key => $sentence) {
        $new_string .= ($key & 1) == 0
            ? ucfirst(strtolower(trim($sentence)))
            : $sentence . ' ';
    }
    return trim($new_string);
}

If the text is not wrapped in HTML, everything works well. But if it is inside a paragraph, the first letter after the opening paragraph (<p>) or line break (<br>) tag becomes lowercase.

This is the sample:

Before:

<p>Lorem IPSUM is simply dummy text. LOREM ipsum is simply dummy text! wHAt is LOREM IPSUM? Hello lorem ipSUM!</p>

Output:

<p>lorem ipsum is simply dummy text. Lorem ipsum is simply dummy text! What is lorem ipsum? Hello lorem ipsum!</p>

Can someone help me make the first letter in the paragraph a capital letter?

0
On BEST ANSWER

Your problem is that you're treating the HTML as part of the sentence, so the first "word" of the sentence is <p>lorem, not lorem, and ucfirst() acts on the < character instead of the first letter.

You can change the regexp to /([>.?!]+)/, but then you'll see an extra space before "Lorem", because the function now sees two sentences instead of one.

Also, Hello <em>there</em> will now be split into four pieces.
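Both effects can be checked directly. The following is only a small sketch using the built-in ucfirst() and preg_split() with the same flags as the question's function; the variable names are just for illustration.

```php
<?php
// ucfirst() only looks at the first character, which here is "<",
// so the "l" of "lorem" is left lowercase.
$fragment = strtolower('<p>Lorem IPSUM');
var_dump(ucfirst($fragment) === '<p>lorem ipsum');

// With ">" added to the delimiter class, every tag end also ends a "sentence".
$pieces = preg_split(
    '/([>.?!]+)/',
    'Hello <em>there</em>',
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);
print_r($pieces);
```

The split yields two text fragments and two ">" delimiters, none of which is a real sentence boundary.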

This looks disturbingly like a case of "How can I use regexp to interpret (X)HTML?"

2
On

You can do it easily with CSS:

p::first-letter {
    text-transform: uppercase;
}

0
On

Try this:

function html_ucfirst($s) {
    // $c[1] holds any leading tags; the last group holds the text after them.
    return preg_replace_callback('#^((<(.+?)>)*)(.*?)$#', function ($c) {
        return $c[1] . ucfirst(array_pop($c));
    }, $s);
}

Then call the function:

$string = "<p>Lorem IPSUM is simply dummy text. LOREM ipsum is simply dummy text! wHAt is LOREM IPSUM? Hello lorem ipSUM!</p>";
echo html_ucfirst($string);

Here is a working demo: https://ideone.com/fNq3Vo
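On its own, html_ucfirst() only uppercases the first character after any leading tags; it does not lowercase anything or touch later sentences. One possible way to use it, sketched here as an assumption rather than as the answerer's intent, is to run the question's sentence_case() first and then repair the first letter. Both functions are reproduced so the snippet stands alone.

```php
<?php
// sentence_case() is the function from the question.
function sentence_case($string) {
    $sentences = preg_split('/([.?!]+)/', $string, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
    $new_string = '';
    foreach ($sentences as $key => $sentence) {
        $new_string .= ($key & 1) == 0
            ? ucfirst(strtolower(trim($sentence)))
            : $sentence . ' ';
    }
    return trim($new_string);
}

// html_ucfirst() is the function from this answer.
function html_ucfirst($s) {
    return preg_replace_callback('#^((<(.+?)>)*)(.*?)$#', function ($c) {
        return $c[1] . ucfirst(array_pop($c));
    }, $s);
}

$string = "<p>Lorem IPSUM is simply dummy text. LOREM ipsum is simply dummy text! wHAt is LOREM IPSUM? Hello lorem ipSUM!</p>";

// Sentence-case everything first, then capitalize the first letter after the leading tag.
$result = html_ucfirst(sentence_case($string));
echo $result, PHP_EOL;
```

This only repairs the very start of the string. Text that follows a tag in the middle of the input, such as a <br>, is not handled by this combination.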

0
On

When parsing valid HTML, it is best practice to use a legitimate DOM parser. Regex is not reliable because it cannot tell a tag from a substring that merely resembles a tag.

Code:

$html = <<<HTML
<p>Lorem IPSUM is simply dummy text.<br>Here is dummy text. LOREM ipsum is simply dummy text! wHAt is LOREM IPSUM? Hello lorem ipSUM!</p>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//text()') as $textNode) {
    $textNode->nodeValue = preg_replace_callback(
        '/(?:^|[.!?]) *\K[a-z]+/',
        function($m) {
            return ucfirst($m[0]);
        },
        strtolower($textNode->nodeValue)
    );
}
echo $dom->saveHTML();

Output:

<p>Lorem ipsum is simply dummy text.<br>Here is dummy text. Lorem ipsum is simply dummy text! What is lorem ipsum? Hello lorem ipsum!</p>

The above snippet does not:

  1. allow acronyms to remain all-caps (because the OP wants to convert all letters to lowercase before making select letters uppercase)
  2. properly handle multibyte characters (because the OP does not indicate this necessity)
  3. know the difference between a mid-sentence dot and a sentence-ending dot (due to ambiguity in English punctuation)
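If multibyte text did matter, the per-text-node transformation could be swapped for something like the sketch below. It assumes the mbstring extension is loaded and that the input is UTF-8; the function name sentence_case_utf8 is hypothetical and not part of the snippet above. The acronym and mid-sentence-dot limitations still apply.

```php
<?php
// Hypothetical helper: lowercases UTF-8 text, then uppercases the first
// letter at the start of the string and after each ".", "!" or "?".
function sentence_case_utf8(string $text): string
{
    $lower = mb_strtolower($text, 'UTF-8');

    $result = preg_replace_callback(
        '/(?:^|[.!?])\s*\K\p{L}/u', // \p{L} matches any Unicode letter
        static function (array $m): string {
            return mb_strtoupper($m[0], 'UTF-8');
        },
        $lower
    );

    // preg_replace_callback() returns null on error (e.g. invalid UTF-8);
    // fall back to the lowercased text in that case.
    return $result ?? $lower;
}

echo sentence_case_utf8('élan VITAL. ÉCOLE ouverte! çA va?'), PHP_EOL;
```

Inside the loop over text nodes, this function could be applied to $textNode->nodeValue in place of the strtolower() and ucfirst() combination.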