Truncating a slug to a maximum length by removing delimited words from the middle of the string


I am trying to ensure that a slug formatted string is within the total character limit by removing words from the middle of the string if warranted.

Sample slug:

'/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345'

The string will always start with /job/ and end with in-zipcode-city-state-job_id. However, there is a 150 character limit to the slug and I am looking to truncate words before the zipcode one at a time so this character limit is not exceeded. I know I have to use regex/explode, but how can I do this? I tried the following but my matches array seems to have too many elements.

$pattern = '/-in-\d{5}-(.*)-(.*)-(\d*)/';
$string = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';

preg_match($pattern, $string, $matches);
print_r($matches);

// Output:
// Array
// (
//     [0] => -in-91770-rosemead-california-12345
//     [1] => rosemead
//     [2] => california
//     [3] => 12345
// )

Why is rosemead, california, 12345 considered matches? Shouldn't there only be the first element?
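The extra elements come from the parentheses in the pattern: every (...) is a capturing group, and preg_match() stores each group's submatch in $matches after the full match at index 0. A minimal sketch comparing the question's pattern with one that uses non-capturing groups (?:...), which group without recording anything:

```php
<?php
$string = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';

// Each (...) is a capturing group, so $withGroups gets indices 1 to 3
// in addition to the full match at index 0.
preg_match('/-in-\d{5}-(.*)-(.*)-(\d*)/', $string, $withGroups);

// (?:...) groups without capturing: only the full match is stored.
preg_match('/-in-\d{5}-(?:.*)-(?:.*)-(?:\d*)/', $string, $withoutGroups);

echo count($withGroups), "\n";    // 4
echo count($withoutGroups), "\n"; // 1
echo $withoutGroups[0], "\n";     // -in-91770-rosemead-california-12345
```

The full match at index 0 is all the truncation code needs; the capture groups only become useful if you want the city, state or job ID separately.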

How do I ensure that the complete slug will be a maximum of 150 characters long, with the trailing part (location) included in its entirety, and the leading part (job name) truncated if necessary?
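The word-at-a-time idea from the question can be written with explode(): split the job name into words and drop the last one (the word closest to the location) until the slug fits. A minimal sketch; the function name truncateByWords() is hypothetical, and it assumes the location part always has the shape -in-<5 digits>-city-state-id:

```php
<?php
// Hypothetical helper: drops job-name words one at a time, starting with the
// word right before "-in-<zip>", until the whole slug fits in $maxLength.
function truncateByWords(string $slug, int $maxLength = 150): string
{
    // $m[1]: job-name words, $m[2]: "-in-<zip>-<city>-<state>-<id>"
    if (!preg_match('~^/job/(.*)(-in-\d{5}-.*-.*-\d*)$~', $slug, $m)) {
        throw new InvalidArgumentException('Unexpected slug format');
    }
    $words = $m[1] === '' ? [] : explode('-', $m[1]);
    $location = $m[2];

    // Rebuild the slug from the remaining words; ltrim() avoids "/job/-in-..."
    // once every word has been removed.
    $build = fn(array $w): string => '/job/' . ltrim(implode('-', $w) . $location, '-');

    while ($words !== [] && strlen($build($words)) > $maxLength) {
        array_pop($words);
    }
    return $build($words);
}

$slug = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';
echo truncateByWords($slug, 80), "\n";
```

If even an empty job name does not fit, the returned slug is still over the limit, so callers should check its length before using it.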


Answer by Rob Eyre

You can do this without using explode() and iterating, just with some standard string manipulation:

$pattern = '/-in-\d{5}-.*-.*-\d*/';
$string = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';
$matches = [];

if (!preg_match($pattern, $string, $matches)) {
    // mismatched string - error handling here
}

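$totalLength = 150;
$slug = $string; // strings already within the limit are returned untouched
if (strlen($string) > $totalLength) {
    $maxPrefixLength = $totalLength - strlen($matches[0]);
    if ($maxPrefixLength < strlen('/job/')) {
        // no prefix words possible at all - error handling here
    }
    // cut at the limit, then back up to the last hyphen so no word is split
    $prefixLength = max(strlen('/job/'), strrpos(substr($string, 0, $maxPrefixLength), '-'));
    $slug = substr($string, 0, $prefixLength) . $matches[0];
}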
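To make the arithmetic concrete, here are the same steps traced on the sample slug. The 80-character limit is an assumption chosen so that trimming actually happens (the sample is only 97 characters long), and $cut is an illustrative intermediate variable:

```php
<?php
$string = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';
$totalLength = 80; // assumed limit, lower than the sample's 97 characters

preg_match('/-in-\d{5}-.*-.*-\d*/', $string, $matches);
echo $matches[0], ' = ', strlen($matches[0]), " chars\n"; // location part: 35 chars

$maxPrefixLength = $totalLength - strlen($matches[0]);     // 80 - 35 = 45
$cut = substr($string, 0, $maxPrefixLength);               // raw cut, ends mid-word in "daycare"
$prefixLength = max(strlen('/job/'), strrpos($cut, '-'));  // 39: the hyphen after "a"
$slug = substr($string, 0, $prefixLength) . $matches[0];

echo $slug, ' (', strlen($slug), " chars)\n";
```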

Answer by Markus AO

Trimming the leading part of a URL slug to a specified length can be accomplished in a number of ways, some more complicated than others. Here's a flexible utility function with informative comments. It starts from a regex that extracts the leading part (job name) and the trailing part (location). The maximum allowed length for the job name is then calculated as the total allowed length minus the length of the location slug. See the comments for more insight.

function trim_slug(string $slug, int $maxlen = 150): string
{
    // check if trimming is required:
    if(strlen($slug) <= $maxlen) {
        return $slug; 
    }
    
    $pattern = '/^(?<job>.+)(?<loc>-in-\d{5}-.*-.*-\d*)$/';
    // $match will have 'job' and 'loc' named keys with the matched values
    if (!preg_match($pattern, $slug, $match)) {
        throw new InvalidArgumentException('Slug does not end with a location part');
    }
    
    // raw cut of job name to maximum length:
    $max_job_chars = $maxlen - strlen($match['loc']);
    $job_name = substr($match['job'], 0, $max_job_chars);
    
    // tidy up to last delimiter, if exists, instead of mincing words:
    if($last_delim = strrpos($job_name, '-')) {
        $job_name = substr($match['job'], 0, $last_delim);      
    }
    
    return $job_name . $match['loc'];
}

$string = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';

echo trim_slug($string, 80);
// result: /job/hello-this-is-my-job-posting-for-a-in-91770-rosemead-california-12345

In the usage sample, the maximum length is 80, because your sample string is only 97 characters long and would be returned from the function as-is under the default 150-character limit. Demo at 3v4l.

Note that this answer uses PHP's standard string functions, which are not multibyte-aware. If you expect multibyte content, use the corresponding multibyte (mb_*) string functions to avoid mangled data. (Whether you want multibyte characters in a URL slug to begin with, and how best to handle them, is a topic for another question.)
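Following that note, here is a sketch of the same function using mbstring calls (mb_strlen(), mb_substr(), mb_strrpos()) and the /u regex modifier, so lengths and cut points are counted in characters rather than bytes. It assumes UTF-8 input and that the mbstring extension is installed; the name trim_slug_mb() and the exception for non-matching slugs are illustrative choices:

```php
<?php
// Multibyte-aware sketch of trim_slug(): same steps, but every length and
// offset is counted in characters, not bytes.
function trim_slug_mb(string $slug, int $maxlen = 150): string
{
    if (mb_strlen($slug) <= $maxlen) {
        return $slug;
    }
    // /u makes PCRE treat the subject as UTF-8
    if (!preg_match('/^(?<job>.+)(?<loc>-in-\d{5}-.*-.*-\d*)$/u', $slug, $match)) {
        throw new InvalidArgumentException('Slug does not end with a location part');
    }
    // raw cut of the job name, in characters
    $max_job_chars = $maxlen - mb_strlen($match['loc']);
    $job_name = mb_substr($match['job'], 0, max(0, $max_job_chars));

    // back up to the last hyphen so no word is cut in half
    $last_delim = mb_strrpos($job_name, '-');
    if ($last_delim !== false) {
        $job_name = mb_substr($job_name, 0, $last_delim);
    }
    return $job_name . $match['loc'];
}

echo trim_slug_mb('/job/café-crème-brûlée-chef-wanted-in-91770-rosemead-california-12345', 60), "\n";
```

With an ASCII-only slug it behaves like trim_slug(); with accented job names the byte-based version counts each é or û as two characters, so it may drop more words than the limit requires.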

Answer by mickmackusa
  1. Parse the input slug into its 3 crucial components,
  2. Calculate the number of characters permitted in the middle portion by subtracting the lengths of the first and third components from the total allowance,
  3. Truncate the middle portion cleanly by finding the last hyphen that occurs before the character limit is reached, then removing the remaining expendable substring.

This leaves you with a string optimized for greatest length without damaging whole words in the slug.

Code: (Demo)

$slug = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';

$slugLimit = 70;

echo preg_replace_callback(
         '~^(/job/)((?:[^-]*-)*)(in-\d{5}-[^-]*-[^-]*-\d*)$~u',
         fn($m) => implode([
             $m[1],
             preg_replace(
                 '~^.{0,' . ($slugLimit - mb_strlen($m[1] . $m[3]) - 1) . '}-\K.*~u',
                 '',
                 $m[2]
             ),
             $m[3]
         ]),
         $slug
     );

Output (total slug length: 68 characters):

/job/hello-this-is-my-job-posting-in-91770-rosemead-california-12345
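The only computed piece in these callbacks is the inner pattern. As a sketch, the three components of the sample slug can be written out by hand (the variable names are illustrative) to show which pattern is built for a 70-character limit and what it leaves of the middle portion:

```php
<?php
$slugLimit = 70;
$prefix   = '/job/';                                                      // first component
$middle   = 'hello-this-is-my-job-posting-for-a-daycare-im-looking-for-'; // second component
$location = 'in-91770-rosemead-california-12345';                         // third component

// Characters left for the middle part; the extra -1 reserves room for the
// hyphen that the inner pattern requires after the last kept word.
$allowance = $slugLimit - mb_strlen($prefix . $location) - 1;
$inner = '~^.{0,' . $allowance . '}-\K.*~u';

// Greedy .{0,N} reaches the last hyphen that fits; \K drops the kept text from
// the reported match, so only what follows that hyphen is replaced with ''.
$trimmed = preg_replace($inner, '', $middle);

echo $inner, "\n";
echo $prefix . $trimmed . $location, "\n";
```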


Or combine the first and second components for simpler processing: (Demo)

echo preg_replace_callback(
         '~^((?:[^-]*-)*)(in-\d{5}-[^-]*-[^-]*-\d*)$~u',
         fn($m) => implode([
             preg_replace(
                 '~^.{0,' . ($slugLimit - mb_strlen($m[2]) - 1) . '}-\K.*~u',
                 '',
                 $m[1]
             ),
             $m[2]
         ]),
         $slug
     );

Finally, the most compact version I can think of puts the capture group for the location inside a lookahead, so the full-string match covers only the leading part, and that is what the callback replaces: (Demo)

echo preg_replace_callback(
         '~^(?:[^-]*-)*(?=(in-\d{5}-[^-]*-[^-]*-\d*)$)~u',
         fn($m) => preg_replace(
             '~^.{0,' . ($slugLimit - mb_strlen($m[1]) - 1) . '}-\K.*~u',
             '',
             $m[0]
         ),
         $slug
     );

If mb_strlen() of the new slug is still over the limit (for example, because the location part alone is too long), throw an exception or inform the user of the violation.
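That check can be folded into a small wrapper around the compact version. This is a sketch: the function name enforceSlugLimit(), the choice of LengthException, and the early return for a negative allowance (so no malformed {0,-n} quantifier is ever built) are assumptions rather than part of the answer:

```php
<?php
// Hypothetical wrapper around the compact lookahead version that also
// performs the final length check.
function enforceSlugLimit(string $slug, int $slugLimit): string
{
    $result = preg_replace_callback(
        '~^(?:[^-]*-)*(?=(in-\d{5}-[^-]*-[^-]*-\d*)$)~u',
        function (array $m) use ($slugLimit): string {
            $allowance = $slugLimit - mb_strlen($m[1]) - 1;
            if ($allowance < 0) {
                // The location alone does not fit; leave it to the check below.
                return $m[0];
            }
            return preg_replace('~^.{0,' . $allowance . '}-\K.*~u', '', $m[0]) ?? $m[0];
        },
        $slug
    );

    if ($result === null || mb_strlen($result) > $slugLimit) {
        throw new LengthException("Slug cannot be shortened to $slugLimit characters");
    }
    return $result;
}

$slug = '/job/hello-this-is-my-job-posting-for-a-daycare-im-looking-for-in-91770-rosemead-california-12345';
echo enforceSlugLimit($slug, 70), "\n";
```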