How to prevent Zalgo text using PHP


I have some problems with Zalgo on my imageboard.

Texts like below mess up my imageboard. Is there a way to prevent these characters and "fix" or clean up the texts?

Example text Source:

ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST the pon̷y he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ

I tried to use this solution:

$cleanMessage = preg_replace("/[^\x20-\xAD\x7F]/", "", $input_lines);

Taken from here: "Remove special characters that mess with formating". But it works only for Latin characters. Can anyone help me?

aftamat4ik On BEST ANSWER

This regular expression removes every combining mark (Unicode category M, matched by \p{M}) from the $text variable:

$text = preg_replace("~[\p{M}]~uis","", $text);

If $text contains a character with a combining mark, for example กิ, this regex removes the mark and the resulting $text will contain just ก.
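To see the first regex in isolation, here is a minimal sketch. The function name stripAllMarks is only an illustration and not part of any library, and falling back to the original string when preg_replace returns null (for example on invalid UTF-8 input) is my own assumption about what you would want.

```php
<?php
// Hypothetical helper: remove every combining mark (Unicode category M).
function stripAllMarks(string $text): string
{
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace('~\p{M}~u', '', $text) ?? $text;
}

$thai  = "\u{0E01}\u{0E34}";          // ก followed by the combining vowel sign ิ
$zalgo = "Z\u{0351}\u{0308}A\u{0303}"; // "ZA" with three stacked marks in total

var_dump(stripAllMarks($thai) === "\u{0E01}"); // true: only the base letter is left
var_dump(stripAllMarks($zalgo) === "ZA");      // true: all marks are gone
```

Note that this also strips marks that are a legitimate part of the text, as the Thai example shows, which is why a less aggressive pattern may be preferable.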

I then improved this regex so that it matches only runs of two or more consecutive combining marks and leaves a single mark alone:

$text = preg_replace("~(?:[\p{M}]{1})([\p{M}])+?~uis","", $text);

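This regex removes only stacked marks, that is, two or more marks in a row. Use it if the text may be in German or another language that legitimately uses diacritics. Because the lazy quantifier makes each match exactly two marks long, marks are removed in pairs, so a run with an odd number of marks keeps its last mark. This regex will transform this word: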

͐̈ͩ̎Zͮ͌ͦ͆ͦͤÃ̉͛̄ͭ̈̚LͫG̉̋͂̉Oͨ͌̋͗!

into this: ZÄLͫGO!
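The behaviour of the second regex on runs of one, two and three marks can be checked with a small sketch like the one below. The function name stripStackedMarks and the test strings are made up for illustration; the pattern itself is the one from this answer.

```php
<?php
// Hypothetical helper wrapping the second pattern from this answer.
// Each match is exactly two consecutive combining marks.
function stripStackedMarks(string $text): string
{
    // Fall back to the input if preg_replace() fails (assumption).
    return preg_replace('~(?:[\p{M}]{1})([\p{M}])+?~uis', '', $text) ?? $text;
}

$one   = "a\u{0308}";                 // a + diaeresis (decomposed "ä")
$two   = "a\u{0308}\u{0301}";         // two stacked marks
$three = "a\u{0308}\u{0301}\u{0303}"; // three stacked marks

var_dump(stripStackedMarks($one) === $one);          // true: a single mark is kept
var_dump(stripStackedMarks($two) === "a");           // true: the pair is removed
var_dump(stripStackedMarks($three) === "a\u{0303}"); // true: the last mark survives
```

Precomposed characters such as "ä" stored as the single code point U+00E4 contain no combining mark at all, so neither regex touches them.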

I hope the second regex helps you.