Add a space between a dot or comma and a letter, with some exceptions


I need to do some grammar validation, for example adding spaces after dots. The problem is that this shouldn't be done everywhere: not in "e.g." or "www.example.com", and not in more advanced exceptions like "999.77.SA".

My idea was to use preg_replace(). This works perfectly for everything, but it leaves no room for exceptions.

```php
// Add a space after a dot or comma that is directly followed by a letter
$string = preg_replace('/(?<=[.,])(?=\p{L})/u', ' ', $string);
```

I could try adding those exceptions to the regex itself, but we have many of them and the expression would become terribly complicated.
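One way to keep the exceptions out of the main expression is to store them in a list and let PCRE skip over them. This is a sketch under assumptions, not a tested production solution: the three exception patterns are illustrative placeholders and addSpaces() is a hypothetical helper name. It relies on the PCRE verbs (*SKIP)(*FAIL): whenever an exception matches, that attempt fails and scanning resumes after the exception, so the space-inserting lookaround can never fire inside it.

```php
<?php
// Hypothetical exception list: raw regex fragments, one per kind of exception.
$exceptions = [
    'e\.g\.',          // the abbreviation "e.g."
    'www\.[^\s]+',     // anything that looks like a www host name
    '\d+(?:\.\w+)+',   // codes such as 88.ASD or 999.77.SA
];

function addSpaces(string $text, array $exceptions): string
{
    // Exceptions are tried first at each position; (*SKIP)(*FAIL) discards the
    // match and makes the engine continue after it, leaving the text untouched.
    $skip = '(?:' . implode('|', $exceptions) . ')(*SKIP)(*FAIL)';
    // Second alternative: the original zero-width "after . or , before a letter" rule.
    $pattern = '/' . $skip . '|(?<=[.,])(?=\p{L})/u';
    return preg_replace($pattern, ' ', $text);
}

echo addSpaces(
    "Hello.This is my example.In some cases space shouldn't be added. Like in e.g. or www.example.com or 88.ASD",
    $exceptions
), "\n";
```

Adding a new exception then means appending one entry to $exceptions instead of editing one large expression. The entries are regex fragments, so literal strings should be escaped with preg_quote($literal, '/') before being added.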

I also tried preg_match() and preg_replace_callback(), but the match array contains only empty strings (the pattern consists only of lookarounds, so every match is zero-width), so that doesn't help.
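Because the pattern above only matches empty positions, preg_replace_callback() has nothing useful to hand to the callback. A possible alternative, sketched here under the assumption that exceptions never contain whitespace, is to match whole whitespace-delimited tokens with \S+ and decide per token. isException() and addSpacesByToken() are hypothetical names, and the exception patterns are examples only.

```php
<?php
// Hypothetical check: is this whole token one of the known exceptions?
function isException(string $token): bool
{
    $patterns = [
        '/^e\.g\.$/i',          // abbreviation "e.g."
        '/^www\./i',            // host names such as www.example.com
        '/^\d+(?:\.\w+)+$/',    // codes such as 88.ASD or 999.77.SA
    ];
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $token) === 1) {
            return true;
        }
    }
    return false;
}

function addSpacesByToken(string $text): string
{
    // \S+ gives the callback a real, non-empty token; whitespace between tokens is kept.
    return preg_replace_callback(
        '/\S+/u',
        function (array $m): string {
            $token = $m[0];
            if (isException($token)) {
                return $token; // leave exceptions untouched
            }
            // Otherwise insert a space after each dot or comma followed by a letter.
            return preg_replace('/(?<=[.,])(?=\p{L})/u', ' ', $token);
        },
        $text
    );
}

echo addSpacesByToken("Hello.This is my example.In some cases space shouldn't be added. Like in e.g. or www.example.com or 88.ASD"), "\n";
```

One limitation of this sketch: a token such as "(e.g." or "e.g.," does not match the anchored pattern, so real-world exception patterns may need to tolerate surrounding punctuation.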

Example text:

Hello.This is my example.In some cases space shouldn't be added. Like in e.g. or www.example.com or 88.ASD

Should be changed to:

Hello. This is my example. In some cases space shouldn't be added. Like in e.g. or www.example.com or 88.ASD

Do you have any idea how to do this in the cleanest way?

Answer by markalex

This task is not solvable in a general way with a simple regex (or even a very complicated one).

The problem is that without a semantic understanding of the sentence, there is no way to tell whether a . is part of an abbreviation or a separator between sentences.

Take something similar to your example: 77.I. Depending on the context, it could be an abbreviation (say, a position with code 77.I named "Other"), or it could be the boundary between two sentences: "Elm street 77.I leave there."
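The ambiguity can be illustrated with a short sketch. The exception rule here is an assumption chosen for the demonstration, and addSpacesWithCodeException() is a hypothetical name: if "digits, dot, uppercase letters" is treated as a code, both sentences are left unchanged, which is right for the first and wrong for the second.

```php
<?php
// Illustration only: a purely textual rule treats "77.I" identically in both
// sentences, whatever they mean.
function addSpacesWithCodeException(string $text): string
{
    // Assumed rule: digits + dot + uppercase letters is a code, never split it.
    $pattern = '/\d+\.\p{Lu}+(*SKIP)(*FAIL)|(?<=[.,])(?=\p{L})/u';
    return preg_replace($pattern, ' ', $text);
}

// Correct here: 77.I is a position code.
echo addSpacesWithCodeException('Position 77.I named "Other".'), "\n";
// Wrong here: "77.I" spans two sentences, but the rule cannot tell, so no space is added.
echo addSpacesWithCodeException('Elm street 77.I leave there.'), "\n";
```

Dropping the rule would only flip which of the two sentences is handled incorrectly.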


Also, consider that . may have additional meanings in other languages. For example, in Latvian 2.A means "second A", and while a space there might be acceptable, there is no clear reason for one to appear.