Regex to add a space between Unicode words and numbers in Python


I tried a basic Unicode-aware regex, but I cannot get it to work on strings that contain characters other than A-Z and digits.

My examples come from multiple languages whose scripts are not based on the Latin (A-Z) alphabet.

text = "20किटल"
res = re.sub("^[^\W\d_]+$", lambda ele: " " + ele[0] + " ", text)

Output:
20किटल

2nd try:

regexp1 = re.compile('^[^\W\d_]+$', re.IGNORECASE | re.UNICODE)
regexp1.sub("^[^\W\d_]+$", lambda ele: " " + ele[0] + " ", text)

Output:
20किटल


Expected output:
**20 किटल**

There are 2 best solutions below

Best answer

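Use the regex library from PyPI (pip install regex). It supports Unicode property classes such as \p{L}, which the standard re module does not.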

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import regex

text = "20किटल"
# No flags needed: str patterns are Unicode-aware by default in Python 3
pat = regex.compile(r"(?<=\d)(?=\p{L})")
res = pat.sub(" ", text)
print(res)

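Here \p{L} matches any letter in any script. The lookbehind (?<=\d) and the lookahead (?=\p{L}) together match the empty position between a digit and a letter, and sub() inserts a space at that position.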

Output:

20 किटल
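
The same lookaround idea can be sketched with the standard re module alone, assuming the class [^\W\d_] (word characters minus digits and underscore) is an acceptable stand-in for \p{L}. The helper name split_digits_letters and the extra test strings are hypothetical, and the sketch also handles the letter-then-digit order, which the original question did not ask about.

```python
import re

# [^\W\d_] = word characters minus digits and underscore, i.e. letters.
# This approximates \p{L} using only the standard library (an assumption,
# not an exact equivalent).
LETTER = r"[^\W\d_]"

# Zero-width boundary: a digit followed by a letter, or a letter followed by a digit.
BOUNDARY = re.compile(rf"(?<=\d)(?={LETTER})|(?<={LETTER})(?=\d)")


def split_digits_letters(text):
    """Insert a space wherever a digit sits directly next to a letter."""
    return BOUNDARY.sub(" ", text)


print(split_digits_letters("20किटल"))
print(split_digits_letters("किटल20"))
print(split_digits_letters("abc123"))
```

One caveat with this approximation: combining marks such as Devanagari vowel signs are not counted as letters by either [^\W\d_] or \p{L}, so a word that ends in a combining mark directly before a digit would not be split.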
Second answer

If I understand your requirements correctly, try the following:

# -*- coding: utf-8 -*-

import re

text = '20किटल'
print(re.sub(r'([0-9a-zA-Z_]+)([^\s0-9a-zA-Z_]+)', r'\1 \2', text))

Output:

20 किटल
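
To see where this pattern applies, here is a minimal sketch that runs it on a few more inputs. The helper name space_ascii_then_other and the extra samples are assumptions for illustration, not part of the answer above.

```python
import re

# Pattern from the answer above: a run of ASCII word characters followed by
# a run of characters that are neither whitespace nor ASCII word characters.
PATTERN = re.compile(r"([0-9a-zA-Z_]+)([^\s0-9a-zA-Z_]+)")


def space_ascii_then_other(text):
    """Hypothetical helper: put a space between the two captured runs."""
    return PATTERN.sub(r"\1 \2", text)


for sample in ["20किटल", "किटल20", "20%"]:
    print(repr(sample), "->", repr(space_ascii_then_other(sample)))
```

Because the second group is defined by exclusion, the pattern treats punctuation like any other non-ASCII-word character, so "20%" becomes "20 %". It only matches ASCII text followed by other characters, so the reversed order "किटल20" is left unchanged.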