Regex to add space between Unicode words/numbers in Python

99 Views Asked by Sharan Iyer At 12 March 2021 at 01:11

I tried using the basic regex for Unicode, but I can't make it work on strings that contain characters other than the traditional A-Z letters and digits.

I am looking at examples from multiple languages whose scripts are not part of the A-Z alphabet family.

import re

text = "20किटल"
res = re.sub(r"^[^\W\d_]+$", lambda ele: " " + ele[0] + " ", text)

Output:
20किटल

2nd try:

regexp1 = re.compile(r'^[^\W\d_]+$', re.IGNORECASE | re.UNICODE)
res = regexp1.sub(lambda ele: " " + ele[0] + " ", text)

Output:
20किटल


Expected output:
**20 किटल**

There are 2 best solutions below

Toto On 12 March 2021 at 10:44 BEST ANSWER

Use the regex library from PyPI (pip install regex), which supports Unicode property classes such as \p{L}:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import regex

text = "20किटल"
# Match the empty position that has a digit before it and a letter after it
pat = regex.compile(r"(?<=\d)(?=\p{L})", regex.UNICODE)
res = pat.sub(" ", text)
print(res)

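Here \p{L} stands for any letter in any language. (?<=\d) and (?=\p{L}) are zero-width lookarounds, so the pattern matches the empty position between a digit and a letter, and sub() inserts the space there.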

Output:

20 किटल
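
The same lookaround idea can also be sketched with the standard library alone, in case installing regex is not an option. This is a minimal sketch, not part of the answer above: the function name space_digits_and_letters is hypothetical, and it assumes that `[^\W\d_]` (a word character that is neither a digit nor an underscore) is an acceptable stand-in for \p{L}. It also adds the reverse direction (letter followed by digit), which the question did not explicitly ask for.

```python
import re

# Zero-width boundary between a digit and a letter, in either order.
# [^\W\d_] means "word character that is not a digit or underscore",
# used here as an approximation of \p{L} in the stdlib re module.
_BOUNDARY = re.compile(r"(?<=\d)(?=[^\W\d_])|(?<=[^\W\d_])(?=\d)")


def space_digits_and_letters(text):
    """Insert a space wherever a digit run meets a letter run."""
    return _BOUNDARY.sub(" ", text)


print(space_digits_and_letters("20किटल"))  # 20 किटल
```

One caveat with this approximation: in re, \w does not cover combining marks such as Devanagari vowel signs, so a boundary that sits directly next to such a mark may not be detected. The regex library with \p{L} and \p{M} gives finer control in that case.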

tshiono On 12 March 2021 at 01:54

If I understand your requirements correctly, try the following:

# -*- coding: utf-8 -*-

import re

text = '20किटल'
print(re.sub(r'([0-9a-zA-Z_]+)([^\s0-9a-zA-Z_]+)', r'\1 \2', text))

Output:

20 किटल
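
To see where this pattern splits and where it does not, here is a small sketch that wraps it in a helper and runs it on a few more inputs. The helper name add_space and the extra inputs are my own illustration, not part of the answer. The pattern only fires when an ASCII word run is followed by a run of characters that are neither whitespace nor ASCII word characters.

```python
import re

# Same pattern as in the answer above: an ASCII word run followed by
# a run of characters that are neither whitespace nor ASCII word characters.
PATTERN = re.compile(r'([0-9a-zA-Z_]+)([^\s0-9a-zA-Z_]+)')


def add_space(text):
    """Insert a space between the two captured runs."""
    return PATTERN.sub(r'\1 \2', text)


for sample in ['20किटल', 'abc123', 'किटल20', '20-30']:
    print(repr(sample), '->', repr(add_space(sample)))
```

Because ASCII letters and digits share one character class, 'abc123' is left as is, and the reverse order 'किटल20' is not split either. Punctuation counts as "not an ASCII word character", so '20-30' becomes '20 -30'. Whether that matters depends on the data.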
