I'm trying to use Python to split an entire text file that looks something like this:
Above Ground 地面上 AG Abutment 橋臺 ABUT Acceptance Quality Level 可接受品質水準 AQL Acoustical 隔音 ACOUS Adit 隧道橫坑 Advanced Shoring Method 支撐先進工法
So the format is {Chinese word} _ {English word}_.
Ex: ABUT Acceptance Quality Level 可接受品質水準
(_ is for blank)
I want to convert them into a table with one row of English and another row of Chinese.
Is there any way to do it with NumPy or regex? Or are there some other easy alternatives (like Excel)? Thank you!
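One possible regex approach, sketched under assumptions read off the sample line above: each entry is an English term, then a run of Chinese characters, then optionally an all-uppercase abbreviation. The names `ENTRY`, `SAMPLE`, and `split_entries` are made up for illustration, and the `\u4e00-\u9fff` range only covers the basic CJK block.

```python
import re

# Assumed entry layout (taken from the sample line, not confirmed):
#   English term, Chinese term, optional ALL-CAPS abbreviation
ENTRY = re.compile(
    r"([A-Za-z][A-Za-z ]*?)"    # English term (lazy, may contain spaces)
    r"\s+([\u4e00-\u9fff]+)"    # run of CJK characters
    r"(?:\s+([A-Z]{2,})\b)?"    # optional abbreviation such as AG or ABUT
)

SAMPLE = (
    "Above Ground 地面上 AG Abutment 橋臺 ABUT "
    "Acceptance Quality Level 可接受品質水準 AQL Acoustical 隔音 ACOUS "
    "Adit 隧道橫坑 Advanced Shoring Method 支撐先進工法"
)

def split_entries(text):
    """Return a list of (english, chinese, abbreviation) tuples.

    The abbreviation is an empty string when an entry has none.
    """
    return ENTRY.findall(text)

rows = split_entries(SAMPLE)
# rows[0] is ('Above Ground', '地面上', 'AG')
english_row = [english for english, _, _ in rows]
chinese_row = [chinese for _, chinese, _ in rows]
```

A known limitation of this sketch: if an entry without an abbreviation is followed by an English term whose first word is all uppercase, that word would be misread as the abbreviation. The two lists can be written out with the standard `csv` module and opened in Excel.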
Edit: I tried this code, but it raises a traceback.
import re
path = 'wordlist.txt'
file = open(path, "r")
result = re.match(r"([\u4e00-\u9fa5]*)([A-Za-z\s]*)", file)
print(result.group(1))
print(result.group(2))
Traceback (most recent call last):
  File "c:\Users\...\Desktop\web dev\parsetextforQuizlet\sepretatchiEng.py", line 5, in <module>
    result = re.match(r"([\u4e00-\u9fa5]*)([A-Za-z\s]*)", file)
  File "C:\Users\...\AppData\Local\Programs\Python\Python39\lib\re.py", line 191, in match
    return _compile(pattern, flags).match(string)
TypeError: expected string or bytes-like object
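For what it's worth, the TypeError seems to come from passing the file object itself to `re.match`, which expects a `str` (or bytes). Note also that `re.match` only tries the start of the string, so it would find at most one match. A minimal sketch of reading the text first, assuming the file is UTF-8 encoded; `read_text` and `chinese_runs` are hypothetical helper names:

```python
import re

def read_text(path, encoding="utf-8"):
    """Return the whole file as one string (the encoding is an assumption)."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()

def chinese_runs(text):
    """Find every run of CJK characters anywhere in the text.

    re.findall scans the whole string, unlike re.match, which only
    looks at the beginning.
    """
    return re.findall(r"[\u4e00-\u9fff]+", text)

# Usage with the path from the question:
# text = read_text("wordlist.txt")
# print(chinese_runs(text))
```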