Separate hundreds of English and Chinese vocabulary items into a table

I'm trying to use Python to split an entire text file that looks something like this:

Above Ground 地面上 AG Abutment 橋臺 ABUT Acceptance Quality Level 可接受品質水準 AQL Acoustical 隔音 ACOUS Adit 隧道橫坑 Advanced Shoring Method 支撐先進工法 

So the format is {Chinese word} _ {English word}_.

Ex: ABUT Acceptance Quality Level 可接受品質水準

(_ is for blank)

I want to convert them to a table with one row of English and another row of Chinese. [Result Image]

Is there any way to do it with NumPy or regex? Or are there other easy alternatives (like Excel)? Thank you!
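For reference, here is a minimal regex sketch of the kind of split being asked about, run on the sample line above. It rests on assumptions that are not confirmed by the full word list: each entry is an English term, then a run of Chinese characters, then an optional abbreviation, and abbreviations are tokens of two or more capital letters. The names `TERM`, `parse_terms` and `to_csv` are hypothetical helpers made up for this illustration.

```python
import csv
import io
import re

# Assumed entry layout: English term, Chinese term, optional ALL-CAPS abbreviation.
TERM = re.compile(
    r"([A-Za-z][A-Za-z\s]*?)"   # English term (lazy, so it stops before the Chinese)
    r"\s+([\u4e00-\u9fff]+)"    # run of CJK characters
    r"(?:\s+([A-Z]{2,})\b)?"    # optional abbreviation: 2+ capital letters
)


def parse_terms(text):
    """Return (english, chinese, abbreviation) tuples; abbreviation may be ''."""
    return [m.groups(default="") for m in TERM.finditer(text)]


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["English", "Chinese", "Abbreviation"])
    writer.writerows(rows)
    return buf.getvalue()


sample = (
    "Above Ground 地面上 AG Abutment 橋臺 ABUT "
    "Acceptance Quality Level 可接受品質水準 AQL Acoustical 隔音 ACOUS "
    "Adit 隧道橫坑 Advanced Shoring Method 支撐先進工法 "
)

rows = parse_terms(sample)
print(to_csv(rows))
```

The CSV text can be saved to a file and opened in Excel. An English term that is itself written entirely in capitals would be mistaken for an abbreviation under these assumptions, so the output is worth spot-checking.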

Edit: I tried with this code, but there's a traceback.

import re

path = 'wordlist.txt'
file = open(path, "r")
result = re.match(r"([\u4e00-\u9fa5]*)([A-Za-z\s]*)", file)
print(result.group(1))
print(result.group(2))

Traceback (most recent call last):
  File "c:\Users\...\Desktop\web dev\parsetextforQuizlet\sepretatchiEng.py", line 5, in <module>
    result = re.match(r"([\u4e00-\u9fa5]*)([A-Za-z\s]*)", file)
  File "C:\Users\...\AppData\Local\Programs\Python\Python39\lib\re.py", line 191, in match
    return _compile(pattern, flags).match(string)
TypeError: expected string or bytes-like object
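
The TypeError arises because `re.match` is given the file object returned by `open` rather than the text inside it; the `re` functions expect a `str` (or bytes). A minimal sketch of reading the contents first is shown below. The temporary file and its two-entry contents are stand-ins for the real `wordlist.txt`, and `read_text` is a hypothetical helper.

```python
import re
import tempfile
from pathlib import Path


def read_text(path):
    """Read the whole file as a str; the encoding is assumed to be UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Stand-in for wordlist.txt so that the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "wordlist.txt"
    path.write_text("Above Ground 地面上 AG Abutment 橋臺 ABUT", encoding="utf-8")
    text = read_text(path)

# Same pattern as in the question, now applied to a string: no TypeError.
result = re.match(r"([\u4e00-\u9fa5]*)([A-Za-z\s]*)", text)
print(repr(result.group(1)))  # empty, because the text starts with English
print(repr(result.group(2)))

# re.match only looks at the start of the string; findall scans all of it.
chinese_runs = re.findall(r"[\u4e00-\u9fa5]+", text)
print(chinese_runs)
```

With the string passed in, the original pattern runs, but it only covers the first English term, because `re.match` anchors at the beginning of the text. Scanning the whole text needs `re.findall` or `re.finditer`.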