I have a larger Python 3 program that processes OCR output and does some bubble detection, and I have it mostly worked out. One function, which I got from Stack Overflow, works but has a weird side effect. Since I do not understand the code very well, I would like some help coming up with something that works the way I want.
Here is the code I am using now: Link
How it works: I have a text file we can call address.txt that looks like this:
First Name,
Address,
City State Zip,
Second Name,
Second Address,
Second City State zip,
I would like to convert that to this:
First Name, Address, City State Zip,
Second Name, Second Address, Second City State zip,
Ideally, I would have it write to address.txt in the format I want from the start, rather than create the file and then edit it afterwards using the above function I picked up from Stack Overflow. Here is my function that reads the images, creates the file, and adds a comma at the end of each line. If I could get it to put every three lines on one line, I would not need the above code at all.
import os
import re

import pytesseract


def tess_address():
    files = os.listdir("address")
    sorted_files = sorted(files)
    # open the output file once instead of once per printed line
    with open("address.txt", "a", encoding="utf8") as out:
        for image in sorted_files:
            # build the path to the image
            output = "address/" + image
            # pass the image through pytesseract
            text = pytesseract.image_to_string(output)
            # remove all commas
            no_comma_text = re.sub(",", "", text)
            for line in no_comma_text.splitlines():
                # print to file, adding a trailing comma
                print(line + ",", file=out)
Python can make reading a consistent number of lines per logical grouping quite easy.
Start by reading the whole file line-by-line, taking care to strip away the trailing linebreak; you can also replace the extraneous commas:
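A minimal sketch of that reading step, assuming the file is the address.txt from the question; the helper name `read_lines` is only for illustration:

```python
def read_lines(path="address.txt"):
    """Return the lines of `path` without linebreaks or commas."""
    with open(path, encoding="utf8") as f:
        # strip() drops the trailing linebreak (and surrounding whitespace);
        # replace() drops the commas that were appended to each line
        return [line.strip().replace(",", "") for line in f]
```

Calling `lines = read_lines("address.txt")` would give a plain list of strings, one per line of the file.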
and lines now looks like:
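For the six-line sample file shown in the question, the list would hold the following values (written here as a literal, not captured from a real run):

```python
# Contents of `lines` for the sample address.txt from the question:
# one string per line, with linebreaks and commas removed.
lines = [
    "First Name",
    "Address",
    "City State Zip",
    "Second Name",
    "Second Address",
    "Second City State zip",
]
```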
You can now do a simple assertion to make sure you have groups of three lines:
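One way such a check might look; the function name `check_groups_of_three` is hypothetical:

```python
def check_groups_of_three(lines):
    """Fail early if the lines cannot be split evenly into groups of three."""
    assert len(lines) % 3 == 0, (
        f"expected a multiple of 3 lines, got {len(lines)}"
    )
```

If OCR dropped or added a line somewhere, this fails before any misaligned rows get written.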
Then create a loop that increments an index three-at-a-time to turn each chunk of three lines into three fields in a (CSV) row:
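A sketch of that loop, assuming `lines` is a list whose length is a multiple of three; `chunk_rows` is an illustrative name:

```python
def chunk_rows(lines, size=3):
    """Group a flat list of lines into rows of `size` fields each."""
    rows = []
    # i takes the values 0, 3, 6, ... so each slice is one group
    for i in range(0, len(lines), size):
        rows.append(lines[i:i + size])
    return rows
```

Each row is then a list such as `["First Name", "Address", "City State Zip"]`.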
Finally, use the csv module to write those rows to a new CSV file:
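The writing step could be sketched like this; the output name address.csv and the helper name `write_rows` are assumptions:

```python
import csv


def write_rows(rows, path="address.csv"):
    """Write each row (a list of fields) as one line of a CSV file."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf8") as f:
        csv.writer(f).writerows(rows)
```

The csv module adds the separating commas and quotes any field that needs it, so there is no need to append commas by hand.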
All together, without the print statements:
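Combined into one function, the steps might look like the sketch below. The function name `lines_to_csv` and the output file name are illustrative assumptions:

```python
import csv


def lines_to_csv(src="address.txt", dst="address.csv"):
    """Turn every three lines of `src` into one CSV row in `dst`."""
    with open(src, encoding="utf8") as f:
        # drop linebreaks and the commas appended to each line
        lines = [line.strip().replace(",", "") for line in f]

    assert len(lines) % 3 == 0, f"expected a multiple of 3 lines, got {len(lines)}"

    # step through the list three lines at a time
    rows = [lines[i:i + 3] for i in range(0, len(lines), 3)]

    with open(dst, "w", newline="", encoding="utf8") as f:
        csv.writer(f).writerows(rows)
    return rows
```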
Following Adesoji_Alu's suggestion, you can skip the intermediate file and process the text variable directly:
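A rough sketch of that variant. The helpers take the OCR strings as plain arguments so they can be tried without pytesseract installed. The names `rows_from_texts` and `write_csv`, the output file name, and the choice to skip blank lines are assumptions, not part of the original code:

```python
import csv


def rows_from_texts(texts, size=3):
    """Build rows of `size` fields from an iterable of OCR text strings."""
    lines = []
    for text in texts:
        # remove commas, then split the OCR output into lines
        for line in text.replace(",", "").splitlines():
            line = line.strip()
            if line:  # assumption: blank lines in OCR output carry no data
                lines.append(line)
    assert len(lines) % size == 0, f"expected a multiple of {size} lines, got {len(lines)}"
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def write_csv(rows, path="address.csv"):
    """Write the rows to a CSV file, one address per line."""
    with open(path, "w", newline="", encoding="utf8") as f:
        csv.writer(f).writerows(rows)
```

Inside `tess_address()`, the strings would come from `pytesseract.image_to_string("address/" + image)` for each image in the sorted directory listing, collected into a list and passed to `rows_from_texts`.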