How to go through a txt file where rows do not have the same number of values


I recently started to use Python and now I have a problem similar to the following. I have a txt file where rows do not have the same number of values:

    who you gonna call 555 2368
    56 20 9 7 8
    0 9 7 -789 -9
    -19 -14 0 9 0
    0 -1 0 9 0
    -4.0 -4.1 -4.2 -4.3 -4.4
    -5.0 -5.1 -5.2 -5.3 -5.4
    -6.0 -6.1 -6.2 -6.3 -6.4
    -7.0 -7.1 -7.2 -7.3 -7.4

    +1.0 +1.1 +1.2 +1.3 +1.4
    +2.0 +2.1 +2.2 +2.3 +2.4
    +3.0 +3.1 +3.2 +3.3 +3.4
    +4.0 +4.1 +4.2 +4.3 +4.4

    -6.0 -6.1 -6.2 -6.3 -6.4
    -7.0 -7.1 -7.2 -7.3 -7.4
    -8.0 -8.1 -8.2 -8.3 -8.4
    -9.0 -9.1 -9.2 -9.3 -9.4

For the first lines, I need to read only some values, while for the others (the blocks), I need to read them separately, block by block. So far I wrote something like this:

import tkinter as tk
from tkinter import filedialog
import numpy as np
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
f = open(file_path)
with open(file_path) as fp:
    first_line = fp.readline()
    phone_number = first_line[0:28]

    second_line = fp.readline()
    nums2 = second_line.split(' ')
    total_elements = (float(nums2[2]))

    first_block = []
    first_block_lines = range(5,int(5+(total_elements/5)))
    for position, line in enumerate(f):
        if position in first_block_lines:
            first_block.append(line)

    second_block = []
    second_block_lines = range(int(5+(total_elements/5)+1), int(5+(total_elements/5)+1+total_elements/5))
    for position, line in enumerate(f):
        if position in second_block_lines:
            second_block.append(line)

In my real case, I have hundreds of blocks, all with the same number of rows and columns. The way I go through the blocks (with range(...)) is terrible, and I was wondering how to make it smarter. Please let me know if anything is unclear. Many thanks.

Answer

Since you know where the blocks start and the blocks are separated by empty lines, you can loop through the file and append each line to a list. When an empty line is found, append that list of lines to the main list of blocks and start a new list for the next block.

Try this code:

ss = '''
who you gonna call 555 2368
56 20 9 7 8
0 9 7 -789 -9
-19 -14 0 9 0
0 -1 0 9 0
-4.0 -4.1 -4.2 -4.3 -4.4 
-5.0 -5.1 -5.2 -5.3 -5.4 
-6.0 -6.1 -6.2 -6.3 -6.4 
-7.0 -7.1 -7.2 -7.3 -7.4 

+1.0 +1.1 +1.2 +1.3 +1.4
+2.0 +2.1 +2.2 +2.3 +2.4
+3.0 +3.1 +3.2 +3.3 +3.4 
+4.0 +4.1 +4.2 +4.3 +4.4

-6.0 -6.1 -6.2 -6.3 -6.4 
-7.0 -7.1 -7.2 -7.3 -7.4
-8.0 -8.1 -8.2 -8.3 -8.4
-9.0 -9.1 -9.2 -9.3 -9.4
'''.strip()

with open('data.txt', 'w') as f:
    f.write(ss)  # write the sample data to a test file

####################################


with open('data.txt') as fp:
    first_line = fp.readline()
    phone_number = first_line[0:28]

    second_line = fp.readline()
    nums2 = second_line.split(' ')
    total_elements = (float(nums2[2]))

# read blocks
blocks = []  # list of blocks; each block is a list of lines
startline = 5  # number of header lines before the first block
lst = []  # lines of the block currently being read
with open('data.txt') as fp:
    for _ in range(startline):  # skip the header lines
        fp.readline()
    for ln in fp:  # until end of file
        ln = ln.strip()  # remove surrounding whitespace and the trailing \n
        if ln == "":  # blank line: separator between blocks
            if lst:  # ignore repeated blank lines
                blocks.append(lst)  # add the finished block to the main list
            lst = []  # reset for the next block
        else:
            lst.append(ln)  # add line to the current block
if lst:
    blocks.append(lst)  # last block (no blank line after it)

for b in blocks:  # each block
    print(b)

Output

['-4.0 -4.1 -4.2 -4.3 -4.4', '-5.0 -5.1 -5.2 -5.3 -5.4', '-6.0 -6.1 -6.2 -6.3 -6.4', '-7.0 -7.1 -7.2 -7.3 -7.4']
['+1.0 +1.1 +1.2 +1.3 +1.4', '+2.0 +2.1 +2.2 +2.3 +2.4', '+3.0 +3.1 +3.2 +3.3 +3.4', '+4.0 +4.1 +4.2 +4.3 +4.4']
['-6.0 -6.1 -6.2 -6.3 -6.4', '-7.0 -7.1 -7.2 -7.3 -7.4', '-8.0 -8.1 -8.2 -8.3 -8.4', '-9.0 -9.1 -9.2 -9.3 -9.4']

If you want to go one step further and convert the values to floats, use this code to append each line:

lst.append([float(v) for v in ln.split()])  # add line to block as a list of floats; split() tolerates repeated spaces

Output

[[-4.0, -4.1, -4.2, -4.3, -4.4], [-5.0, -5.1, -5.2, -5.3, -5.4], [-6.0, -6.1, -6.2, -6.3, -6.4], [-7.0, -7.1, -7.2, -7.3, -7.4]]
[[1.0, 1.1, 1.2, 1.3, 1.4], [2.0, 2.1, 2.2, 2.3, 2.4], [3.0, 3.1, 3.2, 3.3, 3.4], [4.0, 4.1, 4.2, 4.3, 4.4]]
[[-6.0, -6.1, -6.2, -6.3, -6.4], [-7.0, -7.1, -7.2, -7.3, -7.4], [-8.0, -8.1, -8.2, -8.3, -8.4], [-9.0, -9.1, -9.2, -9.3, -9.4]]
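The code above opens `data.txt` twice (once for the header, once for the blocks) and reads the phone number with the fixed slice `first_line[0:28]`. Below is a minimal single-pass sketch using `itertools.groupby` from the standard library, which groups consecutive lines by whether they are blank. It assumes five header lines, that the phone number is the last two whitespace-separated tokens of the first line, and that a block is any run of non-blank lines; `read_data` and `HEADER_LINES` are hypothetical names chosen for illustration.

```python
import io
from itertools import groupby

HEADER_LINES = 5  # hypothetical constant: lines before the first block (the answer's startline)

def read_data(fp):
    """Read the header values and the float blocks from one open text file in a single pass."""
    first = fp.readline().split()
    phone_number = " ".join(first[-2:])  # assumption: number = last two tokens of line 1
    total_elements = float(fp.readline().split()[2])  # same field the answer reads via nums2[2]
    for _ in range(HEADER_LINES - 2):  # skip the remaining header lines
        fp.readline()
    blocks = []
    # groupby yields runs of consecutive lines sharing the same key:
    # True for lines that contain data, False for blank separator lines
    for has_data, lines in groupby(fp, key=lambda ln: bool(ln.strip())):
        if has_data:
            blocks.append([[float(v) for v in ln.split()] for ln in lines])
    return phone_number, total_elements, blocks

# small in-memory stand-in for data.txt
sample = io.StringIO(
    "who you gonna call 555 2368\n56 20 9 7 8\n0 9 7 -789 -9\n"
    "-19 -14 0 9 0\n0 -1 0 9 0\n"
    "-4.0 -4.1\n-5.0 -5.1\n"
    "\n"
    "+1.0 +1.1\n+2.0 +2.1\n"
)
phone_number, total_elements, blocks = read_data(sample)
print(phone_number)  # 555 2368
print(blocks)        # [[[-4.0, -4.1], [-5.0, -5.1]], [[1.0, 1.1], [2.0, 2.1]]]
```

With the real file, call it as `with open('data.txt') as fp: phone_number, total_elements, blocks = read_data(fp)`. Because `groupby` merges consecutive blank lines into one separator run, repeated or trailing blank lines do not produce empty blocks.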

If you want to name the blocks, you can convert the list to a dictionary:

# convert to dictionary
dd = {'block_' + str(i):lst for i,lst in enumerate(blocks)}
print(dd)

Output

{'block_0': [[-4.0, -4.1, -4.2, -4.3, -4.4], [-5.0, -5.1, -5.2, -5.3, -5.4], [-6.0, -6.1, -6.2, -6.3, -6.4], [-7.0, -7.1, -7.2, -7.3, -7.4]], 
 'block_1': [[1.0, 1.1, 1.2, 1.3, 1.4], [2.0, 2.1, 2.2, 2.3, 2.4], [3.0, 3.1, 3.2, 3.3, 3.4], [4.0, 4.1, 4.2, 4.3, 4.4]], 
 'block_2': [[-6.0, -6.1, -6.2, -6.3, -6.4], [-7.0, -7.1, -7.2, -7.3, -7.4], [-8.0, -8.1, -8.2, -8.3, -8.4], [-9.0, -9.1, -9.2, -9.3, -9.4]]}
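Since the question states that every block has the same number of rows and columns, and the original script already imports NumPy, the blocks can also be stored in a single 3-D array. A minimal sketch, assuming five header lines and a hypothetical `ROWS_PER_BLOCK` constant: `np.loadtxt` skips the header via `skiprows` and ignores blank lines, so all data rows are read in one call and then reshaped to `(n_blocks, rows_per_block, n_columns)`.

```python
import io
import numpy as np

HEADER_LINES = 5    # lines before the first block (the answer's startline)
ROWS_PER_BLOCK = 2  # hypothetical; 4 for the sample in the question

# small in-memory stand-in for data.txt (two blocks of 2 rows x 3 columns)
sample = io.StringIO(
    "who you gonna call 555 2368\n56 20 9 7 8\n0 9 7 -789 -9\n"
    "-19 -14 0 9 0\n0 -1 0 9 0\n"
    "-4.0 -4.1 -4.2\n-5.0 -5.1 -5.2\n"
    "\n"
    "+1.0 +1.1 +1.2\n+2.0 +2.1 +2.2\n"
)

# skiprows drops the header; blank lines between blocks are ignored by loadtxt
data = np.loadtxt(sample, skiprows=HEADER_LINES, ndmin=2)
# reshape raises ValueError if the row count is not a multiple of ROWS_PER_BLOCK
blocks = data.reshape(-1, ROWS_PER_BLOCK, data.shape[1])
print(blocks.shape)  # (2, 2, 3)
print(blocks[1])     # second block as a 2-D array
```

With the real file, pass the path instead of the `StringIO` object, e.g. `np.loadtxt('data.txt', skiprows=5)`, and set `ROWS_PER_BLOCK = 4`. Each `blocks[i]` is then one block as a 2-D array.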