What is the syntax for writing a txt file with multiple numpy arrays and scalars, and how do I read it back in?


I have 2 numpy arrays of the same length, let's call them A and B, and 2 scalar values named C and D. I want to store these values in a single txt file. I thought of the following structure:

(image: a table with four columns A, B, C, D side by side, where A and B fill their columns and C and D each hold a single value in the first row)

It doesn't have to have this format; I just thought it was convenient and clear. I know how to write the numpy arrays into a txt file and read them out again, but I struggle with how to write the txt file as a combination of arrays and scalar values, and how to read them back from txt into numpy.

import numpy as np

A = np.array([1, 2, 3, 4, 5])
B = np.array([5, 4, 3, 2, 1])
C = [6]
D = [7]
np.savetxt('file.txt', (A, B))   # writes A and B as two rows
A_B_load = np.loadtxt('file.txt')
A_load = A_B_load[0, :]
B_load = A_B_load[1, :]

This doesn't give me the column structure I proposed, since it stores the arrays as rows, but that doesn't really matter.

I found one solution, but it is a bit unhandy since I have to pad the scalar values with zeros so they become the same length as the arrays A and B. There must be a smarter solution.

    A = np.array([1, 2, 3, 4, 5])
    B = np.array([5, 4, 3, 2, 1])
    C = [6]
    D = [7]
    fill = np.zeros(len(A) - 1)   # pad the scalars up to len(A)
    C = np.concatenate((C, fill))
    D = np.concatenate((D, fill))
    np.savetxt('file.txt', (A, B, C, D))
    A_B_load = np.loadtxt('file.txt')
    A_load = A_B_load[0, :]
    B_load = A_B_load[1, :]
    C_load = A_B_load[2, 0]
    D_load = A_B_load[3, 0]

2 Answers

Accepted Answer:
In [123]: A = np.array([1, 2, 3, 4, 5])
     ...: B = np.array([5, 4, 3, 2, 1])
     ...: C = [6]
     ...: D = [7]

savetxt is designed to write a 2d array in a consistent csv form - a neat table with the same number of columns in each row.

In [124]: arr = np.stack((A,B), axis=1)
In [125]: arr
Out[125]: 
array([[1, 5],
       [2, 4],
       [3, 3],
       [4, 2],
       [5, 1]])

Here's one possible write format:

In [126]: np.savetxt('foo.txt', arr, fmt='%d', header=f'{C} {D}', delimiter=',')
     ...: 
In [127]: cat foo.txt
# [6] [7]
1,5
2,4
3,3
4,2
5,1

I put the scalars in a comment header line, since their length doesn't match the arrays.

loadtxt can recreate that arr array:

In [129]: data = np.loadtxt('foo.txt', dtype=int, skiprows=1, delimiter=',')
In [130]: data
Out[130]: 
array([[1, 5],
       [2, 4],
       [3, 3],
       [4, 2],
       [5, 1]])

The header line can be read with:

In [138]: with open('foo.txt') as f:
     ...:     header = f.readline().strip()
     ...:     line = header[1:]
     ...: 
In [139]: line
Out[139]: ' [6] [7]'

I should have saved it as something that's simpler to parse, like '# 6,7'.
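
For instance, a variant along those lines (my addition, not part of the original session) writes the bare scalars into the header and parses them back with plain string handling:

np.savetxt('foo.txt', arr, fmt='%d', header=f'{C[0]},{D[0]}', delimiter=',')
with open('foo.txt') as f:
    # savetxt prefixes the header with '# ', so strip that before splitting
    C_load, D_load = (int(x) for x in f.readline().lstrip('# ').split(','))
data = np.loadtxt('foo.txt', dtype=int, skiprows=1, delimiter=',')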

Your accepted answer creates a dataframe with nan values and blanks in the csv:

In [143]: import pandas as pd
In [144]: df = pd.concat([pd.DataFrame(arr) for arr in [A,B,C,D]], axis=1)
     ...: df.to_csv("test.txt", na_rep="", sep=" ", header=False, index=False)
In [145]: df
Out[145]: 
   0  0    0    0
0  1  5  6.0  7.0
1  2  4  NaN  NaN
2  3  3  NaN  NaN
3  4  2  NaN  NaN
4  5  1  NaN  NaN
In [146]: cat test.txt
1 5 6.0 7.0
2 4  
3 3  
4 2  
5 1 

Note that np.nan is a float, so some of the columns are float as a result. loadtxt can't handle those "blank" columns; np.genfromtxt is better at that, but it needs a delimiter like ',' to mark them.
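
A sketch of that route (my addition, reusing the df from above): write a comma-delimited copy so the empty fields are marked, and genfromtxt turns them into nan:

df.to_csv("test2.txt", na_rep="", sep=",", header=False, index=False)
data = np.genfromtxt("test2.txt", delimiter=",")
# data[:, 2] is [6., nan, nan, nan, nan]; the scalars sit at data[0, 2] and data[0, 3]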

Writing and reading the full-length arrays is easy, but mixing types gets messy.

Here's a format that would be easier to write and read:

In [149]: arr = np.zeros((5,4),int)
     ...: for i,var in enumerate([A,B,C,D]):
     ...:     arr[:,i] = var
     ...: 
In [150]: arr
Out[150]: 
array([[1, 5, 6, 7],
       [2, 4, 6, 7],
       [3, 3, 6, 7],
       [4, 2, 6, 7],
       [5, 1, 6, 7]])
In [151]: np.savetxt('foo.txt', arr, fmt='%d', delimiter=',')
In [152]: cat foo.txt
1,5,6,7
2,4,6,7
3,3,6,7
4,2,6,7
5,1,6,7
In [153]: np.loadtxt('foo.txt', delimiter=',', dtype=int)
Out[153]: 
array([[1, 5, 6, 7],
       [2, 4, 6, 7],
       [3, 3, 6, 7],
       [4, 2, 6, 7],
       [5, 1, 6, 7]])
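
Recovering the originals from this layout is trivial (my addition): slice the first two columns whole and take a single entry from each broadcast column:

data = np.loadtxt('foo.txt', delimiter=',', dtype=int)
A_load, B_load = data[:, 0], data[:, 1]
C_load, D_load = data[0, 2], data[0, 3]   # every row repeats the scalars, so any row works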

Second Answer:

A smarter solution could be to use pandas instead of numpy (if that is an option for you):

import pandas as pd

df = pd.concat([pd.DataFrame(arr) for arr in [A, B, C, D]], axis=1)
df.to_csv("test.txt", na_rep="", sep=" ", header=False, index=False)
a = pd.read_csv("test.txt", sep=" ", header=None).values

The first line creates a dataframe by concatenating all your arrays; pandas' default behaviour is to fill the missing values with NaNs. The second line writes the output file, replacing each NaN with an empty string (since you seem to care about file size). The last line gives you back a numpy array:

In [45]: a
Out[45]: 
array([[ 1.,  5.,  6.,  7.],
       [ 2.,  4., nan, nan],
       [ 3.,  3., nan, nan],
       [ 4.,  2., nan, nan],
       [ 5.,  1., nan, nan]])
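
From that array the pieces can be sliced back out directly (my addition, not in the original answer); note that everything is float at this point, which the EDIT below addresses:

A_load, B_load = a[:, 0], a[:, 1]
C_load, D_load = a[0, 2], a[0, 3]   # only the first row holds the scalars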

EDIT:

Since your input was of integer type,

In [20]: A.dtype
Out[20]: dtype('int64')

more precisely a 64-bit signed integer, you may want to get the same type back.

To get that, just do:

a = pd.read_csv("test.txt", sep=" ", header=None).fillna(0).astype(np.int64)

So you first replace the NaNs with zeros, as you don't use those values anyway, and cast everything directly to np.int64 (pandas' nullable Int64 dtype would support NA values, but then you would have to convert back to numpy's int64 anyway, so it's not worth it; a short sketch of that route follows at the end).

You will get a pandas DataFrame:

In [63]: a
Out[63]: 
   0  1  2  3
0  1  5  6  7
1  2  4  0  0
2  3  3  0  0
3  4  2  0  0
4  5  1  0  0

From which you can easily get back your arrays:

A = a[0].to_numpy(); B = a[1].to_numpy(); C = a.iloc[0, 2]; D = a.iloc[0, 3]
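
For completeness, the nullable-integer route mentioned in the EDIT would look like this (a sketch using pandas' Int64 extension dtype, not part of the original answer):

a = pd.read_csv("test.txt", sep=" ", header=None).astype("Int64")
# unused cells stay <NA> instead of becoming 0; note that .to_numpy() on an
# Int64 column returns an object array unless you cast back to plain int64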