Reading in lines of numeric strings into separate lists in python

119 Views Asked by At

I have csv file which has multiple lines of numeric string values of following format:

csv sample of 2 lines:

[['ASA00211063', '2005'], [-0.434358, -0.793407, -1.070576, nan, nan,...(365 values)], [0.354615, -0.108102,nan,...(365 values)]]

[['AFR02516075', '1998'], [-0.434358, -0.7934039, -1.0705767, nan, nan,...(365 values)], [0.3546153, -0.1081022, nan,...(365 values)]]

How can I split as well as join the csv file into lists, such that out put is:

list[0] = ['ASA00211063', '2005'], ['AFR02516075', '1998']...
list[1] = [-0.434358, -0.793407, -1.070576, nan, nan,..., 0.354615, -0.108102,nan,...(**730** values)]
list[2] = [-0.434358, -0.7934039, -1.0705767, nan, nan,..., 0.3546153, -0.1081022, nan,...(**730** values)]
2

There are 2 best solutions below

1
On BEST ANSWER

I think I satisfied your requirements with this code:

#!/usr/bin/python

import re

data = [[]]

for line in open('in'):
    line = line.strip()
    line = re.match(r'\[?(.*)\]', line).group(1)

    res = re.split(r', (?=\[)', line)

    data[0].append(res[0])
    string = res[1] + res[2]
    data.append([string])

for i, v in enumerate(data):
    print("{}\n".format(data[i]))

Input:

[['ASA00211063', '2005'], [-0.434358, -0.793407, -1.070576, nan, nan,...(365 values)], [0.354615, -0.108102,nan,...(365 values)]]
[['AFR02516075', '1998'], [-0.434358, -0.7934039, -1.0705767, nan, nan,...(365 values)], [0.3546153, -0.1081022, nan,...(365 values)]]
[['XXX02516075', '1998'], [-1.434358, -1.7934039, -1.1705767, nan, nan,...(365 values)], [0.7546153, -0.7081022, nan,...(365 values)]]

Output:

data[0]:
["['ASA00211063', '2005']", "['AFR02516075', '1998']", "['XXX02516075', '1998']"]

data[1]:
['[-0.434358, -0.793407, -1.070576, nan, nan,...(365 values)][0.354615, -0.108102,nan,...(365 values)]']

data[2]:
['[-0.434358, -0.7934039, -1.0705767, nan, nan,...(365 values)][0.3546153, -0.1081022, nan,...(365 values)]']

data[3]:
['[-1.434358, -1.7934039, -1.1705767, nan, nan,...(365 values)][0.7546153, -0.7081022, nan,...(365 values)]']
6
On

To read a pythonic structure from a text file always use ast.literal_eval() it will only read in python structures and prevents anyone from embedding anything nasty in an input file.

This code will go through each line in your input file and append it to a list from which you can decide what to do.

import ast

l = []
for line in open('inputfile.txt'):
    edited_line = line.replace('nan','"nan"')
    l.append(ast.literal_eval(edited_line))

This will also replace all nan with numpy.nan objects:

import ast
from numpy import nan

l = []
for line in open('inputfile.txt'):
    edited_line = line.replace('nan','"nan"')
    edited_line = ast.literal_eval(edited_line)
    edited_line =  [[nan if v == 'nan' else v for v in vals] for vals in edited_line]
    l.append(edited_line)

# combine elements [1] and [2] in the sublist to a list of len = 730
# element l[0] is list of ['code', 'yyyy']
# element l[1 ... n] is list of data by row of length 730
l = [[subl[0] for subl in l]] + [subl[1]+subl[2] for subl in l]

gives output:

for row in l: print row
>>> [['ASA00211063', '2005'], ['AFR02516075', '1998']]
    [-0.434358, -0.793407, -1.070576, nan, nan, 0.354615, -0.108102, nan]
    [-0.434358, -0.7934039, -1.0705767, nan, nan, 0.3546153, -0.1081022, nan]