How to pass in a list of pandas' iterators as the argument for zip?


I am reading five huge CSV files. All of them have the same number of rows, but that number runs into the millions. Because of memory constraints, I need to read them in batches and then join the data from the different files into a single DataFrame.

Below is what I have now:

import pandas as pd
it1 = pd.read_csv('1.csv', chunksize=10)
it2 = pd.read_csv('2.csv', chunksize=10)

it3, it4 and it5 are given in a list, list_iterators. That is:

list_iterators = [it3, it4, it5]

What I want to achieve is that each time I perform a read operation, I get the data from all the iterators as a list.

So the first time I read them, I will have:

[first 10 rows in 1.csv, first 10 rows in 2.csv, first 10 rows in 3.csv ...  first 10 rows in 5.csv]

In order to achieve the desired outcome, what I am doing now is:

ak = zip(it1, it2, list_iterators[0], list_iterators[1], list_iterators[2])
ak.__next__()  # I call this to read the next 10 rows

I wonder if there is any way that I can pass the list_iterators as an argument instead of spelling out all the elements inside it, because I won't know how many elements list_iterators contains when I write my program.

My second question: instead of using __next__(), is there a more elegant way of retrieving the data from the pandas iterators?

On BEST ANSWER

I wonder if there is any way that I can pass the list_iterators as an argument

Yes, you can pass the contents of list_iterators using the * operator, which unpacks the list into separate positional arguments, however many elements it holds:

ak = zip(it1, it2, *list_iterators)
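
As for the second question, the built-in next(ak) is the usual spelling of ak.__next__(), and a plain for loop over ak reads batch after batch until the shortest iterator runs out. Below is a minimal sketch of both. To stay runnable without any CSV files, it uses a hypothetical fake_chunks generator as a stand-in for pd.read_csv(..., chunksize=10); the file names and row counts are made up for illustration.

```python
from itertools import islice


def fake_chunks(name, n_rows, chunksize):
    """Hypothetical stand-in for pd.read_csv(name, chunksize=chunksize).

    Yields lists of row labels instead of DataFrames.
    """
    rows = (f"{name}:row{i}" for i in range(n_rows))
    while True:
        chunk = list(islice(rows, chunksize))
        if not chunk:
            return
        yield chunk


it1 = fake_chunks("1.csv", 25, 10)
it2 = fake_chunks("2.csv", 25, 10)
# The length of this list does not need to be known in advance.
list_iterators = [fake_chunks(f"{k}.csv", 25, 10) for k in (3, 4, 5)]

# * unpacks the list into separate positional arguments of zip.
ak = zip(it1, it2, *list_iterators)

# next(ak) is the built-in equivalent of ak.__next__().
# zip yields tuples, so wrap in list() if a list is required.
first = list(next(ak))

# A for loop (here a comprehension) consumes the remaining batches
# and stops as soon as any one of the iterators is exhausted.
remaining = [list(batch) for batch in ak]
```

With the objects from the question, the same pattern would read: for chunks in zip(it1, it2, *list_iterators), where chunks is a tuple holding one DataFrame per file. If every iterator lives in a single list, zip(*all_iterators) is enough.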