I am working on an exploratory data analysis in Python on a large dataset (~20 million records, 10 columns). I will be segmenting and aggregating the data and creating some visualizations, and I may also build some decision tree and linear regression models from it.
Because of the size of the dataset I need a dataframe that supports out-of-core data storage. Since I am relatively new to Python and to working with large datasets, I want an approach that lets me easily use sklearn on my data. I'm unsure whether to use PyTables, Blaze, or SFrame for this exercise. If someone could help me understand their pros and cons, and which factors matter in this kind of decision, that would be much appreciated.
Good question! One option you may consider is not to use any of the libraries mentioned above, but instead to read and process your file chunk by chunk, something like this:
import pandas as pd

csv_path = r"\path\to\file.csv"  # raw string so the backslashes are not treated as escape sequences
pandas can read data from (large) files chunk-wise via a file iterator:
# read roughly 2,000,000 rows at a time (20,000,000 rows / 10 chunks); chunksize must be an integer
it = pd.read_csv(csv_path, iterator=True, chunksize=20_000_000 // 10)

for i, chunk in enumerate(it):
    ...  # each chunk is a regular DataFrame: filter, aggregate, or collect partial results here
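Since you mention sklearn and linear regression: the same chunked iterator can feed estimators that support incremental learning through partial_fit, for example SGDRegressor (a linear model fitted by stochastic gradient descent). Below is a minimal sketch under the assumption of hypothetical column names (col1, col2, col3 as features, target as the label); adapt these to your actual data.

import pandas as pd
from sklearn.linear_model import SGDRegressor

csv_path = r"\path\to\file.csv"          # placeholder path from the question
feature_cols = ["col1", "col2", "col3"]  # hypothetical feature columns
target_col = "target"                    # hypothetical target column

model = SGDRegressor()  # linear model that supports incremental fitting via partial_fit

for chunk in pd.read_csv(csv_path, iterator=True, chunksize=2_000_000):
    X = chunk[feature_cols].to_numpy()
    y = chunk[target_col].to_numpy()
    model.partial_fit(X, y)  # update the model with this chunk only, never holding the full dataset in memory

Note that sklearn's decision trees do not implement partial_fit, so for tree models you would typically train on a sample of the data or on pre-aggregated results instead.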