Why is numpy too slow when extracting data from CDF files using pycdf?

I would like to extract the variable "time" from a CDF file named "filename.cdf". For this I used the following code:

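import numpy as np
from spacepy import pycdf

# `time` holds the values accumulated so far; starting from an empty array is an assumption
time = np.empty(0)

data = pycdf.CDF('filename.cdf')
e1 = np.array(data['time'])  # convert the CDF variable to a NumPy array
e2 = np.hstack([time, e1]) if time.size else e1  # append to the earlier data, if any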

The variable "time" contains 5529600 elements. The lines that assign e1 and e2 take a very long time to execute.

What is the correct method to work with such huge datasets?

There is 1 solution below

Assuming the slowdown is not caused by running low on memory (which could explain a slow array allocation), I advise specifying the dtype of the array so that NumPy does not have to infer it:

e1 = np.array(data['time'], dtype=np.float32)

or

e1 = np.array(data['time'], dtype=np.int64)

depending on the numerical type you wish for.
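
As a rough sketch of the advice above, the helpers below convert a variable to an array with an explicit dtype. They also read the variable with a single `[...]` slice, which pycdf supports for returning a variable's data as a NumPy array, and join the pieces with one concatenation rather than repeated `hstack` calls. The names `read_variable` and `append_chunks` are hypothetical, not part of pycdf, and whether this removes the slowdown in your case is an assumption worth timing. If "time" is a CDF epoch variable, pycdf may return datetime objects, in which case a numeric dtype does not apply.

```python
import numpy as np


def read_variable(cdf, name, dtype=None):
    """Read a whole variable with one bulk slice and return a NumPy array.

    `cdf` is anything indexable by variable name whose items support
    `[...]`, such as an open pycdf.CDF object (hypothetical helper).
    """
    values = cdf[name][...]  # one bulk read instead of element-wise iteration
    return np.asarray(values, dtype=dtype)  # dtype=None keeps the stored type


def append_chunks(chunks):
    """Join several arrays with a single concatenation, skipping empty ones."""
    non_empty = [c for c in chunks if c.size]
    return np.concatenate(non_empty) if non_empty else np.empty(0)
```

With spacepy installed, usage might look like `e1 = read_variable(pycdf.CDF('filename.cdf'), 'time', dtype=np.float64)` followed by `e2 = append_chunks([time, e1])`.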