```python
import numpy as np
from collections import namedtuple
from random import random

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])
LST = [(Smoker(random(), random()), Nonsmoker(random(), random())) for i in range(100)]
```
So I have a long list whose elements are tuples, and each tuple contains a pair of namedtuples. What is the fastest way to find the average of this list, field by field? Ideally the result would keep the same structure, that is, `(Smoker(Female=w, Male=x), Nonsmoker(Female=y, Male=z))`.
```python
# Current approach: one np.mean call per field
grizzly = Smoker(np.mean([a.Female for a, b in LST]), np.mean([a.Male for a, b in LST]))
panda = Nonsmoker(np.mean([b.Female for a, b in LST]), np.mean([b.Male for a, b in LST]))
result = (grizzly, panda)
```
`np.mean` has to convert each list to an array, which takes time. Python's built-in `sum` avoids that conversion and saves time, and both produce the same result (to within a small epsilon).

If you could collect the values in one array, possibly of shape (n, 4), then the mean would be fast. For a one-time calculation, though, building that array probably isn't worth it.
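The `sum`-based alternative to `np.mean` can be sketched as follows. This is a minimal sketch, not a benchmark: `mean_with_sum` and `mean_with_np` are hypothetical helper names, and the comparison only checks that the two agree up to floating-point rounding. Time them with `timeit` on your own data to confirm the speedup.

```python
from collections import namedtuple
from random import random

import numpy as np

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])
LST = [(Smoker(random(), random()), Nonsmoker(random(), random())) for _ in range(100)]


def mean_with_sum(lst):
    """Hypothetical helper: field-wise mean via built-in sum (no array conversion)."""
    n = len(lst)
    smoker = Smoker(sum(a.Female for a, b in lst) / n,
                    sum(a.Male for a, b in lst) / n)
    nonsmoker = Nonsmoker(sum(b.Female for a, b in lst) / n,
                          sum(b.Male for a, b in lst) / n)
    return smoker, nonsmoker


def mean_with_np(lst):
    """Hypothetical helper: the question's approach, one np.mean per field."""
    smoker = Smoker(np.mean([a.Female for a, b in lst]),
                    np.mean([a.Male for a, b in lst]))
    nonsmoker = Nonsmoker(np.mean([b.Female for a, b in lst]),
                          np.mean([b.Male for a, b in lst]))
    return smoker, nonsmoker


res_sum = mean_with_sum(LST)
res_np = mean_with_np(LST)
# The two results differ only by floating-point rounding.
print(np.allclose(res_sum, res_np))  # True
```

Both helpers return the nested `(Smoker, Nonsmoker)` structure directly, so no reshaping is needed afterwards.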
Since named tuples can be accessed like regular tuples, we can make an array directly from `LST`, but the timing isn't encouraging.
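A minimal sketch of building a plain float array straight from `LST` and taking one vectorised mean. The axis layout described in the comments follows from the tuple nesting; the names `arr`, `m` and `result` are illustrative. Most of the cost is presumably in the `np.array(LST)` conversion itself, not in the mean.

```python
from collections import namedtuple
from random import random

import numpy as np

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])
LST = [(Smoker(random(), random()), Nonsmoker(random(), random())) for _ in range(100)]

# Named tuples are tuples, so np.array builds a float array of shape
# (len(LST), 2, 2): axis 1 = Smoker/Nonsmoker, axis 2 = Female/Male.
arr = np.array(LST)
print(arr.shape)  # (100, 2, 2)

# One vectorised mean over all records gives a (2, 2) array.
m = arr.mean(axis=0)

# Rebuild the original nested structure from the two rows of the mean.
result = (Smoker(*m[0]), Nonsmoker(*m[1]))
```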
I can also make a structured array from your list, with nested dtypes. Curiously, the timing for that is relatively good.
But I have to take each mean separately, or else convert it to an unstructured array first.
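A sketch of the nested structured array, assuming float64 fields named after the namedtuple fields (the names in `dt` are a choice, not required by numpy). Each element of `LST` is a tuple of tuples, which is the form numpy expects for a record of a nested structured dtype. As noted, each field's mean is a separate call.

```python
from collections import namedtuple
from random import random

import numpy as np

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])
LST = [(Smoker(random(), random()), Nonsmoker(random(), random())) for _ in range(100)]

# Nested structured dtype mirroring the namedtuples.
pair = [("Female", "f8"), ("Male", "f8")]
dt = np.dtype([("Smoker", pair), ("Nonsmoker", pair)])

# One record per element of LST; shape is (100,).
sarr = np.array(LST, dtype=dt)

# Fields are selected by name, but each mean is its own call.
result = (
    Smoker(sarr["Smoker"]["Female"].mean(), sarr["Smoker"]["Male"].mean()),
    Nonsmoker(sarr["Nonsmoker"]["Female"].mean(), sarr["Nonsmoker"]["Male"].mean()),
)
```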
I could make an (n, 4) float array from the structured one with a `view` or a `recfunctions` utility.
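Both routes are sketched below, assuming the same packed float64 dtype as above. The `view` works because the four fields sit back to back with no padding (itemsize 32 bytes), so the buffer can be reinterpreted without copying. `numpy.lib.recfunctions.structured_to_unstructured` (available since NumPy 1.16) flattens nested fields into the last axis.

```python
from collections import namedtuple
from random import random

import numpy as np
from numpy.lib import recfunctions as rfn

Smoker = namedtuple("Smoker", ["Female", "Male"])
Nonsmoker = namedtuple("Nonsmoker", ["Female", "Male"])
LST = [(Smoker(random(), random()), Nonsmoker(random(), random())) for _ in range(100)]

pair = [("Female", "f8"), ("Male", "f8")]
dt = np.dtype([("Smoker", pair), ("Nonsmoker", pair)])
sarr = np.array(LST, dtype=dt)

# Option 1: reinterpret the packed records as float64, then reshape to (n, 4).
flat_view = sarr.view(np.float64).reshape(-1, 4)

# Option 2: let recfunctions flatten the (nested) fields into a new last axis.
flat_rfn = rfn.structured_to_unstructured(sarr)

# Column order follows the dtype: Smoker.Female, Smoker.Male,
# Nonsmoker.Female, Nonsmoker.Male.
m = flat_view.mean(axis=0)
result = (Smoker(m[0], m[1]), Nonsmoker(m[2], m[3]))
```

The `view` is the cheaper of the two since it shares memory with `sarr`, but it relies on every field having the same dtype and no padding. `structured_to_unstructured` is the more general option.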