While iterating over a numpy array, I can't call methods of the objects stored in the array

This is my first question on StackOverflow, so tips on how to 'ask' better are most welcome.

Basic Goal of this part of the code: A number of balls (no_balls) move in random directions.

I am trying to move from Python lists to NumPy arrays for better performance. Here is the reduced code.

Basic Problem: My iterator gives me objects of type ndarray, not vpy.sphere, so accessing sphere.pos on the objects I am iterating over fails. Or is this not possible at all, since NumPy is built for numbers? Are there alternatives for performance?

import vpython as vpy
import numpy as np

#Create and fill numpy array with random-size balls
balls = np.empty([no_balls], dtype=vpy.sphere)

with np.nditer(balls, flags=['refs_ok'], op_flags=['readwrite']) as b_it:
    debug_msg(len(b_it))
    for b in b_it:
        b[...] = (vpy.sphere( radius=random_in_range(ball_min_r,ball_max_r), 
                              opacity=0.8, 
                              color=random_RGB(), 
                              pos=vpy.vector(0,0,0),))
    debug_msg('populated balls list')

with np.nditer(balls, flags=['refs_ok'], op_flags=['readwrite']) as b_it:
    #Main Loop
    debug_msg('Starting Main Loop')
    while True:
        vpy.rate(30)
            
        #The actual loop manipulates the position, but the problem is that I can't access
        #the position of the sphere objects: type() reports numpy.ndarray for b[...]
        for b in b_it:
            debug_msg(type(b[...]))
            debug_msg(b[...].pos)
#Above outputs
<class 'numpy.ndarray'>
Traceback (most recent call last):
  File "path", line 93, in <module>
    debug_msg(b[...].pos)
AttributeError: 'numpy.ndarray' object has no attribute 'pos'

How do I call methods and access members of the objects in the array? And on a side note, why do I need to write b[...] instead of b? It seems redundant.

There is 1 best solution below

A simple class:

In [149]: class Foo():
     ...:     def __init__(self,i):
     ...:         self.i = i
     ...:     def __repr__(self):
     ...:         return f'<FOO {self.i}>'
     ...: 
In [150]: Foo(323)
Out[150]: <FOO 323>

A list of such objects:

In [151]: alist = [Foo(i) for i in range(10)]

An equivalent object dtype array:

In [152]: arr = np.array(alist)
In [153]: arr.dtype
Out[153]: dtype('O')
In [154]: arr
Out[154]: 
array([<FOO 0>, <FOO 1>, <FOO 2>, <FOO 3>, <FOO 4>, <FOO 5>, <FOO 6>,
       <FOO 7>, <FOO 8>, <FOO 9>], dtype=object)

Fetching the attribute from the list:

In [155]: [f.i for f in alist]
Out[155]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [156]: timeit [f.i for f in alist]
826 ns ± 8.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

and from the array (slower):

In [157]: timeit [f.i for f in arr]
1.66 µs ± 15.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Using nditer: you studied the docs enough to get the flags right, but didn't grasp that b is a 0-d array, not a Foo:

In [158]: with np.nditer(arr, flags=['refs_ok'], op_flags=['readwrite']) as b_it:
     ...:     for b in b_it:
     ...:         print(b, b.dtype, b.shape, b.item())
     ...: 
<FOO 0> object () <FOO 0>
<FOO 1> object () <FOO 1>
<FOO 2> object () <FOO 2>
<FOO 3> object () <FOO 3>
<FOO 4> object () <FOO 4>
<FOO 5> object () <FOO 5>
<FOO 6> object () <FOO 6>
<FOO 7> object () <FOO 7>
<FOO 8> object () <FOO 8>
<FOO 9> object () <FOO 9>

Fetching a list of the attribute:

In [159]: res = []
     ...: with np.nditer(arr, flags=['refs_ok'], op_flags=['readwrite']) as b_it:
     ...:     for b in b_it:
     ...:         res.append(b.item().i)
     ...: 
     ...: 
In [160]: res
Out[160]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
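
As a minimal sketch of why b[...] behaves differently from b.item(), the snippet below re-declares the Foo class from above so that it runs on its own. It compares three ways of indexing the 0-d array that nditer yields, and shows that assigning through b[...] writes into the underlying array, which is why the question's population loop needed it. The variable names kinds and values are only illustrative.

```python
import numpy as np

class Foo:
    def __init__(self, i):
        self.i = i
    def __repr__(self):
        return f'<FOO {self.i}>'

arr = np.array([Foo(i) for i in range(3)])

kinds = []
values = []
with np.nditer(arr, flags=['refs_ok'], op_flags=['readwrite']) as b_it:
    for b in b_it:
        # b is a 0-d object array wrapping one element, not the element itself.
        # b[...] is again a 0-d array; b[()] and b.item() unwrap the element.
        kinds.append((type(b[...]).__name__,
                      type(b[()]).__name__,
                      type(b.item()).__name__))
        values.append(b.item().i)
        # Assigning through b[...] writes into arr.
        # A bare `b = Foo(...)` would only rebind the local name b.
        b[...] = Foo(b.item().i + 100)

print(kinds[0])
print(values)
print([f.i for f in arr])
```

So reading an attribute needs b.item() or b[()], while b[...] is only required on the left-hand side of an assignment.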

And a poor timing:

In [161]: %%timeit
     ...: res = []
     ...: with np.nditer(arr, flags=['refs_ok'], op_flags=['readwrite']) as b_it:
     ...:     for b in b_it:
     ...:         res.append(b.item().i)
     ...: 

7.25 µs ± 60.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

One of the cleaner ways of performing an action on elements of an object array is with frompyfunc:

In [162]: f = np.frompyfunc(lambda b:b.i,1,1)
In [163]: f(arr)
Out[163]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=object)
In [164]: timeit f(arr)
2.1 µs ± 8.58 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Still slower than iterating over the list or the array directly, though if we want an array instead of just a list, it is better than:

In [165]: timeit np.array([f.i for f in arr])
5.79 µs ± 21.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
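
frompyfunc is not limited to reading attributes. The sketch below, under the assumption that the objects expose some method worth calling per element, uses it both to build an object array and to call a method with a broadcast argument. The bump method is hypothetical and merely stands in for something like updating a sphere's position. No timing claim is made for this variant.

```python
import numpy as np

class Foo:
    def __init__(self, i):
        self.i = i
    def bump(self, step):
        # Hypothetical method, standing in for e.g. moving a sphere
        self.i += step
        return self.i

# Build an object array by calling the constructor once per input element
make = np.frompyfunc(Foo, 1, 1)
arr = make(np.arange(5))

# Two inputs, one output: call a method on each object with an argument
bump = np.frompyfunc(lambda obj, step: obj.bump(step), 2, 1)
result = bump(arr, 10)   # the scalar 10 is broadcast across arr

print(arr.dtype, result)
```

The result is again an object dtype array, so it still holds Python objects and gains nothing from compiled numeric code.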

The nditer docs need a stronger performance disclaimer. nditer is useful and fast when used in C or Cython code, but when accessed from Python code it is inferior to more obvious alternatives. Its extra bells and whistles may be useful in some cases, but mostly I see it as a bridge to properly compiled code, not as an end in itself.

At the heart of the performance issue is that Foo is a Python class, so accessing the i attribute has to go through Python's full attribute lookup for every element. It can't make use of any of the fast compiled numpy numeric methods.
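
One possible way to let numpy do the numeric work, sketched here under assumptions, is to keep the objects in a plain list and hold only the numbers (positions, velocities) in float arrays. The Ball class is a hypothetical stand-in for vpy.sphere so the example runs without vpython, and the velocity range and frame count are made up for illustration.

```python
import numpy as np

class Ball:
    """Hypothetical stand-in for vpy.sphere: only holds a pos attribute."""
    def __init__(self, pos):
        self.pos = pos

rng = np.random.default_rng(0)
no_balls = 5
balls = [Ball((0.0, 0.0, 0.0)) for _ in range(no_balls)]   # plain list of objects

# Numeric state lives in float arrays, one row per ball
positions = np.zeros((no_balls, 3))
velocities = rng.uniform(-1.0, 1.0, size=(no_balls, 3))
dt = 1.0 / 30.0          # matches the rate(30) in the question

for _ in range(3):       # a few frames instead of `while True`
    positions += velocities * dt        # one vectorized update for all balls
    for ball, p in zip(balls, positions):
        # With vpython this would presumably be vpy.vector(*p) (assumption)
        ball.pos = tuple(p)

print(balls[0].pos)
```

The per-object loop that pushes positions back to the spheres is still plain Python, so the gain depends on how much of the per-frame work is arithmetic.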