Blaze Data field map throws TypeError


I have recently started moving my data exploration code set from pandas to blaze. I am running into the following issue.

Assume:

from blaze import *

s = Data([(1, 'Alice', 100),
          (2, 'Bob', -200),
          (3, 'Charlie', 300),
          (4, 'Denis', 400),
          (5, 'Edith', -500)],
         fields=['id', 'name', 'balance'])

Converting to a pandas.DataFrame via into, we can readily compute something like:

import pandas as pd

into(pd.DataFrame, s).balance.apply(abs)

However, I am having serious difficulties trying to do:

s.balance.map(abs,schema='{b: int64}')

This throws "TypeError: a bytes-like object is required, not 'int'", among other things.

This issue seems related to the question "Best approach to apply a function to a column or create a new column by applying a function to another one?", which is closed, so I am not sure where to turn.

ps: if you feel this is trivial and want to mark the question down, please also provide a complete working answer.

Best answer:

Try passing 'int64' as the datashape, rather than passing in a value for schema. It's the second keyword argument, so you don't need to name it. The following:

from blaze import *
s = Data([(1, 'Alice', 100),
          (2, 'Bob', -200),
          (3, 'Charlie', 300),
          (4, 'Denis', 400),
          (5, 'Edith', -500)],
          fields=['id', 'name', 'balance'])
s.balance.map(abs, 'int64')

works for me, and produces:

   balance
0      100
1      200
2      300
3      400
4      500

p.s. Though importing everything from blaze seems to be clobbering the built-in abs with blaze.expr.abs, I don't think that matters.
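
If the clobbered name ever does matter, the original built-in stays reachable through the standard builtins module. Below is a minimal sketch of that idea in plain Python. It does not use blaze at all: the abs function defined here is a hypothetical stand-in for whatever a star import might bring in, not blaze's actual implementation.

```python
import builtins

# Hypothetical stand-in for a name pulled in by a star import
# (for example blaze.expr.abs). This is NOT blaze's real implementation;
# it only makes the shadowing visible.
def abs(value):
    return "symbolic abs({})".format(value)

balances = [100, -200, 300, 400, -500]

# The local name now refers to the stand-in, not the built-in.
shadowed = [abs(b) for b in balances]

# The original built-in is still available via the builtins module.
restored = [builtins.abs(b) for b in balances]

print(shadowed[1])
print(restored)
```

Importing only the names you need (rather than everything) avoids the shadowing in the first place.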