Log values of an SFrame column

Can anybody tell me how to take the logarithm of every value in a column of an SFrame (graphlab) or a DataFrame (pandas) without iterating over the whole length of the column? I am especially interested in functionality similar to the Groupby Aggregators, but for the log function. I couldn't find it myself...

Important: I am not interested in a for-loop over the whole length of the column. I am only interested in a specific function that transforms all values of the column into their log values.

I'm also very sorry if this function is in the manual. If so, please just give me a link...

There are 3 best solutions below

0
On BEST ANSWER

numpy provides implementations of a wide range of basic mathematical transformations. You can use them on all data structures that build on numpy's ndarray.

import pandas as pd
import numpy as np
data = pd.Series([np.exp(1), np.exp(2), np.exp(3)])
np.log(data)

Outputs:

0    1.0
1    2.0
2    3.0
dtype: float64

This example is for pandas data types, but it works for all data structures that are based on numpy arrays.
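
As a minimal sketch of how this carries over to a DataFrame column: the result of np.log can be assigned straight back as a new column. The frame and the column names (`a`, `c`, `log_c`, `log10_c`) are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 1, 2], "c": [1.0, 10.0, 100.0]})

# np.log is applied to the whole column at once, with no explicit loop.
df["log_c"] = np.log(df["c"])

# Other bases work the same way, e.g. np.log10 or np.log2.
df["log10_c"] = np.log10(df["c"])

print(df)
```

The original column is left untouched, so the raw and the log-transformed values stay side by side.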

1
On

@cel

I think that in my case the following pattern could also work.

import numpy
import pandas
import graphlab


df
    a b c 
    1 1 1 
    1 2 3
    2 1 3
    ....

# transform keeps the original row index, so the result lines up with df
df['log c'] = df.groupby('a')['c'].transform(numpy.log)

For an SFrame (an sf object instead of df), it could look a little different:

logvals = numpy.log(sf['c'])
log_sf = graphlab.SFrame(logvals)
sf = sf.join(log_sf, how='outer')

With numpy the code fragment is probably a little too long, but it works...

The main problem is, of course, time performance. I had hoped to find a specific function that would minimise the run time...
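
On the time concern, one rough way to check it on your own machine is to time an elementwise apply against a single vectorized numpy call. This is only a sketch on synthetic pandas data (the series, its size, and the function names are assumptions, and it does not touch SFrame); timings vary by machine, so none are quoted here.

```python
import math
import timeit

import numpy as np
import pandas as pd

# Synthetic data, strictly positive so the logarithm is defined.
values = pd.Series(np.linspace(1.0, 1000.0, 100_000))


def elementwise():
    # Calls math.log once per value through a Python-level function.
    return values.apply(math.log)


def vectorized():
    # Hands the whole underlying array to numpy in one call.
    return np.log(values)


# Both approaches should agree up to floating-point rounding.
same = np.allclose(elementwise(), vectorized())

t_apply = timeit.timeit(elementwise, number=3)
t_numpy = timeit.timeit(vectorized, number=3)

print("same result:", same)
print("apply: %.4fs  numpy: %.4fs" % (t_apply, t_numpy))
```

The vectorized call usually comes out well ahead, because the loop runs inside numpy rather than in the Python interpreter.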

0
On

The same "apply" pattern works for SFrames as well. You could do:

import graphlab
import math

sf = graphlab.SFrame({'a': [1, 2, 3]})
sf['b'] = sf['a'].apply(lambda x: math.log(x))
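
One thing to watch with this pattern: math.log raises a ValueError for zero or negative input, so a single bad value stops the whole apply. Below is a hedged sketch of a guard function that could be passed to apply in place of the lambda. The name `safe_log` is hypothetical, and the plain list comprehension only stands in for the column. How an SFrame column treats the returned None values is not covered by this answer and is not tested here.

```python
import math


def safe_log(x):
    # Hypothetical helper: return None instead of raising for missing
    # or non-positive inputs, where the real logarithm is undefined.
    if x is None or x <= 0:
        return None
    return math.log(x)


# A plain Python list stands in for the column values.
values = [1, math.e, 0, -5, None]
logs = [safe_log(v) for v in values]

print(logs)
```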