SFrame Column of type: Dictionary


When I run:

my_sframe['col_1'] = ''

I get a blank column, which is what I wanted.

However when I run:

my_sframe['col_1'] = {}

I get an error saying "unexpected data type".

The SFrame API documentation doesn't address this:

https://turi.com/products/create/docs/generated/graphlab.SFrame.html

My understanding at this point was that SFrame columns cannot be dictionaries.

However, out of curiosity I tried this:

my_sframe['col_1'] = graphlab.text_analytics.count_words(my_sframe['my_text'])

type(my_sframe['col_1'][1])

out: dict

This result directly contradicts my previous understanding.

What I want is a dictionary column in which each row has its own dictionary, much like the output of .count_words, except that my word-count dictionaries are built from scratch (using import string).
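A minimal sketch, in plain Python 3, of building one word-count dictionary per row from scratch with the `string` module. The helper name `word_count_dict` and the sample rows are hypothetical, and lowercasing plus stripping ASCII punctuation are assumed preprocessing choices, not necessarily what `.count_words` does internally.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def word_count_dict(text):
    """Hypothetical helper: lowercase, strip ASCII punctuation,
    split on whitespace, and count each word."""
    cleaned = text.lower().translate(_STRIP_PUNCT)
    # Convert the Counter back to a plain dict so the result type is exactly dict.
    return dict(Counter(cleaned.split()))

# Hypothetical sample rows standing in for an SFrame 'text' column.
rows = ["The cat sat.", "The dog, the cat!"]
per_row = [word_count_dict(t) for t in rows]
print(per_row)
```

Each row produces its own independent dict, which is the per-row shape the question is after.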

Is this a two-way street, or is .count_words an exception, meaning I shouldn't expect to be able to reproduce that kind of data transformation myself?

Please advise,

Thank you

UPDATE

Here is some possibly relevant information on GitHub:

https://github.com/turi-code/how-to/blob/master/sframe_pack.py

I'm not sure whether this technique can create what I'm after; I'm still trying. Let me know if anyone has any thoughts on this.

ANSWER

I'm still open to accepting a more efficient answer, but in the meantime, for anyone else having this issue, here is one way I just figured out to create an SFrame column of dictionaries:

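import graphlab

def count_words(text):
    # Map each whitespace-separated word to the number of times it occurs
    words = text.split()
    wordfreq = {}
    for x in words:
        if x not in wordfreq:
            wordfreq[x] = 0
        wordfreq[x] += 1
    return wordfreq

# apply() calls count_words on every row; because it returns a dict,
# the new column holds one dict per row
sframe['word_count'] = sframe['text'].apply(count_words)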
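A slightly more compact equivalent of the loop-based `count_words`, using `collections.Counter` with the same whitespace-only tokenization. Converting back to a plain `dict` is a precaution: I'm assuming `apply` infers the column dtype from the returned values, so returning a real `dict` rather than a `Counter` subclass should keep the column's dtype as dict. If your GraphLab version's `apply` accepts a `dtype` argument, passing `dtype=dict` explicitly is another option worth checking.

```python
from collections import Counter

def count_words(text):
    """Map each whitespace-separated word to its number of occurrences."""
    # Counter replaces the manual "insert 0, then increment" bookkeeping;
    # dict() turns it back into a plain dict so the return type is exactly dict.
    return dict(Counter(text.split()))

print(count_words("to be or not to be"))
```

Usage with the SFrame is unchanged: `sframe['word_count'] = sframe['text'].apply(count_words)`.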

You'll notice that the new column's dtype is dict. It seems a little roundabout, though. I would still love to know why we can't simply assign a constant, as in my_sframe['col_1'] = {}, to create a new column, instead of getting the "unexpected data type" error.