Trying to select a nested field of a JSONL column with .loc, but got KeyError even though the key exists


This is my data structure in JSONL:

{"content": "Not yall gassing up a gay boy with no rhythm", "place": {"_type": "snscrape.modules.twitter.Place", "fullName": "Manhattan, NY", "name": "Manhattan", "type": "city", "country": "United States", "countryCode": "US"}}

I tried to select countryCode from the place column with this code:

country_df = test_df.loc[test_df['place'].notnull(), ['content', 'place']]
countrycode_df = country_df["place"].loc["countryCode"]

but it gave me this error:

KeyError: 'countryCode'
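A minimal reproduction shows where the error comes from. It assumes a hypothetical one-row test_df shaped like the record above. country_df["place"] is a Series whose index holds row labels, so .loc["countryCode"] looks for a row named "countryCode" instead of a key inside the dicts stored in each row.

```python
import pandas as pd

# Hypothetical one-row frame mirroring the JSONL record (fields trimmed)
test_df = pd.DataFrame({
    "content": ["Not yall gassing up a gay boy with no rhythm"],
    "place": [{"fullName": "Manhattan, NY", "countryCode": "US"}],
})

country_df = test_df.loc[test_df["place"].notnull(), ["content", "place"]]
place = country_df["place"]

# The Series index holds row labels, not the keys of the dicts
print(place.index.tolist())  # [0]

try:
    place.loc["countryCode"]  # searches for a ROW labelled "countryCode"
except KeyError as err:
    print("KeyError:", err)

# .loc with a real row label returns the whole dict stored in that row
print(place.loc[0]["countryCode"])  # US
```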

How do I fix this?

I had tried this method, but it doesn't fit my situation.



Best answer:

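Series.loc looks up index (row) labels, not keys inside the dicts stored in each row, which is why country_df["place"].loc["countryCode"] raises KeyError. You can access the key with the .str accessor, which indexes into each element: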

country_df['place'].str['countryCode']

Output:

0    US
Name: place, dtype: object
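The same approach works end to end when the file is loaded with pd.read_json(..., lines=True), which parses one JSON object per line and keeps nested objects as dicts. A sketch, assuming a hypothetical two-line JSONL sample in which the second tweet has no place:

```python
import io

import pandas as pd

# Hypothetical two-line JSONL sample; the second tweet has no place (null)
jsonl = io.StringIO(
    '{"content": "Not yall gassing up a gay boy with no rhythm", '
    '"place": {"fullName": "Manhattan, NY", "countryCode": "US"}}\n'
    '{"content": "another tweet", "place": null}\n'
)

# lines=True parses one JSON object per line; nested objects stay as dicts
test_df = pd.read_json(jsonl, lines=True)

# Keep only rows that have a place, as in the question
country_df = test_df.loc[test_df["place"].notnull(), ["content", "place"]]

# .str[key] is applied element-wise, so each dict is indexed by "countryCode"
codes = country_df["place"].str["countryCode"]
print(codes.tolist())  # ['US']
```

A row whose dict lacks "countryCode" yields NaN with .str[...] rather than raising, which is convenient for messy scraped data.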

Another answer:

Since "place" is a nested dict, you can access its keys just like the top-level dict:

country = {"content": "Not yall gassing up a gay boy with no rhythm", "place": {"_type": "snscrape.modules.twitter.Place", "fullName": "Manhattan, NY", "name": "Manhattan", "type": "city", "country": "United States", "countryCode": "US"}}
country["place"]["countryCode"]

Output:

'US'
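The same nested lookup can be applied to every row of the DataFrame column with Series.map. A sketch, assuming a hypothetical test_df shaped like the question's data, where the second row has no place:

```python
import pandas as pd

# Hypothetical frame shaped like the question's data; second row has no place
test_df = pd.DataFrame({
    "content": ["Not yall gassing up a gay boy with no rhythm", "another tweet"],
    "place": [{"fullName": "Manhattan, NY", "countryCode": "US"}, None],
})

country_df = test_df.loc[test_df["place"].notnull(), ["content", "place"]]

# Every remaining "place" value is a plain dict, so index it like one
codes = country_df["place"].map(lambda place: place["countryCode"])
print(codes.tolist())  # ['US']
```

Unlike .str[...], a dict missing the key raises KeyError here; using place.get("countryCode") in the lambda would return None instead.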

However, for your purpose it might be better to use pandas' json_normalize():

import pandas as pd

country_df = pd.json_normalize(country)

print(country_df)

Output:

content place._type place.fullName place.name place.type place.country place.countryCode
Not yall gassing up a gay boy with no rhythm snscrape.modules.twitter.Place Manhattan, NY Manhattan city United States US
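json_normalize also accepts a list of records (what a JSONL file becomes after parsing each line), and it can flatten just the place column of an existing DataFrame. A sketch, assuming hypothetical records; the second record is invented for illustration:

```python
import pandas as pd

# Hypothetical records as parsed from JSONL (the second one is made up)
records = [
    {"content": "Not yall gassing up a gay boy with no rhythm",
     "place": {"fullName": "Manhattan, NY", "countryCode": "US"}},
    {"content": "another tweet",
     "place": {"fullName": "Paris, France", "countryCode": "FR"}},
]

# Nested keys become dotted columns such as "place.countryCode"
flat = pd.json_normalize(records)
print(flat[["content", "place.countryCode"]])

# With an existing DataFrame, normalize only the "place" column
test_df = pd.DataFrame(records)
place_cols = pd.json_normalize(test_df["place"].tolist())
print(place_cols["countryCode"].tolist())  # ['US', 'FR']
```

json_normalize returns a fresh 0..n-1 index, so if the frame was filtered with notnull() first, set place_cols.index = country_df.index before joining the result back.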