How can I merge keys with the same value into a single object?

I use the ParseHub API to scrape the data below in JSON format. When I look up a particular country, I can read the first set of fields ('name', 'pop', 'area', 'growth', 'worldPer', and 'rank'), but not 'image'.

When I print the entire JSON response, all the data is there, but when I try to read a country's 'image' value, I get a KeyError.

Is there a way I can merge the two objects by matching on the country name?

main.py

import json

import requests

class Data:
    def __init__(self, api_key, project_token):
        self.api_key = api_key
        self.project_token = project_token
        self.params = {"api_key":api_key}
        self.data = self.get_data()

    def get_data(self):
        r = requests.get('https://www.parsehub.com/api/v2/projects/xxxx/last_ready_run/data', params=self.params)
        data = json.loads(r.text)
        print(r.text)
        return data

    def data_by_name(self,country):
        data = self.data['country']
        for content in data:
            if content['name'].lower() == country.lower():
                print(content)
                name = content['name']
                pop = content['pop']
                popRank = content['rank']
                growth = content['growth']
                per = content['worldPer']
                area = content['area']
                image = content['image'] #<----- KeyError: 'image'
        return(name,pop,popRank,growth,per,area)

data = Data(DATA_API_KEY,DATA_PROJECT_TOKEN)
data.data_by_name('china')

country.json

{
 "country": [
  {
   "name": "China",
   "pop": "1,438,862,614",
   "area": "9,706,961 km²",
   "growth": "0.39%",
   "worldPer": "18.47%",
   "rank": "1"
  },
  {
   "name": "China",
   "image": "https://s3.amazonaws.com/images.wpr.com/flag-pages/png250/cn.png"
  }
 ]
}

There are 3 best solutions below

Chris On

pandas can handle this for you: group the rows by name and take the first non-null value in each column.

import pandas as pd

d = {
 "country": [
  {
   "name": "China",
   "pop": "1,438,862,614",
   "area": "9,706,961 km²",
   "growth": "0.39%",
   "worldPer": "18.47%",
   "rank": "1"
  },
  {
   "name": "China",
   "image": "https://s3.amazonaws.com/images.wpr.com/flag-pages/png250/cn.png"
  }
 ]
}

# Rows with the same name collapse into one; first() keeps the first non-null value per column
df = pd.DataFrame(d['country']).groupby('name').first()
print(df)

Output

                 pop           area growth worldPer rank                                              image
name
China  1,438,862,614  9,706,961 km²  0.39%   18.47%    1  https://s3.amazonaws.com/images.wpr.com/flag-p...
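
If pandas is not already a dependency, roughly the same "first value seen per field" merge can be sketched in plain Python. This is only an approximation of what groupby('name').first() does (it does not skip null values), and the helper name merge_first_seen is made up for illustration:

```python
def merge_first_seen(records, key="name"):
    """Merge records sharing `key`, keeping the first value seen for each field."""
    merged = {}
    for record in records:
        target = merged.setdefault(record[key], {})
        for field, value in record.items():
            # setdefault leaves an existing value untouched, so the first one wins
            target.setdefault(field, value)
    return merged


records = [
    {"name": "China", "pop": "1,438,862,614", "rank": "1"},
    {"name": "China", "image": "https://s3.amazonaws.com/images.wpr.com/flag-pages/png250/cn.png"},
]
merged = merge_first_seen(records)
print(merged["China"]["image"])
```

Unlike the DataFrame above, the merged dictionary still contains the 'name' field itself.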
Edison Feneyab On

It would be better to store each country's data in a dictionary keyed by name, so you don't have to iterate over all the data on every lookup. You could do it like this:

def __init__(self, api_key, project_token):
    ...
    self.countries_data = self.get_countries_data()

...

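def get_countries_data(self):
    # Merge every entry that shares a name into one dictionary per country
    countries_data = {}
    for content in self.data["country"]:
        key = content["name"].lower()  # lowercase key so lookups are case-insensitive
        countries_data[key] = {**countries_data.get(key, {}), **content}
    return countries_data

def data_by_name(self, country):
    country_data = self.countries_data[country.lower()]
    return country_data["name"], country_data["pop"]  # ... and so on for the other fields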
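
To try the idea without calling the API, here is a minimal standalone sketch. It assumes the records look like country.json in the question; the function name build_country_index and the trimmed sample data are only for illustration:

```python
def build_country_index(records):
    """Return {lowercased name: merged record} for a list of country dicts."""
    index = {}
    for record in records:
        key = record["name"].lower()
        # Later entries add to (and override) fields from earlier ones
        index[key] = {**index.get(key, {}), **record}
    return index


sample = {  # same shape as country.json in the question, with fewer fields
    "country": [
        {"name": "China", "pop": "1,438,862,614", "rank": "1"},
        {"name": "China", "image": "https://s3.amazonaws.com/images.wpr.com/flag-pages/png250/cn.png"},
    ]
}
index = build_country_index(sample["country"])
china = index["china"]
print(china["pop"], china["image"])
```

The index is built once, so each later lookup is a single dictionary access instead of a loop over every entry.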
Sam On

There are (at least) two ways to go about this. You could merge all the entries that share a name (e.g. China) up front. Or you could search through all the countries on each lookup and collect the data from every entry that matches your country. Here's an example of the second approach, where I modify your data_by_name method. The advantage is that it works no matter how many times the country appears:

def data_by_name(self,country):
    data = self.data['country']
    my_dict = {}
    for content in data:
        if content['name'].lower() == country.lower():
            print(content)
            my_dict.update(content) # This updates your dict with the key/value pairs
    return my_dict     # my_dict will have all the different values, including image

If you want only specific fields, you could return those:

    return (
        my_dict['name'],
        my_dict['pop'],
        my_dict['rank'],
        my_dict['growth'],
        my_dict['worldPer'],
        my_dict['area'],
        my_dict['image']
    )
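
Indexing with my_dict['image'] will still raise a KeyError for any country whose image entry was never scraped. One possible way around that, sketched here with a hypothetical helper called pick_fields, is to read the fields with dict.get so missing ones fall back to a default:

```python
FIELDS = ("name", "pop", "rank", "growth", "worldPer", "area", "image")


def pick_fields(merged, fields=FIELDS, default=None):
    """Return the requested fields as a tuple, using `default` for missing ones."""
    return tuple(merged.get(field, default) for field in fields)


# A merged record for a country that has no image entry
partial = {"name": "China", "pop": "1,438,862,614", "rank": "1"}
result = pick_fields(partial)
print(result)
```

Whether a silent None is better than an exception depends on how the caller uses the result, so treat this as an option rather than a fix.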

Hope that helps, happy coding!