Convert Nested DateTime json to Pandas DataFrame

502 Views Asked by At

The problem here is different than those defined in the questions here, here, and here. Specifically, the transformation and form of the output I want is different than any specified in those questions, and I also want a DateTime index. These differences cause the answers on those pages fail.

I have data formatted as a json like this:

{
    "Meta Data": {
        "1. Information": "Daily Prices (open, high, low, close) and Volumes",
        "2. Symbol": "ABC",
    },
    "Time Series (Daily)": {
        "2001-06-31": {
            "1. open": "113.2000",
            "4. close": "113.8000",
        },
        "2001-07-01": {
            "1. open": "114.2000",
            "4. close": "114.2000",
        }
    }
}

I want the output to look be a pandas dataframe like this:

"Time Series (Daily)" | "1. open" | "4. close"
"2001-06-31"          | 113.2000  | 113.8000
"2001-07-01"          | 114.2000  | 114.2000

I wrote a function that works, but the for loop leaves performance wanting and I find it hard to read.

def convert_json_to_dataframe(all_json_data):
    json_data = all_json_data["Time Series (Daily)"]
    dates = []
    open = []
    close = []
    for key in json_data.keys():
        dates.append(key)
        open.append(json_data[key]["1. open"])
        close.append(json_data[key]["4. close"])
    df = pd.DataFrame(
        list(zip(open, close)),
        columns=["1. open", "4. close"],
        index=dates,
    )
    df = df.apply(pd.to_numeric, errors="ignore")
    return df

There's got to be a simpler, easier to read, higher-performing way to do this, maybe with json_normalize in pandas, but I can't figure it out.


UPDATE AFTER ANSWERS & RESOLUTION. All I had to do was:

df = pd.DataFrame(json_data["Time Series (Daily)"]).T

Pandas discovered the index and column names automatically, so I didn't need the reset_index portion of the answers.

The orient approach also worked:

df = pd.DataFrame.from_dict(json_data["Time Series (Daily)", orient="index")

To get all the numbers as floats instead of strings, I did need the apply line:

df = df.apply(pd.to_numeric, errors="coerce")

Thank you everyone.

3

There are 3 best solutions below

0
On BEST ANSWER

Why don't you just do this?

pd.DataFrame(data["Time Series (Daily)"]).T.reset_index().rename(columns = {"index":"Time Series (Daily)"})

Output -

Time Series (Daily) 1. open 4. close
0 2001-06-31 113.2000 113.8000
1 2001-07-01 114.2000 114.2000
0
On

It seems the relevant data is only under "Time Series (Daily)" key, so you could get that and construct a DataFrame (use the orient parameter to get it in the correct shape):

out = pd.DataFrame.from_dict(my_data['Time Series (Daily)'], orient='index')

Output:

            1. open  4. close
2001-06-31  113.2000  113.8000
2001-07-01  114.2000  114.2000
0
On

Taking json from an URL is the easiest way:

import requests
url='url from json file'
r=requests.get(url)
data=r.json()
df=pd.DataFrame(data['Time Series (Daily)']).T
df=df.reset_index('Time Series (Daily)')

Saying “thanks” is appreciated, but it doesn’t answer the question. Instead, vote up the answers that helped you the most! If these answers were helpful to you, please consider saying thank you in a more constructive way – by contributing your own answers to questions your peers have asked here.