filtering data from a dataframe column

32 Views Asked by At

I've been fooling around with pandas and dataframes for a while now, and decided i'd start a project using it. For the project I need to scrape bettings odds from a sportsbookie.

I've succesfully retrieved an api and made a dataframe using the data. But the column 'betOffers' contains over a thousand characters, of which I only want to print out the odds of the game. My code currently looks like this:

import pandas
import requests

api = 'https://eu-offering-api.kambicdn.com/offering/v2018/betcitynl/listView/table_tennis/czech_republic/czech_liga_pro/all/matches.json?lang=nl_NL&market=NL&client_id=2&channel_id=1&ncid=1711057956050&category=20141&useCombined=true&useCombinedLive=true'
response = requests.get(api)
responseData = response.json()

df = pandas.json_normalize(responseData, 'events')
odds = df['betOffers'].str.split('odds')

file_format = 'csv'
file_name = 'betcity' + file_format
odds.to_csv(file_name)

By using the split method I hoped i would get the odds, but the output is now just '', when using the strip method (trying to get rid of everything but the odds) it also outputs just ''

1

There are 1 best solutions below

2
e-motta On BEST ANSWER

pandas.Series.str.split does not do what you think it does.

One approach you can use is to analyze the structure of the JSON and then use the parameters available for pandas.json_normalize to extract the data you want. You're reading a JSON with nested data, which contains strings, numbers, objects and arrays, so you need to take that into account.

It looks something like this (some closing brackets and most of the data omitted for brevity):

{
    "events": [
        {
            "event": {
                <...>
                "path": [<...>],
                <...>
            },
            "betOffers": [
                {
                    <...>
                    "outcomes": [
                        {
                            "id": 3400584057,
                            "label": "Biolek, Martin",
                            "englishLabel": "Biolek, Martin",
                            "odds": 1670,
                            "participant": "Biolek, Martin",
                            "type": "OT_ONE",
                            "betOfferId": 2426142977,
                            "changedDate": "2024-03-22T15:57:53Z",
                            "participantId": 1030001763,
                            "oddsFractional": "4/6",
                            "oddsAmerican": "-150",
                            "status": "OPEN",
                            "cashOutStatus": "ENABLED",
                        },
                        <...>

So if you want the values for odds, you can add to your json_normalize call an argument record_path=["events", "betOffers", "outcomes"]. Pandas will then read the specified records as rows into your dataframe, with the attribute names as columns.

df = pandas.json_normalize(
    responseData, record_path=["events", "betOffers", "outcomes"]
)

This will then result in a dataframe like this:

             id             label      englishLabel  odds       participant  ... participantId  oddsFractional oddsAmerican  status cashOutStatus
0    3400584057    Biolek, Martin    Biolek, Martin  1760    Biolek, Martin  ...    1030001763             3/4         -132    OPEN       ENABLED
1    3400584058  Polasek, Radovan  Polasek, Radovan  1940  Polasek, Radovan  ...    1030002269           23/25         -107    OPEN       ENABLED
2    3400604669     Regner, Tomas     Regner, Tomas  7500     Regner, Tomas  ...    1030007971            13/2          650    OPEN       ENABLED
3    3400604670   Varecka, Michal   Varecka, Michal  1060   Varecka, Michal  ...    1030076687            1/18        -1667    OPEN       ENABLED
4    3400591157      Barta, Tomas      Barta, Tomas  1770      Barta, Tomas  ...    1030007122             3/4         -130    OPEN       ENABLED
..          ...               ...               ...   ...               ...  ...           ...             ...          ...     ...           ...
107  3400718588     Hofman, Denis     Hofman, Denis  2000     Hofman, Denis  ...    1028244351           Evens          100    OPEN       ENABLED
108  3400729274       Picek, Petr       Picek, Petr  1850       Picek, Petr  ...    1028308477           17/20         -118    OPEN       ENABLED
109  3400729275      Steffan, Jan      Steffan, Jan  1850      Steffan, Jan  ...    1028346029           17/20         -118    OPEN       ENABLED
110  3400724656  Mecl, Jan Junior  Mecl, Jan Junior  1500  Mecl, Jan Junior  ...    1006381083             1/2         -200    OPEN       ENABLED
111  3400724657      Pacha, Robin      Pacha, Robin  2400      Pacha, Robin  ...    1028256009             7/5          140    OPEN       ENABLED