What is the most efficient way to implement 2 data structures for iteration of different values

59 Views Asked by At

I am trying to implement a data structure to parse data and do data analysis

My data format is as follows

DATE-TIME    DATA  DATA  DATA
31/3/1999 20:40    6  130  19.95
31/3/1999 23:50    3  440  17.95
31/3/1999 23:20    4  300  18.81
31/3/1999 23:30    5  313  18.48
20/7/1999 23:40    20  1000  18.19
31/7/1999 23:40    5  110  18.23
25/5/2023 23:40    0  130  26
22/2/2023 23:20    2  110  33.3
15/2/2023 23:30    3  110  22.2
20/2/2023 23:40    12  110  30.1
31/3/2023 23:40    34  100  0

The requirements is that will have to access the data either by year OR by month for each year.

Usage of MAPS and BST are a MUST, vector is not necessary.

I have already implemeted an algorithm with MAP and BST so currently its like

MAP[1999] = BST<THE DATA RELATED TO 1999>
MAP[2023] = BST<THE DATA RELATED TO 2023>

The issue I am facing is if i want to get example JULY i have to go through the whole tree to find where data.Month == june

I have tried going with something like that MAP[2023] = BST< VECTOR <JAN> <FEB> <MAR> <APR> <MAY> <JUN> <JUL> >

My rational is that I will store a vector of each month of record inside the BST. But I realised that the BST wouldnt know what record it is and how the store the vector properly. Another issue is that I wouldnt have a way to properly access the MONTH vector from the BST.

1

There are 1 best solutions below

2
Mestkon On

Using a map or multiple data structures in this case is not necessary at all. The structure of the data is very predictable so making a vector<monthly_data> is good enough.

Because you know which year and month the first entry in the vector is, calculating the range you are interested in is trivial.

To fetch the data related to a specific year you just calculate the offset to January of that year and retrieve a pointer to that data. The following 11 entries will be the rest of the data that year.

To fetch the data related to a specific month every year you just calculate the offset to the first entry of that month and retrieve a pointer to that data. Every 12th entry after that entry will be data related to that month in the following years.

struct monthly_data { /* stuff */ };

struct all_data
{
    std::vector<monthly_data> data;
    int start_year;
    int start_month;
};

void do_something_for_every_month_in_a_year(all_data& d, int year)
{
    int year_offset = year - d.start_year;
    int month_offset = -d.start_month;
    int jan_of_year_offset = 12*year_offset + month_offset;

    for (int i = 0; i < 12; ++i) {
        auto& entry = d.data[i + jan_of_year_offset];
        // do something with entry
    }
}

void do_something_with_month_for_every_year(all_data& d, int month)
{
    int month_offset = -d.start_month + month;
    while (true) {
        // break when no more entries
        auto& entry = d.data[month_offset];
        // do something with entry

        month_offset += 12;
    }
}

Note that the above code is just a conceptual example and has not been tested. I have also omitted checks for the edge cases.