How to find 1st of months between start and end dates and add them to a dataframe?

69 Views Asked by At

I have a dataset like this

import pandas as pd

df = pd.DataFrame(
    {
        "id": {0: 1, 1: 1, 2: 1, 3: 2, 4: 2},
        "price": {0: 20, 1: 41, 2: 61, 3: 68, 4: 10},
        "date_month_start": {
            0: "2021-06-12",
            1: "2021-11-13",
            2: "2022-02-27",
            3: "2021-04-14",
            4: "2021-07-11",
        },
        "date_month_end": {
            0: "2021-09-14",
            1: "2022-01-13",
            2: "2022-04-12",
            3: "2021-06-18",
            4: "2021-10-16",
        },
    }
)
print(df)
   id  price date_month_start date_month_end
0   1     20       2021-06-12     2021-09-14
1   1     41       2021-11-13     2022-01-13
2   1     61       2022-02-27     2022-04-12
3   2     68       2021-04-14     2021-06-18
4   2     10       2021-07-11     2021-10-16

But I would like to create a column for first of month that falls between start and end date and repeat rows (except first of month date) if there are more than one first of months falls between start and end date.

For instance if the start date is March 12, 2021 and end date is June 04, 2021, than I would like to have a new column April 1st 2021, May 1st 2021, June 1st 2021. As we have three values for the new column so, I would like to repeat rows by copying other column values except the new one.

The output data should look like:

    id  price date_month_start date_month_end date_month
0    1     20       2021-06-12     2021-09-14 2021-07-01
1    1     20       2021-06-12     2021-09-14 2022-08-01
2    1     20       2021-06-12     2021-09-14 2022-09-01
3    1     41       2021-11-13     2022-01-13 2021-12-01
4    1     41       2021-11-13     2022-01-13 2022-01-01
5    1     61       2022-02-27     2022-04-12 2022-03-01
6    1     61       2022-02-27     2022-04-12 2022-04-01
7    2     68       2021-04-14     2021-06-18 2021-05-01
8    2     68       2021-04-14     2021-06-18 2021-06-01
9    2     10       2021-07-11     2021-10-16 2021-08-01
10   2     10       2021-07-11     2021-10-16 2021-09-01
11   2     10       2021-07-11     2021-10-16 2021-10-01

I am new in python, anyone has any direction how to do it? I can get first day of month from date column, but it is a whole different thing.

1

There are 1 best solutions below

0
On

Here is one way to do it:

from pandas.tseries.offsets import MonthEnd

# Convert into Pandas datetimes
df['date_month_start'] = pd.to_datetime(df['date_month_start'])
df['date_month_end'] = pd.to_datetime(df['date_month_end'])

# For each row of 'df', find month starts between start and end date
# Duplicate the row and add new column
# Store new intermediate dataframe in list (dfs)
dfs = []
for i in range(df.shape[0]):
    temp_df = df.loc[i, :]
    new_month = pd.Series(
        [
            temp_df["date_month_start"] + MonthEnd(i) + pd.Timedelta(1, "d")
            for i in range(1, 13)
            if temp_df["date_month_start"] + MonthEnd(i) + pd.Timedelta(1, "d")
            < temp_df["date_month_end"]
        ]
    )
    temp_df = pd.DataFrame([temp_df.to_list() for _ in range(len(new_month))])
    temp_df[4] = new_month
    dfs.append(temp_df)

# Concat intermediate dataframes into one
new_df = dfs[0]
for df in dfs[1:]:
    new_df = pd.concat([new_df, df])

# Cleanup
new_df.columns = ["id", "price", "date_month_start", "date_month_end", "date_month"]
new_df = new_df.reset_index(drop=True)
print(new_df)
# Output
    id  price date_month_start date_month_end date_month
0    1     20       2021-06-12     2021-09-14 2021-07-01
1    1     20       2021-06-12     2021-09-14 2021-08-01
2    1     20       2021-06-12     2021-09-14 2021-09-01
3    1     41       2021-11-13     2022-01-13 2021-12-01
4    1     41       2021-11-13     2022-01-13 2022-01-01
5    1     61       2022-02-27     2022-04-12 2022-03-01
6    1     61       2022-02-27     2022-04-12 2022-04-01
7    2     68       2021-04-14     2021-06-18 2021-05-01
8    2     68       2021-04-14     2021-06-18 2021-06-01
9    2     10       2021-07-11     2021-10-16 2021-08-01
10   2     10       2021-07-11     2021-10-16 2021-09-01
11   2     10       2021-07-11     2021-10-16 2021-10-01