Problem with loop to calculate IRR function in python

619 Views Asked by At

I have a problem with calculating a function in python. I want to calculate the IRR for a number of investment, all of which are described in their own dataframe. I have a dataframe for each investment up until a certain date, so I have a multiple dataframe describing the flows of payments the investment has made up until different dates for each investment, with the last row of each dataframe containing the information of the stock of capital that each investment has until that point. I do this in order to have like a time series of the IRR for each investment. Each dataframe of which I want to calculate the IRR is in a list.

To calculate the IRR for each dataframe I made these functions:

def npv(irr, cfs, yrs):
    return np.sum(cfs / ((1. +  irr) ** yrs))
def irr(cfs, yrs, x0)
    return np.asscalar(fsolve(npv, x0=x0, args=(cfs, yrs)))

So in order to calculate the IRR for each dataframe in my list I did:

 for i, new_df in enumerate(dfs):
   cash_flow = new_df.FLOWS.values
   years = new_df.timediff.values
   output.loc[i, ['DATE']] = new_df['DATE'].iloc[-1]
   output.loc[i, ['Investment']] = new_df['Investment'].iloc[-1]
   output.loc[i, ['irr']] = irr(cash_flow, years, x0=0.)

Output is the dataframe I want to create that the cointains the information I want, i.e the IRR of each invesment up until a certain date. The problem is, it calculates the IRR correctly for some dataframes, but not for others. For example it calculates the IRR correctly for this dataframe:

       DATE     INVESTMENT       FLOWS        timediff
0   2014-02-24      1        -36278400.0         0.0
1   2014-03-25      1        -11490744.0    0.07945205479452055
2   2015-01-22      1        -13244300.0    0.9095890410958904
3   2015-09-24      1        -10811412.0    1.5808219178082192
4   2015-11-12      1         -6208238.0    1.715068493150685
5   2016-01-22      1         -6210161.0    1.9095890410958904
6   2016-03-31      1         -4535569.0    2.0986301369863014
7   2016-05-25      1          8420470.0    2.249315068493151
8   2016-06-30      1         12357138.0    2.347945205479452
9   2016-07-14      1          3498535.0    2.3863013698630136
10  2016-12-26      1          4085285.0    2.8383561643835615
11  2017-06-07      1          3056835.0    3.2849315068493152
12  2017-09-11      1         11254424.0    3.547945205479452
13  2017-11-16      1          9274834.0    3.728767123287671
14  2018-02-22      1          1622857.0    3.9972602739726026
15  2018-05-23      1          2642985.0    4.243835616438356
18  2018-08-23      1          9265099.0    4.495890410958904
16  2018-11-29      1          1011915.0    4.764383561643836
19  2018-12-28      1          1760734.0    4.843835616438356
17  2019-01-14      1          1940112.0    4.890410958904109
20  2019-06-30      1         116957227.3   5.347945205479452

With an IRR of 0.215. But this dataframe, for the exact same investment it does not. It returns a IRR of 0.0001, but the real IRR should be around 0.216.

       DATE     INVESTMENT       FLOWS         timediff
0   2014-02-24      1        -36278400.0         0.0
1   2014-03-25      1        -11490744.0    0.07945205479452055
2   2015-01-22      1        -13244300.0    0.9095890410958904
3   2015-09-24      1        -10811412.0    1.5808219178082192
4   2015-11-12      1         -6208238.0    1.715068493150685
5   2016-01-22      1         -6210161.0    1.9095890410958904
6   2016-03-31      1         -4535569.0    2.0986301369863014
7   2016-05-25      1          8420470.0    2.249315068493151
8   2016-06-30      1         12357138.0    2.347945205479452
9   2016-07-14      1          3498535.0    2.3863013698630136
10  2016-12-26      1          4085285.0    2.8383561643835615
11  2017-06-07      1          3056835.0    3.2849315068493152
12  2017-09-11      1         11254424.0    3.547945205479452
13  2017-11-16      1          9274834.0    3.728767123287671
14  2018-02-22      1          1622857.0    3.9972602739726026
15  2018-05-23      1          2642985.0    4.243835616438356
18  2018-08-23      1          9265099.0    4.495890410958904
16  2018-11-29      1          1011915.0    4.764383561643836
19  2018-12-28      1          1760734.0    4.843835616438356
17  2019-01-14      1          1940112.0    4.890410958904109
20  2019-09-30      1        123753575.7    5.6

These two dataframes have exactly the same flows excepto the last row, of which it cointains the stock of capital up until that date for that investment. So the only difference between these two dataframes is the last row. This means this investment hasn't had any inflows or outflows during that time. I don't understand why the IRR varies so much. Or why some IRR are calculated incorrectly.

Most are calculated correctly but a few are not.

Thanks for helping me.

1

There are 1 best solutions below

0
On BEST ANSWER

As I have thought, it is a problem with the optimization method. When I have tried your irr function with the second df, I have even received a warning:

RuntimeWarning: The iteration is not making good progress, as measured by the 
  improvement from the last ten iterations.
  warnings.warn(msg, RuntimeWarning)

But trying out scipy.optimize.root with other methods seem to work for me. Changed the func to:

import scipy.optimize as optimize

def irr(cfs, yrs, x0):
    r = optimize.root(npv, args=(cfs, yrs), x0=x0, method='broyden1')
    return float(r.x)

I just checked lm and broyden1, and those both converged with your second example to around 0.216. There are multiple methods, and I have no clue which would be the best choice from those, but most seems to be better then the hybr method used in fsolve.