The charts below show my basic challenge: subtract NUMBER OF STOCKS WITH DATA END from NUMBER OF STOCKS WITH DATA START. The challenge I am having is that the date range for each series does not match so I need to merge both sets to a common date range, perform the subtraction, and save results to a new comma seperated value file.
Input data in file named 'meta.csv' contains 3187 lines. Fields per line are data for ticker, start, & end. Head and tail as shown here:
0000 ticker,start,end 0001 A,1999-11-18,2016-12-27 0002 AA,2016-11-01,2016-12-27 0003 AAL,2005-09-27,2016-12-27 0004 AAMC,2012-12-13,2016-12-27 0005 AAN,1984-09-07,2016-12-27 ... 3183 ZNGA,2011-12-16,2016-12-27 3184 ZOES,2014-04-11,2016-12-27 3185 ZQK,1990-03-26,2015-09-09 3186 ZTS,2013-02-01,2016-12-27 3187 ZUMZ,2005-05-06,2016-12-27
Python code and console output:
import pandas as pd
df = pd.read_csv('meta.csv')
s = df.groupby('start').size().cumsum()
e = df.groupby('end').size().cumsum()
#s.plot(title='NUMBER OF STOCKS WITH DATA START',
# grid=True,style='k.')
#e.plot(title='NUMBER OF STOCKS WITH DATA END',
# grid=True,style='k.')
print(s.head(5))
print(s.tail(5))
print(e.tail(5))
OUT:
start
1962-01-02 11
1962-11-19 12
1970-01-02 30
1971-08-06 31
1972-06-01 54
dtype: int64
start
2016-07-05 3182
2016-10-04 3183
2016-11-01 3184
2016-12-05 3185
2016-12-08 3186
end
2016-12-08 544
2016-12-15 545
2016-12-16 546
2016-12-21 547
2016-12-27 3186
dtype: int64
Chart output when comments removed for code shown above:
I want to create one population file with the date and number of stocks with active data which should have a head and tail shown as follows:
date,num_stocks 1962-01-02,11 1962-11-19,12 1970-01-02,30 1971-08-06,31 1972-06-01,54 ... 2016-12-08,2642 2016-12-15,2641 2016-12-16,2640 2016-12-21,2639 2016-12-27,2639
The ultimate goal is to be able to plot the number of stocks with data over any specified date range by reading the population file.
Thanks to the crucial tip provided by piRSquared I solved the challenge using this code:
'pop.csv' file head/tail:
Chart: