I have 1 minute data for an equity as follows;
bidopen bidhigh bidlow bidclose bidvolume
currencypair
2007-03-30 16:01:00 1.9687 1.96900 1.9686 1.9686 877.40
2007-03-30 16:02:00 1.9686 1.96905 1.9686 1.9686 897.20
2007-03-30 16:03:00 1.9686 1.96900 1.9686 1.9690 1076.11
2007-03-30 16:04:00 1.9689 1.96910 1.9688 1.9690 849.70
2007-03-30 16:05:00 1.9690 1.96900 1.9688 1.9689 1402.80
I want to add an extra column. This column will:
- take 15 records from this point onwards (including the current time)
- from those 15 records get the maximum bidhigh and the minimum bidlow
- calculate the difference of the high-low and use that value in the new column
I tried the following. Firstly I read the data in.
usecols = ['datetime','bidopen','bidhigh','bidlow','bidclose','bidvolume']
df=pd.read_csv(path,parse_dates=('datetime'),index_col=0, usecols = usecols )
define a function:
def lookaheadmaxmin(df):
df2=df[:15]
high=df2['bidhigh'].max(axis=1)
low=df2['bidlow'].min(axis=1)
return high-low
then
df['newcolumn'] = map( lookaheadmaxmin, df[:15])
This errors. I'm pretty sure the 'df[:15]' in the map is the problem as I don't know how to pass a slice of the current & future records to the function
Essentially what i'm trying to do is determine how much price has moved within a 15 minute moving window as follows:
So between; 16:00 - 16:15 - how much did price move? put this in the column on the 16:00 record
16:01 - 16:16 - how much did price move? put this in the column on the 16:01 record
16:02 - 16:17 - how much did price move? put this in the column on the 16:02 record
16:03 - 16:18 - how much did price move? put this in the column on the 16:03 record
16:04 - 16:19 - how much did price move? put this in the column on the 16:04 record
16:05 - 16:20 - how much did price move? put this in the column on the 16:05 record
Additional info:
I'm using Enthought Canopy Version 1.1.0 (64 bit) for Mac. Pandas version: Version: 0.12.0-1 (incorporating numpy 1.7.1)
Source data sample:
currencypair,datetime,bidopen,bidhigh,bidlow,bidclose,askopen,askhigh,asklow,askclose,bidvolume,askvolume
GBPUSD,2007-03-30 16:01:00,1.96870,1.96900,1.96860,1.96860,1.96850,1.96880,1.96845,1.96850,877.40,1386.70
GBPUSD,2007-03-30 16:02:00,1.96860,1.96905,1.96860,1.96860,1.96850,1.96890,1.96840,1.96840,897.20,1272.30
GBPUSD,2007-03-30 16:03:00,1.96860,1.96900,1.96860,1.96900,1.96850,1.96890,1.96840,1.96880,1076.11,1333.30
GBPUSD,2007-03-30 16:04:00,1.96890,1.96910,1.96880,1.96900,1.96880,1.96890,1.96865,1.96880,849.70,765.10
GBPUSD,2007-03-30 16:05:00,1.96900,1.96900,1.96880,1.96890,1.96875,1.96890,1.96860,1.96870,1402.80,1240.90
GBPUSD,2007-03-30 16:06:00,1.96890,1.96890,1.96840,1.96860,1.96870,1.96870,1.96820,1.96850,769.50,1727.30
GBPUSD,2007-03-30 16:07:00,1.96860,1.96880,1.96820,1.96830,1.96850,1.96870,1.96810,1.96820,842.00,1865.60
GBPUSD,2007-03-30 16:08:00,1.96830,1.96930,1.96830,1.96910,1.96820,1.96920,1.96820,1.96890,1096.60,1197.70
GBPUSD,2007-03-30 16:09:00,1.96910,1.96920,1.96880,1.96890,1.96895,1.96910,1.96865,1.96880,368.60,432.10
As a sidenote - there's something odd in the display of records ( I'm using ipython notebook ). Even though I ignore the 'currencypair'
column it oddly shows up as the column heading. (I'm including this as I don't know if it has a bearing on other things not working.)
Importing the data (using the above csv_read) (Note no 'currencypair'
column named)
usecols = ['datetime','bidopen','bidhigh','bidlow','bidclose','bidvolume']
df=pd.read_csv(path,parse_dates=('datetime'),index_col=0, usecols = usecols )
then doing
df[:5]
shows: (note it shows 'currencypair'
as the column heading but in df.info()
below it just shows as 'index'
)
bidopen bidhigh bidlow bidclose bidvolume
currencypair
2007-03-30 16:01:00 1.9687 1.96900 1.9686 1.9686 877.40
2007-03-30 16:02:00 1.9686 1.96905 1.9686 1.9686 897.20
2007-03-30 16:03:00 1.9686 1.96900 1.9686 1.9690 1076.11
2007-03-30 16:04:00 1.9689 1.96910 1.9688 1.9690 849.70
2007-03-30 16:05:00 1.9690 1.96900 1.9688 1.9689 1402.80
df.info()
shows:
<class 'pandas.core.frame.DataFrame'>
Index: 2362159 entries, 2007-03-30 16:01:00 to 2013-09-02 18:59:00
Data columns (total 5 columns):
bidopen 2362159 non-null values
bidhigh 2362159 non-null values
bidlow 2362159 non-null values
bidclose 2362159 non-null values
bidvolume 2362159 non-null values
dtypes: float64(5)
Importing the data an alternative way
Importing and then removing the currencypair column; (note addition of 'currencypair'
then dropping the column after)
usecols = ['currencypair','datetime','bidopen','bidhigh','bidlow','bidclose','bidvolume']
df=pd.read_csv(path,parse_dates=('datetime'),index_col=1, usecols = usecols )
df=df.drop('currencypair',1)
shows:
bidopen bidhigh bidlow bidclose bidvolume
datetime
2007-03-30 16:01:00 1.9687 1.96900 1.9686 1.9686 877.40
2007-03-30 16:02:00 1.9686 1.96905 1.9686 1.9686 897.20
2007-03-30 16:03:00 1.9686 1.96900 1.9686 1.9690 1076.11
2007-03-30 16:04:00 1.9689 1.96910 1.9688 1.9690 849.70
2007-03-30 16:05:00 1.9690 1.96900 1.9688 1.9689 1402.80
and df.info()
shows: (note index now shows as 'DatetimeIndex'
)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2362159 entries, 2007-03-30 16:01:00 to 2013-09-02 18:59:00
Data columns (total 5 columns):
bidopen 2362159 non-null values
bidhigh 2362159 non-null values
bidlow 2362159 non-null values
bidclose 2362159 non-null values
bidvolume 2362159 non-null values
dtypes: float64(5)
Pretty straightforward when you only want a couple of columns which you are specifying e.g. max of a, and min of b for example
rolling functions compute from that point on, so we timeshift (which just changes the index) so that the values align (with the start point, rather than the end point)
Hi low diff is just
Seems you have gaps in your series, use
asfreq
: