I took two time series with time stamps, 141 data points in total. The actual (sample) correlation between them is about 0.97. I then computed the Hayashi-Yoshida (HY) estimator of the correlation, and it comes out greater than 3. I expected the HY estimator to give a value close to the actual correlation.
Although the HY correlation estimator is not bounded between -1 and 1 like the actual correlation, shouldn't it still give a better estimate? Is my data set too small?
The Hayashi-Yoshida correlation estimator is given in http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2225753
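For reference, this is the form in which the estimator is usually stated. The notation is an assumption made here and may differ from the linked paper: $X$ is observed at times $t_0 < t_1 < \dots$, $Y$ at times $s_0 < s_1 < \dots$, with $\Delta X_i = X_{t_i} - X_{t_{i-1}}$ and $\Delta Y_j = Y_{s_j} - Y_{s_{j-1}}$.

$$
\hat\rho_{HY} = \frac{\sum_{i}\sum_{j} \Delta X_i \,\Delta Y_j \,\mathbf{1}\{(t_{i-1}, t_i] \cap (s_{j-1}, s_j] \neq \emptyset\}}{\sqrt{\sum_{i} (\Delta X_i)^2 \,\sum_{j} (\Delta Y_j)^2}}
$$

The numerator sums products of increments over every pair of overlapping observation intervals, so one increment of $X$ can be paired with several increments of $Y$. That is why the ratio is not guaranteed to stay within $[-1, 1]$.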
The data is:
1,100,62
2,100.5,62
3,100.6,62
4,100.6,62.05
5,100.6,62.1
6,100.6,62.15
7,100.6,62.2
8,100.6,62.25
9,100.6,62.3
10,100.6,62.35
11,100.6,62.4
12,100.6,62.45
13,100.6,62.5
14,100.6,62.55
15,100.6,62.6
16,101.1,62.6
17,101.2,62.6
18,101.2,62.65
19,101.2,62.7
20,101.2,62.75
21,101.2,62.8
22,101.2,62.85
23,101.2,62.9
24,101.2,62.95
25,101.2,63
26,101.2,63.05
27,101.2,63.1
28,101.2,63.15
29,101.2,63.2
30,101.7,63.2
31,101.8,63.2
32,101.8,63.25
33,101.8,63.3
34,101.8,63.35
35,101.8,63.4
36,101.8,63.45
37,101.8,63.5
38,101.8,63.55
39,101.8,63.6
40,101.8,63.65
41,101.8,63.7
42,101.8,63.75
43,101.8,63.8
44,102.3,63.8
45,102.4,63.8
46,102.4,63.85
47,102.4,63.9
48,102.4,63.95
49,102.4,64
50,102.4,64.05
51,102.4,64.1
52,102.4,64.15
53,102.4,64.2
54,102.4,64.25
55,102.4,64.3
56,102.4,64.35
57,102.4,64.4
58,102.9,64.4
59,103,64.4
60,103,64.45
61,103,64.5
62,103,64.55
63,103,64.6
64,103,64.65
65,103,64.7
66,103,64.75
67,103,64.8
68,103,64.85
69,103,64.9
70,103,64.95
71,103,65
72,103.5,65
73,103.6,65
74,103.6,65.05
75,103.6,65.1
76,103.6,65.15
77,103.6,65.2
78,103.6,65.25
79,103.6,65.3
80,103.6,65.35
81,103.6,65.4
82,103.6,65.45
83,103.6,65.5
84,103.6,65.55
85,103.6,65.6
86,104.1,65.6
87,104.2,65.6
88,104.2,65.65
89,104.2,65.7
90,104.2,65.75
91,104.2,65.8
92,104.2,65.85
93,104.2,65.9
94,104.2,65.95
95,104.2,66
96,104.2,66.05
97,104.2,66.1
98,104.2,66.15
99,104.2,66.2
100,104.7,66.2
101,104.8,66.2
102,104.8,66.25
103,104.8,66.3
104,104.8,66.35
105,104.8,66.4
106,104.8,66.45
107,104.8,66.5
108,104.8,66.55
109,104.8,66.6
110,104.8,66.65
111,104.8,66.7
112,104.8,66.75
113,104.8,66.8
114,105.3,66.8
115,105.4,66.8
116,105.4,66.85
117,105.4,66.9
118,105.4,66.95
119,105.4,67
120,105.4,67.05
121,105.4,67.1
122,105.4,67.15
123,105.4,67.2
124,105.4,67.25
125,105.4,67.3
126,105.4,67.35
127,105.4,67.4
128,105.9,67.4
129,106,67.4
130,106,67.45
131,106,67.5
132,106,67.55
133,106,67.6
134,106,67.65
135,106,67.7
136,106,67.75
137,106,67.8
138,106,67.85
139,106,67.9
140,106,67.95
141,106,68
Those repeated values look like an ex-post constant interpolation used to fill in "missing" data. The same goes for the second series: there does not seem to be much real data, just linear interpolation (I cannot make much sense of the triplets, though). If so, that might be the problem. You should instead (at least) calculate the realized variance (not the daily variance) of returns over the actual data, not over the artificial interpolation. As it stands, the interpolation gives an artificially low variance, which explains the excessive correlation.
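To illustrate the variance point, here is a minimal sketch. It assumes (this is only a guess about the data) that in rows 3 to 15 of the second series just the endpoints 62 and 62.6 are real observations and the values in between were filled in linearly. The helper `realized_variance` is a name made up for this example.

```python
import numpy as np


def realized_variance(prices):
    """Sum of squared log returns over the given observations."""
    log_returns = np.diff(np.log(np.asarray(prices, dtype=float)))
    return float(np.sum(log_returns ** 2))


# Assumption: only the two endpoints are actual observations.
actual = [62.0, 62.6]

# The same move filled in by linear interpolation in 12 steps of 0.05,
# matching rows 3 to 15 of the posted second series.
interpolated = np.linspace(62.0, 62.6, 13)

rv_actual = realized_variance(actual)
rv_interp = realized_variance(interpolated)

print("realized variance, endpoints only:", rv_actual)
print("realized variance, interpolated:  ", rv_interp)
print("ratio:", rv_actual / rv_interp)
```

Splitting one move into 12 equal small steps makes the sum of squared returns about twelve times smaller, so any correlation estimate that divides by it is inflated accordingly.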
To test HY, you would do better to generate two correlated Brownian paths with normal increments and then pick a few values at random from each series. Start with a much longer time interval.
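A rough sketch of such a test follows. The function names, the grid size, the number of sampled points and the target correlation of 0.8 are all choices made for this example. The estimator is written in its usual overlapping-interval form, which may differ in notation from the linked paper.

```python
import numpy as np


def hy_correlation(tx, x, ty, y):
    """Hayashi-Yoshida correlation for two asynchronously observed series.

    tx, ty: sorted observation times; x, y: values observed at those times.
    """
    tx, x, ty, y = (np.asarray(a, dtype=float) for a in (tx, x, ty, y))
    dx, dy = np.diff(x), np.diff(y)
    # Intervals (tx[i-1], tx[i]] and (ty[j-1], ty[j]] overlap iff
    # tx[i-1] < ty[j] and ty[j-1] < tx[i].
    overlap = (tx[:-1, None] < ty[None, 1:]) & (ty[None, :-1] < tx[1:, None])
    cov = dx @ overlap.astype(float) @ dy
    return cov / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))


def simulate(rho=0.8, n_grid=20_000, n_obs=1_500, seed=0):
    """Two correlated Brownian paths on [0, 1], each sampled at random times."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_grid
    z1 = rng.standard_normal(n_grid)
    z2 = rho * z1 + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n_grid)
    t = np.linspace(0.0, 1.0, n_grid + 1)
    x = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * z1)))
    y = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * z2)))
    # Independent random observation times for each series (asynchronous).
    ix = np.sort(rng.choice(n_grid + 1, size=n_obs, replace=False))
    iy = np.sort(rng.choice(n_grid + 1, size=n_obs, replace=False))
    return t[ix], x[ix], t[iy], y[iy]


if __name__ == "__main__":
    tx, x, ty, y = simulate()
    print("HY correlation estimate:", hy_correlation(tx, x, ty, y))
```

The overlap matrix is built in full for readability, which costs memory proportional to the product of the two sample sizes. For much longer series a sorted-search approach would be preferable.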