I want to winsorize my data, which looks like the following (134 observations in total):
  company   id    rev   size age
1    Adeg 29.9   0.66    160  45
2  Agrana 32.0   2.80   9191  29
3 Allianz 36.5  87.75 142460 128
4 Andritz 34.0   6.89  29096 118
5   Apple 41.0 259.65 132000  41
To use the Winsorize() function from the DescTools package, I created a single numeric vector of the variable rev by simply using the select() function: rev_vector <- select(data1, -...)
I then ran the function as follows, which gives me an error:
> Winsorize(rev_vector)
Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)) :
undefined columns selected
Is this because I am passing a data.frame instead of a vector?
Alternatively, I tried the following:
> Winsorize(rev_vector$rev, probs = c(0.05, 0.95))
[1] 0.66 2.80 87.75 6.89 134.73 0.09 22.78 1.36 5.48 0.70 0.79 0.35 31.37 0.55 0.94 0.06
[17] 12.36 13.58 7.95 0.29 7.80 0.39 73.55 0.09 23.07 0.27 0.32 0.08 0.05 0.41 29.47 0.66
[33] 20.91 0.67 0.05 1.39 0.17 0.14 1.79 0.05 2.52 3.68 0.24 0.09 109.65 8.43 0.20 0.17
[49] 35.93 3.05 0.07 0.05 0.82 0.57 26.21 0.28 0.05 5.72 6.12 4.09 0.05 0.22 134.73 94.43
[65] 41.35 0.20 17.32 5.63 3.25 0.12 0.05 0.07 10.89 3.79 1.89 134.73 9.98 10.58 54.98 134.73
[81] 15.55 15.21 5.93 42.65 1.59 3.00 11.19 6.10 0.08 134.73 31.37 17.74 20.92 6.46 3.18 0.05
[97] 0.81 9.15 29.47 0.05 1.34 7.97 109.65 28.45 35.93 0.38 0.65 134.73 9.44 8.66 5.30 11.83
[113] 20.06 29.55 1.15 2.32 46.14 134.73 9.98 10.58 11.05 54.98 134.73 15.55 15.21 5.93 1.59 1.03
[129] 3.00 11.19 6.10
I am not sure what this output means. I don't think the winsorizing actually worked, because the summary of the vector, summary(rev_vector$rev), is unchanged from the one before winsorizing.
Can somebody help me out here? Thanks!
You are almost there; you just chose too restrictive probs for the quantiles. Your vector already has a considerable number of equal values at its edges. Has it perhaps been winsorized before?
summary() is somewhat coarse in this case. Using Desc() gives you a more detailed picture of what's going on in your data. You can see that the value 0.05 occurs 9 times and the value 134.73 occurs 8 times. So the quantiles for probs 0.05 and 0.95 are the same as the extremes, and the winsorized vector remains identical to the original one.
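To see why ties at the edges matter, here is a minimal base-R sketch on a made-up vector (not your actual data) with 9 copies of the minimum and 8 copies of the maximum, mimicking what Desc() shows for rev. It uses quantile() with its default type 7; whether Winsorize() uses the same quantile type in your DescTools version is an assumption worth checking in ?Winsorize.

```r
# Made-up vector of 131 values (not the asker's data): 9 ties at the
# minimum and 8 ties at the maximum, like the rev column described above.
x <- c(rep(0.05, 9), seq(0.1, 100, length.out = 114), rep(134.73, 8))

sum(x == min(x))  # 9
sum(x == max(x))  # 8

# Default (type 7) sample quantiles at the 5% and 95% levels
q <- quantile(x, probs = c(0.05, 0.95), names = FALSE)

# Both quantiles fall inside the tied blocks, so they equal the extremes ...
all.equal(q, range(x))  # TRUE

# ... and clipping x to [q[1], q[2]] leaves every value unchanged
identical(pmin(pmax(x, q[1]), q[2]), x)  # TRUE
```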
Simply change the probs to, say, c(0.1, 0.9) and you'll see the effect.
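A minimal base-R sketch of the same clipping idea, using a hypothetical helper winsorize_sketch() (not the DescTools implementation) on the same kind of made-up vector, shows that c(0.05, 0.95) changes nothing while c(0.1, 0.9) does. Argument names of DescTools::Winsorize() may differ between package versions, so check ?Winsorize for the one you have installed.

```r
# Hypothetical helper, not DescTools code: clip x to its sample
# quantiles at the given probs (quantile() default type 7).
winsorize_sketch <- function(x, probs = c(0.05, 0.95)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Made-up vector with heavy ties at both edges (not the asker's data)
x <- c(rep(0.05, 9), seq(0.1, 100, length.out = 114), rep(134.73, 8))

w05 <- winsorize_sketch(x, probs = c(0.05, 0.95))
w10 <- winsorize_sketch(x, probs = c(0.10, 0.90))

sum(w05 != x)  # 0: the 5%/95% quantiles equal the extremes, nothing is clipped
sum(w10 != x)  # positive: values beyond the 10%/90% quantiles are clipped
summary(w10)   # min and max now differ from summary(x)
```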
PS: Winsorize() needs a vector as its argument and can't handle data.frames (this is also described in the help file).
PPS: A reproducible example would help… ;-)
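To get a plain vector instead of a one-column data.frame, extract the column with $ or [[ ]] (or, with dplyr, pull(data1, rev)). dplyr's select() always returns a data frame, which is why Winsorize(rev_vector$rev) ran while Winsorize(rev_vector) failed. A small base-R illustration, using a hypothetical data frame built from the first three rows shown in the question:

```r
# Hypothetical stand-in for data1, built from three rows shown in the question
df <- data.frame(company = c("Adeg", "Agrana", "Allianz"),
                 rev     = c(0.66, 2.80, 87.75))

one_col_df <- df["rev"]    # single brackets: still a data.frame (like select())
rev_vec    <- df[["rev"]]  # double brackets: a plain numeric vector
rev_vec2   <- df$rev       # $ extracts the same vector

is.data.frame(one_col_df)     # TRUE
is.numeric(rev_vec)           # TRUE
identical(rev_vec, rev_vec2)  # TRUE
```

A vector obtained this way (e.g. data1$rev) is what Winsorize() expects.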