I have to carry out a challenge that involves the elaboration of an algorithm to compute the Pareto (set) boundary. The statement is basically:
Given a set S of n points in the square [0,1] x [0,1], make an algorithm to determine the subset P contained in S, formed by the non-dominated points of S.
It is also said that it is easy to elaborate an algorithm of the order n*n point comparisons that accomplish this purpose. Well I came up with an algorithm by researching here and there. The challenge is still to implement an algorithm of the order n*log(n). How do I get the order of these algorithms?
Thanks in advance!
#data
set.seed(103)
x = runif(200)
y = runif(200)
#algorithm
pareto = 1:length(x)
for(i in 1:length(x)){
cond1 = y[i]!=min(y[which(x==x[i])])
cond2 = x[i]!=min(x[which(y==y[i])])
for(k in 1:length(x)){
if((x[i]>x[k] & y[i]>y[k]) | (x[i]==x[k] & cond1) | (y[i]==y[k] & cond2)){
pareto[i] = NA
break
}
}
}
xPareto = x[pareto[!is.na(pareto)]]
yPareto = y[pareto[!is.na(pareto)]]
#graphic:
plot(x, y)
points(xPareto, yPareto, col=2, pch=16)
dat = data.frame(x=xPareto, y=yPareto)
dat = dat[order(dat$x),]
for(i in 1:(nrow(dat)-1)){
segments(dat$x[i], dat$y[i], dat$x[i+1], dat$y[i+1], col=2, lty=2)
}
The intuition behind the efficient greedy solution to this problem lies in the fact that a point
i
is dominated by pointj
iffx[i] > x[j]
andy[i] > y[j]
, which implies thatj
must come beforei
when the points are ordered by either coordinate. Hence, if we traverse the points in increasing order of their x-coordinates, then the pointj
(if any) that dominates pointi
must have been traversed before pointi
is traversed. In other words, it is impossible for the dominating pointj
to come after the dominated pointi
in this ordering.Thus, with this traversal order the domination problem (i.e. checking if a point is dominated by some other point) boils down to checking if we have already seen a point with a lower y-coordinate as the traversal order already enforces the x-coordinate condition. This can easily be done by checking each point's y-coordinate to the lowest (minimum) y-coordinate we have seen so far -- if the minimum y-coordinate is less than the current point
i
's y-coordinate then the pointj
with the minimum y-coordinate dominatesi
asx[j] < x[i]
becausej
was seen beforei
.Sorting by the x-coordinate takes
O(n log n)
time and checking each point (while maintaining the partial minimum y-coordinate) takesO(n)
time, making the entire algorithm takeO(n log n)
time.