For an online algorithms course I am attempting to write a program which calculates the travelling salesman distance of cities using an approxomation algorithm:
- Start the tour at the first city.
- Repeatedly visit the closest city that the tour hasn't visited yet. In case of a tie, go to the closest city with the lowest index. For example, if both the third and fifth cities have the same distance from the first city (and are closer than any other city), then the tour should begin by going from the first city to the third city.
- Once every city has been visited exactly once, return to the first city to complete the tour.
I am trying to write a solution in Haskell, and I have it working on small data sets, but it runs out of memory on a large input (the course has an input of ~33000 cities)
-- Fold data: cities map, distances map, visited map, list of visited cities and each distance,
-- and current city
data TS = TS (M.Map Int City) (M.Map (Int,Int) Double) (M.Map Int Bool) ([(Int,Double)]) (Int)
run :: String -> String
run input = let cm = parseInput input -- cityMap contains cities (index,xPos,yPos)
n = length $ M.keys cm
dm = buildDistMap cm -- distanceMap :: M.Map (Int,Int) Double
-- which is the distance between cities a and b
ts = TS cm dm (M.fromList [(1,True)]) [(1,0.0)] 1
(TS _ _ _ beforeLast _) = foldl' (\ts _ -> exec ts n) ts [2..n]
completed = end beforeLast dm
in show $ floor $ sum $ map (\(_,d) -> d) $ completed
exec :: TS -> Int -> TS
exec (TS cm dm visited ordered curr) n =
let candidateIndexes = [(i)|i<-[1..n],M.member i visited == False]
candidates = map (\i -> let (Just x) = M.lookup (curr,i) dm in (x,i)) candidateIndexes
(dist,best) = head $ sortBy bestCity candidates
visited' = M.insert best True visited
ordered' = (best,dist) : ordered
in TS cm dm visited' ordered' best
end :: [(Int,Double)] -> M.Map (Int,Int) Double -> [(Int,Double)]
end ordering dm = let (latest,_) = head ordering
(Just dist) = M.lookup (latest,1) dm
in (1,dist) : ordering
bestCity :: (Double,Int) -> (Double,Int) -> Ordering
bestCity (d1,i1) (d2,i2) =
if compare d1 d2 == EQ
then compare i1 i2
else compare d1 d2
At first I wrote the function exec
as a recursive function instead of calling it via the foldl'
. I thought changing it to use foldl'
would solve my issue as foldl'
is strict. However it appears to have made no difference in memory usage. I have tried compiling my program using no optimisations and -O2
optimisations.
I know that somehow it must be keeping multiple loops in memory as I can sort 34000 numbers without issue using
> sort $ [34000,33999..1]
What exactly am I doing wrong here?
Just in case it is any use here is the parseInput
and buildDistMap
functions, but they are not the source of my issue
data City = City Int Double Double deriving (Show, Eq)
-- Init
parseInput :: String -> M.Map Int City
parseInput input =
M.fromList
$ zip [1..]
$ map ((\(i:x:y:_) -> City (read i) (read x) (read y)) . words)
$ tail
$ lines input
buildDistMap :: M.Map Int City -> M.Map (Int,Int) Double
buildDistMap cm =
let n = length $ M.keys cm
bm = M.fromList $ zip [(i,i)|i<-[1..n]] (repeat 0) :: M.Map (Int,Int) Double
perms = [(x,y)|x<-[1..n],y<-[1..n],x/=y]
in foldl' (\dm (x,y) -> M.insert (x,y) (getDist cm dm (x,y)) dm) bm perms
getDist :: M.Map Int City -> M.Map (Int,Int) Double -> (Int,Int) -> Double
getDist cm dm (x,y) =
case M.lookup (y,x) dm
of (Just v) -> v
Nothing -> let (Just (City _ x1 y1)) = M.lookup x cm
(Just (City _ x2 y2)) = M.lookup y cm
in eDist (x1,y1) (x2,y2)
eDist :: (Double,Double) -> (Double,Double) -> Double
eDist (x1,y1) (x2,y2) = sqrt $ p2 (x2 - x1) + p2 (y2 - y1)
where p2 x = x ^ 2
And some test input
tc1 = "6\n\
\1 2 1\n\
\2 4 0\n\
\3 2 0\n\
\4 0 0\n\
\5 4 3\n\
\6 0 3"
That
foldl'
is doing a lot less than you hope. It causes theTS
constructor to be evaluated at every step, but nothing in that evaluation process requiresvisited'
,ordered'
, orbest
to be evaluated. (cm
anddm
aren't modified in the loop, so they can't stack up unevaluated thunks.)The best way to solve this is to make the evaluation of the
TS
constructor returned byexec
depend on evaluatingvisited'
,ordered'
, andbest
sufficiently.M.Map
is always spine-strict, so evaluating a map at all means the whole structure is evaluated. Whether the values are as well depends on how you imported it, but that turns out to not be relevant here. The value you're inserting is a nullary constructor, so it's already fully evaluated. So evaluatingvisited'
to WHNF is sufficient.Int
is not a nested type, so evaluatingbest
to WHNF is sufficient.[(Int, Double)]
(the outer parens are redundant, the list brackets do grouping of their contents) is a bit trickier. Lists aren't spine strict, nor are pairs strict. But looking at the construction pattern, this is a prepend-only structure. As such, you don't need to worry about the tail. If the list was evaluated coming in, the output will be evaluated as long as the new head is. Unfortunately, that means you've got to be a bit careful with the pair. Half of it is the samebest
value as constructed above, so that's not too bad. If it's evaluated, it's evaluated! (Though this does suggest you don't need to be passing it at every iteration, you could just use the front ofordered
.) The other half of the pair is aDouble
, which is also non-nested, so WHNF is sufficient for it.In this particular case, due to the fact that different approaches are necessary, I'd probably just approach this with
seq
.Note that I'm being careful to force the minimal number of values to remove unnecessary nesting of thunks. The
(,)
and(:)
constructors don't need to be evaluated, only their arguments - the place where nested thunks might build up. (What's the difference in memory consumption between<thunk <expression> <expression>>
and<constructor <expression> <expression>>
?)