Finding the mode of an int list in SML and where it occurs without library functions

295 Views Asked by At

I'm trying to find the mode or value that occurs most frequently. I want a function like :

mode:' 'a list -> (''a * int) list

and it returns the mode and where it occurs, unless there is a tie then return all occurrences so something like:

mode([1,1,2,3,5,8]) ===> [(1,2)]
mode([1,3,5,2,3,5]) ===> [(3,2),(5,2)]
mode([true,false,true,true]) ====>[(true,3)]

I'm trying to do this without library functions in SML.

so far I got:

fun mode(L)=
if null L then nil
else if hd L= hd (tl L) then 1+mode(hd(tl L))
else mode(tl L);

I know this isn't right I guess I am curious on how you both keep the indices of where the mode occurs and what the mode is and return them as tuples in a list.

1

There are 1 best solutions below

0
sshine On BEST ANSWER

You're trying to solve an exercise in many parts with several easier exercises before it. Judging by your current progress, have you considered solving some very similar exercises that build up to the final goal? This is generally good advice when solving programming problems: Reduce your current problem to simpler problems and solve those.

I'd try and solve this problem first

  • Build a histogram : ''a list -> (''a * int) list over the elements of a list:

    fun histogram [] = ...
      | histogram (x::xs) = ...
    

    Do this by inserting each x with its count into a histogram.

    fun insert (x, []) = ...
      | insert (x, (y, count) :: hist) = ...
    

    And write some tests that you can execute once in a while.

  • Find the mode : ''a list -> ''a of a list:

    fun mode xs = ... (histogram xs)
    

    Do this by finding the element in the histogram with the biggest count:

    fun findMax [] = ... (* what if the list/histogram is empty? *)
      | findMax [(x, count)] = ...
      | findMax ((x, count) :: (y, count) :: hist) = ...
    

and eventually try and solve this problem

  • When you have a good grasp of representing and navigating regular histograms recursively, you could create an annotated histogram : (''a * int * int list) list that doesn't just contain the frequency of each element, but also their positions in the input list:

    fun histogram_helper ([], _) = ...
      | histogram_helper (x::xs, i) = ... histogram_helper (xs, i+1) ...
    

    Do this by inserting each x with its count and position i along with previously found positions is into a histogram:

    fun insert (x, i, []) = ...
      | insert (x, i, (y, count, is) :: hist) = ...
    
  • Find the (possibly multiple) mode : ''a list -> (''a * int list) list of a list:

    fun mode xs = ... (histogram xs)
    

    Do this by finding the (possibly multiple) element(s) in the histogram with the biggest count:

    fun findMax ([],                     countMax, tmpModes) = ...
      | findMax ((x, count, is) :: hist, countMax, tmpModes) = ...
    

    with countMax : int being the frequency repeated in tmpModes : (''a * int * int list) list. Here countMax and tmpModes are accumulating result parameters. Do this by determining whether (x, count, is) should be thrown away in favor of all tmpModes, or it should be added to tmpModes, or it should be chosen in favor of all tmpNodes

    I am curious on how you both keep the indices of where the mode occurs and what the mode is and return them as tuples in a list.

    Yes, this is not trivial. Using my suggested division into sub-problems, answering this depends on whether we are in the histogram function or the findMax function:

    In histogram you can store the indices as part of the tuple that contains the element and the frequency. In findMax, since you're potentially collecting multiple results, you need to keep track of both which frequency is the highest (countMax) and what the temporary modes of choice are (tmpModes); subject to replacement or addition in a later recursive call.

    So to answer your question: In an accumulating parameter.


and a little feedback to your code snippet

fun mode(L)=
if null L then nil
else if hd L= hd (tl L) then 1+mode(hd(tl L))
else mode(tl L);

Use pattern matching instead of null, hd and tl:

fun count_4s [] = 0
  | count_4s (x::xs) = (if x = 4 then 1 else 0) + count_4s xs

fun count_ns ([],    _) = 0
  | count_ns (x::xs, n) = (if x = n then 1 else 0) + count_ns (xs, n)

fun count_12 ([], ones, twos) = (ones, twos)
  | count_12 (x::xs, ones, twos) =
      if x = 1 then count_12 (xs, ones+1, twos) else
      if x = 2 then count_12 (xs, ones, twos+1) else
      count_12 (xs, ones, twos)

fun count_abc ([], result) = result
  | count_abc (x::xs, ((a, ca), (b, cb), (c, cc))) =
      count_abc (xs, if x = a then ((a, ca+1), (b, cb), (c, cc)) else
                     if x = b then ((a, ca), (b, cb+1), (c, cc)) else
                     if x = c then ((a, ca), (b, cb), (c, cc+1)) else
                     ((a, ca), (b, cb), (c, cc)))

Building a histogram is sort of an extension to this where instead of a fixed value like 4, or a fixed amount of them like ones and twos, you have a whole list of them, and you have to dynamically look for the one you've got, x, and determine if it needs to be added to the histogram or incremented in the histogram.

The best way would be to do that in a helper function, so for example, if count_abc were made with a helper function,

fun insert_abc (x, ((a, ca), (b, cb), (c, cc))) =
    if x = a then ((a, ca+1), (b, cb), (c, cc)) else
    if x = b then ((a, ca), (b, cb+1), (c, cc)) else
    if x = c then ((a, ca), (b, cb), (c, cc+1)) else
    ((a, ca), (b, cb), (c, cc)))

fun count_abc ([], result) = result
  | count_abc (x::xs, result) =
      count_abc (xs, insert (x, result))

only instead of the histogram representation

(''a * int) * (''a * int) * (''a * int)

you want

(''a * int) list

and insert should be recursive rather than how insert_abc is repetitive.