Why might the Scheme `filter` form not process list elements 'in order'?

104 Views Asked by At

(filter procedure list) applies procedure to each element of list and returns a new list containing only the elements for which procedure returns true.
(R. Kent Dybvig The Scheme Programming Language) (online)

What may not be apparent from this description is that, while the elements in the returned list occur in the same order as in list, the order of calls of procedure is not specified in R6RS. (Racket, however, applies the procedure "to each element from first to last")

A recently active answer mentions that it requires a filterfunc which works over its argument list in order. How should one write this function?

An answer with my explanation of the issue is supplied.

2

There are 2 best solutions below

0
On

It involves you should not use mutation of some external variable inside procedure and it supposes the implementation may apply parallelism, for example map-reduce.

1
On

What order might a Scheme implementation use, and why? Let's try it (all code run in Chez Scheme REPL):

  1. Display order of applications
> (filter (lambda (x) (display x) (even? x))
    '(0 1 2 3 4 5)))
452301(0 2 4)
> 
  1. Why this order?
    R6RS implementations must check that list is a proper list
  • ends with the empty list:
> (filter (lambda (x) (display x) (even? x))
    '(0 1 2 . 3)))
Exception in filter: (0 1 2 . 3) is not a proper list
> 
  • no circularity:
> (define xs '(0 1 2 3))
> (set-cdr! (cdddr xs) (cdr xs))
> (filter (lambda (x) (display x) (even? x)) xs)
Exception in filter: (0 1 2 3 1 2 ...) is not a proper list
> 
  • the implementation of filter in Chez Scheme which checks these requirements while filtering, resulting in the "452301" order of predicate applications, can be seen here
    (lines 589-617: a version is included as a spoiler below as an alternative to scrolling through the source on github)
  1. What?
    The Chez Scheme code uses a "tortoise and hare" algorithm to detect cycles. Without error checks the code could be:
> (let ([filter
    (lambda (pred? ls)
      (let f ([fast ls])
        (if (pair? fast)
          (let ([rest (f (cdr fast))])
            (if (pred? (car fast))
              (cons (car fast) rest)
              rest))
          '())))])
  
    (filter (lambda (x) (display x) (even? x))
      '(0 1 2 3 4 5)))
543210(0 2 4)
> 

(the identifier fast is used to match the Chez Scheme code: could be ls otherwise)

  1. How can one change this to call pred? "from first to last"?
  • replace the rest variable with an accumulator (compare the following with 3 above; the changes are small, but filtered elements are consed to acc, so it has to be reversed to give the result):
> (let ([filter
    (lambda (pred? ls)
      (let f ([fast ls] [acc '()])
        (if (pair? fast)
          (f (cdr fast)
             (if (pred? (car fast))
               (cons (car fast) acc)
               acc))
          (reverse acc))))])
  
    (filter (lambda (x) (display x) (even? x))
      '(0 1 2 3 4 5)))
012345(0 2 4)
> 

So 4 could be used as the required filterfunc. It would be an interesting exercise to add error checks like the Chez Scheme implementation, which is effectively:

(define (filter pred? ls)
(unless (procedure? pred?)
(error #f "not a procedure" pred?))
(or (let f ((pred? pred?) (fast ls) (slow ls))
(if (pair? fast)
(let ((fast1 (cdr fast)))
(if (pair? fast1)
(and (not (eq? fast1 slow))
(let ((fast2 (cdr fast1)))
(let ((rest (f pred? fast2 (cdr slow))))
(and rest
(if (pred? (car fast))
(if (pred? (car fast1))
(if (eq? rest fast2)
fast
(list* (car fast)
(car fast1) rest))
(cons (car fast) rest))
(if (pred? (car fast1))
(if (eq? rest fast2)
fast1
(cons (car fast1) rest))
rest))))))
(and (null? fast1)
(if (pred? (car fast))
fast
'()))))
(and (null? fast) '())))
(error #f "circular/improper list" ls)))