Parsing strings with Scheme


I am trying to write a simple parser that creates an SXML expression from a string, e.g.:

"This is a [Test]" ===> (item "This is a" (subitem "Test"))

Anyone wondering about the square brackets in this example may want to look at the so-called Leiden conventions.
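The code below starts from a list of tokens rather than from the string itself. A minimal sketch of a tokenizer that could produce such a list is shown here. It assumes that words are separated by plain spaces (tabs and newlines are not treated specially) and that "[" and "]" are always tokens of their own. The names tokenize and push-word are hypothetical helpers, not part of the question.

```scheme
;; Hypothetical helper: push the pending word (its characters were
;; collected in reverse order) onto TOKENS, skipping empty words.
(define (push-word word tokens)
  (if (null? word)
      tokens
      (cons (list->string (reverse word)) tokens)))

;; Hypothetical tokenizer: split STR into words and "[" / "]" tokens.
;; Assumption: only #\space separates words.
(define (tokenize str)
  (let loop ((chars (string->list str)) (word '()) (tokens '()))
    (cond ((null? chars)
           (reverse (push-word word tokens)))
          ((char=? (car chars) #\space)
           (loop (cdr chars) '() (push-word word tokens)))
          ((or (char=? (car chars) #\[) (char=? (car chars) #\]))
           ;; A bracket ends the current word and becomes its own token.
           (loop (cdr chars) '()
                 (cons (string (car chars)) (push-word word tokens))))
          (else
           (loop (cdr chars) (cons (car chars) word) tokens)))))
```

For example, (tokenize "this [is a] test") gives the same list as my-sequence, ("this" "[" "is" "a" "]" "test").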

This is the code I have written so far:

(define my-sequence '("this" "[" "is" "a" "]" "test"))

(define (left-square-bracket? item)
  ;; ITEM may be a one-character string or a character;
  ;; characters are compared with eqv?, not eq?
  (or (equal? item "[")
      (eqv? item #\[)))

(define (right-square-bracket? item)
  (or (equal? item "]")
      (eqv? item #\])))

(define (parse-sequence sequence)
  (cond ((null? sequence) '())
        ((left-square-bracket? (car sequence))
         (let ((subsequence (get-subsequence (cdr sequence))))
           (list subsequence)))
        (else
         (cons (car sequence)
               (parse-sequence (cdr sequence))))))

(define (get-subsequence sequence)
  (if (right-square-bracket? (car sequence))
      '()
      (cons (car sequence)
            (get-subsequence (cdr sequence)))))

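Evaluating (parse-sequence my-sequence) yields ("this" ("is" "a")). A nested expression is created, but the last item "test" is never processed: get-subsequence returns only the bracketed items, not the part of the list after the closing bracket, so parse-sequence has nothing to continue with. How do I return from get-subsequence to parse-sequence and carry on with the rest of the list?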

Any help is appreciated; many thanks in advance! :)
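One way to hand the rest of the list back without multiple values is sketched below: get-subsequence returns a pair whose car holds the bracketed items and whose cdr holds the list after the matching "]", so parse-sequence can continue from there. This sketch assumes string tokens and a single level of brackets, as in my-sequence; a nested "[" inside brackets would simply be kept as an ordinary string.

```scheme
(define (left-square-bracket? item) (equal? item "["))
(define (right-square-bracket? item) (equal? item "]"))

;; Returns a pair (items . rest): ITEMS are the tokens before the
;; closing "]", REST is the list that follows it.
(define (get-subsequence sequence)
  (cond ((null? sequence) (error "missing closing bracket"))
        ((right-square-bracket? (car sequence))
         (cons '() (cdr sequence)))
        (else
         (let ((result (get-subsequence (cdr sequence))))
           (cons (cons (car sequence) (car result))
                 (cdr result))))))

(define (parse-sequence sequence)
  (cond ((null? sequence) '())
        ((left-square-bracket? (car sequence))
         (let ((result (get-subsequence (cdr sequence))))
           ;; Keep the bracketed items, then resume after the "]".
           (cons (car result)
                 (parse-sequence (cdr result)))))
        (else
         (cons (car sequence)
               (parse-sequence (cdr sequence))))))
```

With this version, (parse-sequence '("this" "[" "is" "a" "]" "test")) returns ("this" ("is" "a") "test").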


Best answer

To address your initial question of how to return multiple values: use the "values" form. Here is an example implementation in which the inner procedure returns both the part of the list still to be processed and the result so far. It recurses on each opening bracket and returns to its caller on each closing bracket.

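(define (parse-sequence lst)

  ;; parse-seq returns two values: the tokens still to be processed
  ;; and the items collected at the current nesting level.
  (define (parse-seq lst)
    (let loop ((lst lst) (res '()))
      (cond
        ((null? lst) (values '() res))
        ;; "[" opens a nested level: parse it recursively, then
        ;; continue with whatever follows the matching "]".
        ((string=? (car lst) "[")
         (let-values (((lst2 res2) (parse-seq (cdr lst))))
           (loop lst2 (append res (list res2)))))
        ;; "]" closes the current level: hand back the rest of the list.
        ((string=? (car lst) "]")
         (values (cdr lst) res))
        (else
          (loop (cdr lst) (append res (list (car lst))))))))

  ;; At the top level only the collected result is needed.
  (let-values (((lst res) (parse-seq lst)))
    res))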

then

(parse-sequence '("this" "is" "a" "test"))
(parse-sequence '("this" "[" "is" "a" "]" "test"))
(parse-sequence '("this" "[" "is" "[" "a" "]" "]" "test"))

will yield

'("this" "is" "a" "test")
'("this" ("is" "a") "test")
'("this" ("is" ("a")) "test")
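Since the question asks for an SXML-style result such as (item ... (subitem ...)), the same multiple-values idea can tag each nesting level. The sketch below assumes the tags item and subitem from the question; parse-sxml is a hypothetical name. It uses call-with-values instead of let-values and accumulates items in reverse, reversing once per level, to avoid repeated append. Adjacent words remain separate strings; merging them into "This is a" would be a further step.

```scheme
(define (parse-sxml tokens)
  ;; parse-seq returns two values: the tokens after the closing "]"
  ;; (or '() at end of input) and the items of the current level.
  (define (parse-seq lst)
    (let loop ((lst lst) (res '()))
      (cond ((null? lst) (values '() (reverse res)))
            ((string=? (car lst) "[")
             ;; Parse the nested level and wrap it as (subitem ...).
             (call-with-values
              (lambda () (parse-seq (cdr lst)))
              (lambda (remaining sub)
                (loop remaining (cons (cons 'subitem sub) res)))))
            ((string=? (car lst) "]")
             (values (cdr lst) (reverse res)))
            (else (loop (cdr lst) (cons (car lst) res))))))
  (call-with-values
   (lambda () (parse-seq tokens))
   (lambda (remaining items) (cons 'item items))))
```

For example, (parse-sxml '("this" "[" "is" "a" "]" "test")) returns (item "this" (subitem "is" "a") "test").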

Another answer

I made some progress by using open-input-string in combination with read-char:

(define my-sequence (open-input-string "this [is a] test"))

(define (parse-sequence sequence)
  `(item
    ,@(let loop ((next-char (read-char sequence)))
        (cond ((eof-object? next-char) '())
              ((left-square-bracket? next-char)
               (let ((subsequence (get-subsequence sequence)))
                 (cons subsequence
                       (loop (read-char sequence)))))
              (else
               (cons next-char
                     (loop (read-char sequence))))))))

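(define (get-subsequence sequence)
  `(subitem
    ,@(let loop ((next-char (read-char sequence)))
        ;; Stop at "]" or at end of input; without the eof check an
        ;; unclosed "[" would recurse forever on the eof object.
        (if (or (eof-object? next-char)
                (right-square-bracket? next-char))
            '()
            (cons next-char
                  (loop (read-char sequence)))))))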

(parse-sequence my-sequence)
===> (item #\t #\h #\i #\s #\space (subitem #\i #\s #\space #\a) #\space #\t #\e #\s #\t)
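A possible next step is to collect runs of ordinary characters into strings instead of emitting each character separately. The sketch below assumes that whitespace should be kept verbatim (so the text before a bracket keeps its trailing space), that an unclosed "[" simply ends at end of input, and that a stray "]" at the top level ends parsing. The names add-text, read-items and parse-string are hypothetical.

```scheme
;; Turn the pending characters (collected in reverse) into a string
;; and push it onto ITEMS; do nothing if no characters are pending.
(define (add-text chars items)
  (if (null? chars)
      items
      (cons (list->string (reverse chars)) items)))

;; Read from PORT up to end of input or the matching #\] and
;; return (TAG item ...); a #\[ starts a nested (subitem ...).
(define (read-items port tag)
  (let loop ((chars '()) (items '()))
    (let ((c (read-char port)))
      (cond ((or (eof-object? c) (char=? c #\]))
             (cons tag (reverse (add-text chars items))))
            ((char=? c #\[)
             (loop '() (cons (read-items port 'subitem)
                             (add-text chars items))))
            (else (loop (cons c chars) items))))))

(define (parse-string str)
  (read-items (open-input-string str) 'item))
```

For example, (parse-string "This is a [Test]") returns (item "This is a " (subitem "Test")); trimming the trailing space is left out of this sketch.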

Now work goes on, step by step. :)

Any comments and suggestions are still appreciated. :)