Convert escape sequences from user input into their real representation


I'm trying to write an interpreter for LOLCODE that reads escaped strings from a file in the form:

VISIBLE "HAI \" WORLD!"

For which I wish to show an output of:

HAI " WORLD!

I have tried to dynamically generate a format string for printf to do this, but it seems that escape sequences are only processed when the compiler reads a string literal, not when a string is built at runtime.

In essence, what I am looking for is exactly the opposite of this question: Convert characters in a c string to their escape sequences

Is there any way to go about this?

Best answer

It's a pretty standard scanning exercise. Depending on how close you intend to be to the LOLCODE specification (which I can't seem to reach right now, so this is from memory), you've got a few ways to go.

Write a lexer by hand

It's not as hard as it sounds. You just want to analyze your input one character at a time, while maintaining a bit of context information. In your case, the important context consists of two flags:

  • one to remember you're currently lexing a string. It'll be set when reading the opening " and cleared when reading the closing, unescaped ".
  • one to remember the previous character was an escape. It'll be set when reading \ and cleared when reading the character after that, no matter what it is.

The general algorithm then looks like this (pseudocode):

loop on: c ← read next character
  if not inString
    if c is '"' then clear buf; set inString
    else [out of scope here]
  else if inEscape then append c to buf; clear inEscape
  else if c is '"' then clear inString; return buf as result
  else if c is '\' then set inEscape
  else append c to buf

If you want to support \r, \n and the like, refine the inEscape case so that it translates the escaped character instead of appending it as-is.

Use a lexer generator

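The traditional tools here are lex and flex. They generate a C scanner from a set of regular-expression rules, and string literals with escapes are typically handled with a start condition that is entered on the opening quote.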

Get inspiration

You're not the first one to write a LOLCODE interpreter. There's nothing wrong with peeking at how the others did it. For example, here's the string parsing code from lci.