How do I properly escape and unescape a multiline string that contains newline literals?

I'm working on a Visual Studio Code extension. The extension is supposed to act on the text that is currently selected in the editor window and send it to an external command (lein-cljfmt in my case, but I think that's unrelated to my question). When the external command has finished processing the text, I want to replace the current editor selection with the result returned by the command-line tool.

Before sending the string I escape it like this:

contents
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');

The result is then unescaped like this:

contents
    .replace(/\\n/g, '\n')
    .replace(/\\"/g, '"')
    .replace(/\\\\/g, '\\');

This works in all but one case: when the selection being processed contains a string literal that itself contains a newline escape sequence (a backslash followed by n), the unescaping turns that sequence into an actual line break, which breaks the code in the editor.

This is an example of a snippet that breaks my escaping:

(defn join
  [a b]
  (str a "\n" b)) 

I have tried quite a bit of regexp black magic, such as

.replace(/(?!\B"[^"]*)\\n(?![^"]*"\B)/g, '\n')

but couldn't find a solution that has no edge cases. Is there a way to do this that I am missing? I also wonder whether there is a VSCode extension API that could handle this, as it seems like a common scenario to me.

Best answer

I think this might be what you need:

function slashEscape(contents) {
    return contents
        .replace(/\\/g, '\\\\')
        .replace(/"/g, '\\"')
        .replace(/\n/g, '\\n');
}

var replacements = {'\\\\': '\\', '\\n': '\n', '\\"': '"'};

function slashUnescape(contents) {
    return contents.replace(/\\(\\|n|")/g, function(replace) {
        return replacements[replace];
    });
}

var tests = [
    '\\', '\\\\', '\n', '\\n', '\\\n', '\\\\n',
    '\\\\\n', '\\\\\\n', '\\"\\\\n', '\n\n',
    '\n\n\n', '\\n\n', '\n\\n', '\\n\\n',
    '\\\n\\n\nn\n\\n\\\n\\\\n', '"', '\\"', '\\\\"'
];

tests.forEach(function(str) {
    var out = slashUnescape(slashEscape(str));
    
    // assert that what goes in is what comes out
    console.log(str === out, '[' + str + ']', '[' + out + ']');
});

Unescaping the string in three separate stages is really tricky, because \n means different things depending on how many backslashes come just before it. In your example, the original string \n (backslash, n) gets encoded as \\n (backslash, backslash, n). When you decode it, the last two characters match the first of your RegExps, when what you want is for the first two characters to match the third RegExp. You've got to count the backslashes to be sure. Doing it all in one pass dodges that problem: the regex consumes each escape sequence from left to right, so the leading backslashes are decoded at the same time.
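
To make the difference concrete, here is a small sketch that runs both approaches on the string literal from the question. The names stagedUnescape and singlePassUnescape are made up for this illustration; the first repeats the three-stage decoding from the question, the second repeats the single-pass decoding from above.

```javascript
// Escaping, unchanged from the question.
function slashEscape(contents) {
    return contents
        .replace(/\\/g, '\\\\')
        .replace(/"/g, '\\"')
        .replace(/\n/g, '\\n');
}

// Three independent passes, as in the question (hypothetical name).
function stagedUnescape(contents) {
    return contents
        .replace(/\\n/g, '\n')
        .replace(/\\"/g, '"')
        .replace(/\\\\/g, '\\');
}

// One left-to-right pass over all escape sequences (hypothetical name).
function singlePassUnescape(contents) {
    var replacements = {'\\\\': '\\', '\\n': '\n', '\\"': '"'};
    return contents.replace(/\\(\\|n|")/g, function(match) {
        return replacements[match];
    });
}

// Source text containing a backslash followed by n inside a string literal.
var source = '(str a "\\n" b)';
var escaped = slashEscape(source);

var staged = stagedUnescape(escaped);
var singlePass = singlePassUnescape(escaped);

console.log(staged === source);      // false: a real line break was inserted
console.log(singlePass === source);  // true: the round trip is lossless
```

In the staged version, the first pass sees the second backslash of the escaped pair plus the n and turns them into a line break, leaving a stray backslash behind. The single-pass version has already consumed both backslashes as one escape sequence by the time it reaches the n.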