Portable way to change Prolog atom JSON escaping


Is there a portable way to change Prolog escaping? I have the following in mind. Usually an atom is escaped as follows, for example using octal escaping:

/* SWI-Prolog 8.3.23 */

?- X = 'abc\x0001\def'.
X = 'abc\001\def'.

But what I want to achieve is output like the following. If a compound '$STR'/1 is used, its argument should be written in double quotes, with "\uXXXX" escapes instead of octal escapes:

?- X = '$STR'('abc\x0001\def').
X = "abc\u0001def".

Do Prolog systems have some hook, like portray, that could do that? I do not expect approaches based purely on the ISO core standard, since "\uXXXX" is already outside the ISO core standard.

There is 1 solution below

A portable way would be to use write/1 instead of writeq/1, and to convert the atom yourself before calling write/1. The conversion code can use the ISO core standard atom_codes/2 as follows:

% escape_atom(+Atom, -Atom)
escape_atom(X, Y) :-
   atom_codes(X, L),
   escape_codes(L, R, [0'"]),
   atom_codes(Y, [0'"|R]).

The non-terminal escape_codes//1 (called above with its two extra list arguments) can be easily implemented with DCG, which is available in many Prolog systems. Surrogate pairs can be extracted as follows, using standard arithmetic:

% high_surrogate(+Integer, -Integer)
high_surrogate(X, Y) :- Y is (X >> 10) + 0xD7C0.

% low_surrogate(+Integer, -Integer)
low_surrogate(X, Y) :- Y is (X /\ 0x3FF) + 0xDC00.
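
The DCG itself is not shown here, so the following is only a minimal sketch of what escape_codes//1 might look like, not the code from the linked escape.pl. The names needs_escape/1, escape_code//1, unicode_escape//1 and hex_digit//1 are hypothetical. needs_escape/1 is a rough stand-in for a general-category test: it covers only control codes, lone surrogates and U+nFFFE/U+nFFFF noncharacters. Escaping of the double quote and the backslash is an added assumption, since JSON requires it. escape_atom/2 and the surrogate predicates are repeated from the answer so that the block loads on its own.

```prolog
% Repeated from the answer so that the sketch loads on its own.
escape_atom(X, Y) :-
   atom_codes(X, L),
   escape_codes(L, R, [0'"]),
   atom_codes(Y, [0'"|R]).

high_surrogate(X, Y) :- Y is (X >> 10) + 0xD7C0.
low_surrogate(X, Y) :- Y is (X /\ 0x3FF) + 0xDC00.

% escape_codes(+Codes)//
% Emits the escaped form of Codes, one code at a time.
escape_codes([]) --> [].
escape_codes([C|Cs]) --> escape_code(C), escape_codes(Cs).

% escape_code(+Code)//
% Assumption: quote and backslash get a backslash prefix, as in JSON.
escape_code(0'") --> !, [0'\\, 0'"].
escape_code(0'\\) --> !, [0'\\, 0'\\].
escape_code(C) --> {needs_escape(C), C =< 0xFFFF}, !,
   unicode_escape(C).
escape_code(C) --> {needs_escape(C)}, !,
   {high_surrogate(C, H), low_surrogate(C, L)},
   unicode_escape(H), unicode_escape(L).
escape_code(C) --> [C].

% unicode_escape(+Code)//
% Emits \uXXXX with four lowercase hex digits; Code must be =< 0xFFFF.
unicode_escape(C) -->
   {D3 is (C >> 12) /\ 0xF, D2 is (C >> 8) /\ 0xF,
    D1 is (C >> 4) /\ 0xF, D0 is C /\ 0xF},
   [0'\\, 0'u],
   hex_digit(D3), hex_digit(D2), hex_digit(D1), hex_digit(D0).

% hex_digit(+Value)//
hex_digit(D) -->
   {( D < 10 -> C is D + 0'0 ; C is D - 10 + 0'a )},
   [C].

% needs_escape(+Code)
% Hypothetical placeholder; a full version would consult the
% Unicode general category of the code point instead.
needs_escape(C) :- C < 0x20, !.                  % C0 controls
needs_escape(C) :- C >= 0x7F, C =< 0x9F, !.      % DEL and C1 controls
needs_escape(C) :- C >= 0xD800, C =< 0xDFFF, !.  % lone surrogates
needs_escape(C) :- C /\ 0xFFFE =:= 0xFFFE.       % U+nFFFE and U+nFFFF
```

With this sketch loaded, the case from the question behaves as follows:

?- escape_atom('abc\x1\def', X), write(X), nl.
"abc\u0001def"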

Here is an example run of the routine:

?- escape_atom('abc\x1000c\def', X), write(X), nl.
"abc\ud800\udc0cdef"
X = '"abc\\ud800\\udc0cdef"'
?- escape_atom('abc\x1000b\def', X), write(X), nl.
"abc𐀋def"
X = '"abc𐀋def"'
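
The surrogate values in the first result can be checked by hand. The constant 0xD7C0 folds the usual subtraction of 0x10000 into the offset, because 0xD800 - (0x10000 >> 10) = 0xD7C0. The small sketch below repeats the two predicates from the answer and adds combine_surrogates/3, a hypothetical helper that is not part of the answer, to show that the mapping round-trips:

```prolog
% Repeated from the answer so that the block loads on its own.
high_surrogate(X, Y) :- Y is (X >> 10) + 0xD7C0.
low_surrogate(X, Y) :- Y is (X /\ 0x3FF) + 0xDC00.

% combine_surrogates(+High, +Low, -Code)
% Hypothetical inverse: rebuilds the code point from a surrogate pair.
combine_surrogates(H, L, X) :-
   X is ((H - 0xD7C0) << 10) + (L - 0xDC00).

% For 0x1000C:
%   high: (0x1000C >> 10) + 0xD7C0 = 0x40 + 0xD7C0 = 0xD800
%   low:  (0x1000C /\ 0x3FF) + 0xDC00 = 0xC + 0xDC00 = 0xDC0C
```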

The only missing piece is a built-in code_type/2 that is supposed to deliver the Unicode general category of a code point. This is then used to identify control codes and invalid codes; only these are escaped.

Open Source: Prolog escape.pl
https://gist.github.com/jburse/bf6c01c7524f2611d606cb88983da9d6#file-escape-pl