I am trying to find a way to implement this in python: CyberChef Example
I want to convert from UTF-8 to UTF-7.
For example, given the string
<root><test>aaa</test><hel>asd</hel></root>
The encoded output should be the bytes
+ADw-root+AD4-+ADw-test+AD4-aaa+ADw-/test+AD4-+ADw-hel+AD4-asd+ADw-/hel+AD4-+ADw-/root+AD4-
I tried using codec and decode('utf-7') but this just returns the string as is:
>>> "<root><test>aaa</test><hel>asd</hel></root>".encode("utf-7")
b'<root><test>aaa</test><hel>asd</hel></root>'
<
and>
are among what UTF-7 (RFC 2152) calls "optional direct characters". These characters allowed to be either directly encoded as their ASCII equivalents, or be encoded using a Unicode shifted encoding. Python's UTF-7 implementation chooses to use direct encodings for all optional direct characters.For example, when encoding to UTF-7, Python maps
<
to its single-byte ASCII representationb'<'
:When decoding bytes from UTF-7, Python will happily handle either a direct encoding or Unicode shifted encoding for
<
:If you want to obtain a Unicode shift encoding for
<
(or for other optional direct characters), you'll need to either a) manually convert the direct-encoded characters to their Unicode shift equivalents, or b) use a different UTF-7 implementation that provides more fine-grained encoding controls.To manually convert
<
and>
from direct encodings to Unicode shift encodings, you can usebytes.replace
: