Encode character to HTML in R, the CRAN way


Before voting to close as a duplicate, please make sure the other question actually answers my particular question here. Questions may look similar, but I haven't found an answer to mine. Thank you.


I am looking for a way to convert an arbitrary scalar character string into its HTML-encoded form. I do not want to encode just <, ", etc., but the whole text.

So text of the form

"<abc at def.gh>"

should be encoded as

"&#x3c;&#x61;&#x62;&#x63;&#x20;&#x61;&#x74;&#x20;&#x64;&#x65;&#x66;&#x2e;&#x67;&#x68;&#x3e;"

My goal is compatibility with how CRAN encodes maintainers' email addresses. So < should not become &lt; but &#x3c;. Similarly, . should not become &period; but &#x2e;.

To see this on CRAN, visit the CRAN page of any package, e.g. https://cran.r-project.org/package=curl, then "view source" and find the Maintainer field.

I am looking for a lightweight solution that requires as few dependencies as possible; it doesn't have to be fast.

For reference, an online tool for decoding the encoded string: https://onlineasciitools.com/convert-html-entities-to-ascii

Best answer

Here is something quick (not thoroughly tested). It was inspired by another SO answer.

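foo <- function(x) {
  # Unicode code points of the (UTF-8 encoded) string, formatted as hex
  intvalues <- as.hexmode(utf8ToInt(enc2utf8(x)))
  # Wrap each code point as a numeric character reference and concatenate
  paste(paste0("&#x", intvalues, ";"), collapse = "")
}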

all.equal(
  foo("<abc at def.gh>"),
  "&#x3c;&#x61;&#x62;&#x63;&#x20;&#x61;&#x74;&#x20;&#x64;&#x65;&#x66;&#x2e;&#x67;&#x68;&#x3e;"
)
# [1] TRUE
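
A possible variant of the same idea, in case a character vector (not just a scalar) needs encoding: this is only a sketch using base R, and the names html_encode_all and html_decode_hex are made up for illustration. It formats the code points with sprintf() so the result does not depend on how hexmode values are converted to character, and it adds a small decoder so the round trip can be checked without an online tool. It reproduces the example from the question; I have not verified that it matches CRAN's output for every possible input.

```r
# Hypothetical helper: encode every character of each element of x
# as a hexadecimal numeric character reference (&#x..;).
html_encode_all <- function(x) {
  stopifnot(is.character(x))
  vapply(enc2utf8(x), function(s) {
    if (is.na(s)) return(NA_character_)
    # utf8ToInt() gives the Unicode code points; %x prints lowercase hex
    paste(sprintf("&#x%x;", utf8ToInt(s)), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical helper: decode a scalar string that consists of
# hexadecimal references only (as produced above).
html_decode_hex <- function(x) {
  hex <- regmatches(x, gregexpr("(?<=&#x)[0-9a-fA-F]+(?=;)", x, perl = TRUE))[[1]]
  intToUtf8(strtoi(hex, base = 16L))
}

encoded <- html_encode_all("<abc at def.gh>")
decoded <- html_decode_hex(encoded)
```

Here decoded should be identical to the original "<abc at def.gh>", and an empty string is encoded as an empty string.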