TikzDevice does not output code with umlauts in UTF-8 under Windows

I am writing a report with R Markdown and use tikzDevice for plotting. When a plot contains German umlauts (äöüÖÄÜ), rendering in RStudio fails with the following error:

pandoc.exe: Cannot decode byte '\xd6': Data.Text.Internal.Encoding.streamDecodeUtf8With: Invalid UTF-8 stream

Here is a minimal example:

---
title: "test"
author: "test"
date: "Today"
output: 
  pdf_document: 
    keep_tex: true
header-includes:
   - \usepackage{tikz}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(tikzDevice)
options(tikzDefaultEngine = "xetex")
```
```{r plot, dev="tikz", external=FALSE}
x <- rnorm(50)
y <- rnorm(50)

plot(x, y, xlab = "ÖÄÜ", ylab = "öäü")
```

With this code, tikzDevice writes the TeX file for the plot in Windows-1252 (CP1252) encoding, which fails when the file is included in the UTF-8 main LaTeX document, so Pandoc throws the error above. The same code works under Ubuntu. I suspect that the Windows locale encoding is the cause, but I cannot figure out a solution.

The source file (Rmd) is UTF-8 encoded. The TeX file generated by tikzDevice is NOT UTF-8 encoded.
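
The byte named in the pandoc error can be checked directly. The sketch below assumes R 3.3.0 or later (for `validUTF8()`) and an iconv build that recognises the name `CP1252`; the variable names are illustrative only. It shows that `0xd6` is "Ö" in CP1252 but is not a valid UTF-8 sequence on its own:

```r
# "Ö" as a single CP1252 byte, the byte named in the pandoc error
cp1252_bytes <- as.raw(0xd6)
cp1252_string <- rawToChar(cp1252_bytes)

# A lone 0xd6 byte is not a valid UTF-8 sequence
is_valid_before <- validUTF8(cp1252_string)

# Re-encoding from CP1252 to UTF-8 turns "Ö" into the two bytes c3 96
utf8_string <- iconv(cp1252_string, from = "CP1252", to = "UTF-8")
utf8_bytes <- charToRaw(utf8_string)
is_valid_after <- validUTF8(utf8_string)
```

The same `validUTF8()` check could be applied to the lines of the generated `.tex` file to confirm which encoding tikzDevice actually wrote.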

SessionInfo (Windows):

version  R version 3.6.1 (2019-07-05)
os       Windows 10 x64
system   x86_64, mingw32
ui       RStudio
language (EN)
collate  German_Germany.1252
ctype    German_Germany.1252
tz       Europe/Berlin
date     2019-09-04 

SessionInfo (Ubuntu):

version  R version 3.4.4 (2018-03-15)
os       Ubuntu 18.04.3 LTS
system   x86_64, linux-gnu
ui       X11
language (EN)
collate  C.UTF-8
ctype    C.UTF-8
tz       Europe/Berlin
date     2019-09-04

There are 3 best solutions below

I can reproduce the behavior. Please open an issue at https://github.com/daqana/tikzDevice/issues. As a workaround, you can write the umlauts as LaTeX accent commands:

---
title: "test"
author: "test"
date: "Today"
output: 
  pdf_document: 
    keep_tex: true
header-includes:
   - \usepackage{tikz}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(tikzDevice)
options(tikzDefaultEngine = "xetex")
```

```{r plot, dev="tikz", external=FALSE}
x <- rnorm(50)
y <- rnorm(50)

plot(x, y, xlab = '\\"O\\"A\\"U', ylab = '\\"o\\"a\\"u')
```
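
If there are many labels, the escaping could be done by a small helper instead of by hand. This is only a sketch: `escape_umlauts()` is a hypothetical function, not part of tikzDevice, it covers only the six umlauts from the question, and it is untested with tikzDevice on Windows.

```r
# Hypothetical helper: replace German umlauts with LaTeX accent commands.
# The umlauts are given as \u escapes so the code itself stays pure ASCII.
escape_umlauts <- function(label) {
  umlauts <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc")
  escapes <- c('\\"a', '\\"o', '\\"u', '\\"A', '\\"O', '\\"U')
  for (i in seq_along(umlauts)) {
    # fixed = TRUE treats pattern and replacement as literal text
    label <- gsub(umlauts[i], escapes[i], label, fixed = TRUE)
  }
  label
}

# "ÖÄÜ" and "öäü", written with \u escapes
escaped_upper <- escape_umlauts("\u00d6\u00c4\u00dc")
escaped_lower <- escape_umlauts("\u00f6\u00e4\u00fc")
```

In the plot chunk this would be used as `plot(x, y, xlab = escape_umlauts("ÖÄÜ"), ylab = escape_umlauts("öäü"))`, which passes the same strings to tikzDevice as the hand-written labels above.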

When reading a CSV or text file, write the path as a raw string so that backslashes are not treated as escape characters: r'c:\hem\dow\train.csv' in Python, or r"(c:\hem\dow\train.csv)" in R 4.0.0 and later.

Another workaround is to convert all tikz/tex files in the figures folder from CP1252 to UTF-8 with iconv. If you put the following code in the last chunk of the document, you don't need to hardcode the umlauts:

# path of the Rmd file
path <- getwd()
# subfolder of the cache and figures, e.g. "test_files" for "test.Rmd"
subfolder <- paste0(sub("\\.Rmd$", "", knitr::current_input()), "_files")
# beamer or latex figures
figures <- if (dir.exists(file.path(path, subfolder, "figure-latex"))) {
  "figure-latex"
} else if (dir.exists(file.path(path, subfolder, "figure-beamer"))) {
  "figure-beamer"
} else {
  ""
}
# full path of the figure folder
folder <- file.path(path, subfolder, figures)
# find all tex/tikz files in the figures folder
for (tex_file in list.files(folder, pattern = "\\.tex$", full.names = TRUE)) {
  # read the lines as written by tikzDevice (CP1252 bytes)
  input <- readLines(tex_file, warn = FALSE)
  # skip files that are already valid UTF-8, e.g. converted in an earlier run
  if (all(validUTF8(input))) next
  # convert input from CP1252 to UTF-8
  output <- iconv(input, from = "CP1252", to = "UTF-8")
  # overwrite the file, writing the UTF-8 bytes unchanged
  writeLines(output, tex_file, useBytes = TRUE)
}

Edit: Now also usable with beamer.