TreeTagger in R

I have downloaded TreeTagger v3.2 for Windows and configured it as described in install.txt. I am trying to use it from R with the koRpus package. I have set the kRp.env as follows:

set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en",
            preset="en", treetagger="manual", format="file",
            TT.tknz=TRUE, encoding="UTF-8")

The data to be tagged is in a file, and I am calling treetag("myfile.txt") on it, but it throws this error:

Error in matrix(unlist(strsplit(tagged.text, "\t")), ncol = 3, byrow = TRUE, : 'data' must be of a vector type, was 'NULL'

In addition: Warning message: running command 'C:\windows\system32\cmd.exe /c C:\TreeTagger\bin\tag-english.bat C:\Users\vivsingh\Desktop\NLP\tree_tag_ex.txt' had status 255

The standalone TreeTagger works on my Windows machine. Any idea how to get it working from R?

There are 4 best solutions below

0
On

You can run into the same error while setting up the koRpus environment and fetching results from TreeTagger. For example, when you use:

tagged.text <- treetag(
  "C:/temp/sample_text.txt",
  treetagger = "manual",
  lang = "en",
  TT.options = list(
    path = "c:/Treetagger",
    preset = "en"
  ),
  doc_id = "sample"
)

you would receive a similar error:

Error: Awww, this should not happen: TreeTagger didn't return any useful data.

This can happen if the local TreeTagger setup is incomplete or different from what presets expected.
You should re-run your command with the option 'debug=TRUE'. That will print all relevant configuration.
Look for a line starting with 'sys.tt.call:' and try to execute the full command following it in a command line terminal. Do not close this R session in the meantime, as 'debug=TRUE' will keep temporary files that might be needed.
If running the command after 'sys.tt.call:' does fail, you'll need to fix the TreeTagger setup.
If it does not fail but produce a table with proper results, please contact the author!
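This message and the older matrix error in the question appear to describe the same situation: the external TreeTagger call produced no output. A minimal base-R sketch of the older failure, assuming the failed command hands back an empty character vector (the name `tagged.text` is taken from the error message in the question; the sample lines are made up for illustration):

```r
# Assumption: the failed external call produced no lines at all.
tagged.text <- character(0)

# strsplit() on an empty vector gives an empty list; unlist() turns that into NULL.
fields <- unlist(strsplit(tagged.text, "\t"))
print(is.null(fields))

# matrix() refuses NULL; capture the resulting error message instead of stopping.
msg <- tryCatch(
  matrix(fields, ncol = 3, byrow = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)

# With TreeTagger-style lines (token, tag, lemma separated by tabs) the same call works.
sample_lines <- c("The\tDT\tthe", "cats\tNNS\tcat")
ok <- matrix(unlist(strsplit(sample_lines, "\t")), ncol = 3, byrow = TRUE)
print(ok)
```

So the R-side error is only a symptom; the thing to fix is whatever makes the TreeTagger command itself return nothing.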

Here you need to change the value of treetagger, from

treetagger = "manual"

to

treetagger = "kRp.env"

Before that, however, remember to set the kRp.env as @Xochitl C. suggested in their answer:

set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en", preset="en", treetagger="manual", format="file", TT.tknz=TRUE, encoding="UTF-8")

Once you do this, you'll get the desired result.
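If the call still fails, the debug route recommended by the error message can be sketched as follows. This is only a sketch under assumptions: the two paths are the example locations used in this thread, the guard simply skips everything when koRpus or the files are missing, and newer koRpus releases may additionally need a separate language package (such as koRpus.lang.en) to be installed and loaded.

```r
# Assumed locations taken from the examples in this thread; adjust to your setup.
tt_cmd   <- "C:\\TreeTagger\\bin\\tag-english.bat"
txt_file <- "C:/temp/sample_text.txt"

if (requireNamespace("koRpus", quietly = TRUE) &&
    file.exists(tt_cmd) && file.exists(txt_file)) {
  library(koRpus)

  # Store the TreeTagger command and language in the koRpus environment.
  set.kRp.env(TT.cmd = tt_cmd, lang = "en")

  # treetagger = "kRp.env" makes treetag() use the stored command;
  # debug = TRUE prints the configuration, including the 'sys.tt.call:' line.
  tagged.text <- treetag(txt_file, treetagger = "kRp.env", lang = "en", debug = TRUE)
} else {
  message("koRpus, the TreeTagger batch file, or the sample text is missing; nothing was run.")
}
```

The command printed after 'sys.tt.call:' can then be pasted into a cmd window to see whether TreeTagger itself fails outside of R.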

0
On

I had the exact same error and warning while trying to lemmatize a word vector in R, following the Bernhard Learns blog, on Windows 7 with R 3.4.1 (x64). The issue also appeared with the textstem package, although TreeTagger ran properly in a cmd window.

I combined several answers from this post; here are the steps and the code that work for me:

Go to the R win-library (~\Documents\R\win-library\3.4\rJava\jri\x64\jri.dll) and copy jri.dll (thanks kravi!) into the parent folder, replacing the jri.dll there.

Close and restart R.

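library(koRpus)
library(dplyr)  # provides tbl_df(); current dplyr versions prefer as_tibble()

set.kRp.env(TT.cmd="C:\\TreeTagger\\bin\\tag-english.bat", lang="en", preset="en", treetagger="manual", format="file", TT.tknz=TRUE, encoding="UTF-8")

# format="obj" because the input is a character vector, not a file
lemma_tagged <- treetag(lemma_unique$word_clean, treetagger="manual", format="obj",
                        TT.tknz=FALSE, lang="en",
                        TT.options=list(path="c:/TreeTagger", preset="en"))

# the tagged tokens live in the TT.res slot of the result
lemma_tagged_tbl <- tbl_df(lemma_tagged@TT.res)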

Hope it helps.

0
On

I am posting this answer for the record. I faced the same issue on a 64-bit processor with Windows 8.1; the cause was an incorrectly specified location of jri.dll. If we call set.kRp.env(TT.cmd="manual", lang="en", TT.options=list(path="/path/to/tree-tagger-windows-x.x/TreeTagger", preset="en")) and follow either of the two steps below, the error is resolved:

  1. If only the 64-bit version of R is installed, specify the proper paths for these variables:

    LD_LIBRARY_PATH = /path/to/rJava/jri
    JAVA_HOME = /path/to/jdk1.x.x
    java.library.path = /path/to/rJava/jri/jri.dll
    CLASSPATH = /path/to/rJava/jri

  2. If both the 32-bit and 64-bit versions of R are already installed, copy jri.dll from /path/to/rJava/jri/x64/jri.dll and use it to replace /path/to/rJava/jri/jri.dll. In addition, set the four variables listed above.
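The two steps can be checked and scripted from R itself. The following is a rough sketch, not a tested fix: `copy_x64_jri` is a hypothetical helper name, and `java.library.path` is left out of the check because it is normally a JVM property rather than an environment variable.

```r
# Which of the environment variables named above are set in this R session?
vars <- c("JAVA_HOME", "LD_LIBRARY_PATH", "CLASSPATH")
is_set <- vapply(vars, function(v) nzchar(Sys.getenv(v)), logical(1))
print(is_set)

# Hypothetical helper: copy x64/jri.dll over jri.dll inside a given jri folder.
# Returns TRUE on success and FALSE when there is nothing to copy.
copy_x64_jri <- function(jri_dir) {
  src <- file.path(jri_dir, "x64", "jri.dll")
  dst <- file.path(jri_dir, "jri.dll")
  if (!file.exists(src)) {
    message("No x64/jri.dll under ", jri_dir, "; nothing copied.")
    return(FALSE)
  }
  file.copy(src, dst, overwrite = TRUE)
}

# Usage (commented out so the sketch does not touch a real rJava install):
# system.file() returns "" when rJava is not installed.
# copy_x64_jri(system.file("jri", package = "rJava"))
```

Restart R after copying so that the replaced jri.dll is the one that gets loaded.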

0
On

I ran into this issue (or a very similar one, I guess) and reported it on GitHub: https://github.com/unDocUMeantIt/koRpus/issues/7. The working solution for me turned out to be easier than I expected: just downgrade the koRpus package. This may change over time, but this version should remain usable.

library("devtools")
install_github("unDocUMeantIt/koRpus", ref="0.06-5")

They said this package is not Java-related.