I am trying to use the German stemmer that comes with RTextTools but the results I get are quite off the mark.
Say, I have the following vector:
v <- c("groß", "größer", "am", "größten", "ähnlicher")
Using
library(RTextTools)
wordStem(v, "german")
I get
[1] "groß" "größer" "am" "größten" "ähnlich"
What am I missing?
Going by the algorithm description in Snowball, it looks like it is translated back to 'DF' ("ß").
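If 'DF' here means the byte 0xDF, which is how "ß" (code point U+00DF) is stored in Latin-1, then the vector's encoding is worth checking before stemming. That reading is my assumption, not something stated above. A minimal base-R sketch of how to inspect it (the variable names are just for illustration):

```r
# "ß" written as a Unicode escape so the encoding of the script file does not matter
s <- "\u00df"

# Code point of the character, as a number and as hex
cp <- utf8ToInt(s)
cp_hex <- format(as.hexmode(cp))

# Bytes of the same character in UTF-8 and in Latin-1
bytes_utf8   <- as.character(charToRaw(enc2utf8(s)))
bytes_latin1 <- as.character(charToRaw(iconv(s, from = "UTF-8", to = "latin1")))

print(cp_hex)
print(bytes_utf8)
print(bytes_latin1)

# For the vector from the question, Encoding() shows how R has marked each string,
# and enc2utf8() converts it to UTF-8 before it is handed to a stemmer
v <- c("gro\u00df", "gr\u00f6\u00dfer", "am", "gr\u00f6\u00dften", "\u00e4hnlicher")
print(Encoding(v))
v_utf8 <- enc2utf8(v)
```

If the strings turn out not to be UTF-8, converting them with enc2utf8() before calling wordStem() would be the first thing I'd try.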
Representation of umlaut by following e
The German letters ä, ö and ü, are occasionally represented by ae, oe and ue respectively. The stemmer here is a variant of the main German stemmer to take this into account.
The main German stemmer begins with the rule,
This is replaced with the rule,
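As far as I understand the Snowball description, the variant's first step maps ß to ss, ae to ä, oe to ö, and ue to ü unless it is preceded by q. Here is a rough sketch of just that mapping in base R. The function name german2_premap is made up for illustration and is not part of RTextTools or SnowballC. It also skips the "put u and y between vowels into upper case" part, so a word like "neue" would be mangled by this simplified version.

```r
# Hypothetical helper, NOT part of RTextTools or SnowballC.
# Simplified sketch of the ae/oe/ue/ß mapping only; the upper-casing of
# u and y between vowels that the real stemmer does first is left out.
german2_premap <- function(words) {
  words <- enc2utf8(words)
  # ß -> ss
  words <- gsub("\u00df", "ss", words, fixed = TRUE)
  # ae -> ä, oe -> ö
  words <- gsub("ae", "\u00e4", words, fixed = TRUE)
  words <- gsub("oe", "\u00f6", words, fixed = TRUE)
  # ue -> ü, unless the "ue" is preceded by "q"
  words <- gsub("(?<!q)ue", "\u00fc", words, perl = TRUE)
  words
}

mapped <- german2_premap(c("groesser", "aehnlich", "quelle", "gro\u00df"))
print(mapped)
```

The point is only to show why "groesser" and "größer" can end up with the same stem under that variant. It is not a replacement for the stemmer itself.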