I have a Hive UDF for cleaning text that is encoded as "quoted-printable". This is the code I adapted from an online example:
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;
import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;

// Be explicit about the charset instead of relying on the JVM's
// platform default, which can differ between my machine and the cluster.
InputStream is = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
try {
    InputStream isAfterDecode = MimeUtility.decode(is, "quoted-printable");
    text = new BufferedReader(
            new InputStreamReader(isAfterDecode, StandardCharsets.UTF_8))
            .lines()
            .collect(Collectors.joining(System.lineSeparator()));
} catch (MessagingException e) {
    throw new RuntimeException(e);
}
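As an aside, the quoted-printable decoding itself can be done with the JDK alone, which would avoid the javax.mail dependency entirely. Here is a minimal sketch I put together (my own simplified helper, not the library's API: it handles `=XX` hex escapes and soft line breaks, assumes a UTF-8 payload, and skips edge cases like trailing transport whitespace):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class QpDecode {
    // Minimal quoted-printable decoder using only the JDK.
    // QP input is ASCII by definition; the decoded bytes are read as UTF-8.
    static String decodeQuotedPrintable(String text) {
        byte[] in = text.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            byte b = in[i];
            if (b == '=' && i + 2 < in.length
                    && isHex(in[i + 1]) && isHex(in[i + 2])) {
                // "=XX" hex escape -> one raw byte
                out.write((hexVal(in[i + 1]) << 4) | hexVal(in[i + 2]));
                i += 2;
            } else if (b == '=' && i + 1 < in.length && in[i + 1] == '\n') {
                i += 1; // soft line break "=\n": drop it
            } else if (b == '=' && i + 2 < in.length
                    && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 2; // soft line break "=\r\n": drop it
            } else {
                out.write(b);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    private static boolean isHex(byte b) {
        return (b >= '0' && b <= '9') || (b >= 'A' && b <= 'F')
                || (b >= 'a' && b <= 'f');
    }

    private static int hexVal(byte b) {
        return Character.digit((char) b, 16);
    }

    public static void main(String[] args) {
        System.out.println(decodeQuotedPrintable("caf=C3=A9 =3D ok")); // prints "café = ok"
    }
}
```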
When I test it in IntelliJ IDEA it works fine. However, after I packaged it as a Hive function jar and uploaded it to the cloud (Dataproc) to call it from Spark SQL, it no longer works. Do I need to do anything else? I thought it might be caused by the environment?
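For context, the way I register and call the UDF on the cluster is roughly the following (the bucket path, function name, and class name are placeholders, not my real ones). This is a deployment fragment, so it only runs against a live cluster:

```shell
# Ship the UDF jar with the Spark SQL session (placeholder GCS path),
# then register and call it. Note: javax.mail is not part of the JDK,
# so unless it is bundled into the jar (e.g. a shaded/fat jar) or passed
# via --jars as well, the UDF may fail at runtime with
# ClassNotFoundException even though it compiled and ran fine locally.
spark-sql --jars gs://my-bucket/my-udf.jar -e "
  CREATE TEMPORARY FUNCTION clean_text AS 'com.example.CleanTextUDF';
  SELECT clean_text(raw_body) FROM messages LIMIT 10;
"
```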