I have a Quarkus-based REST API project in which one endpoint is supposed to serve exported data as .csv files. Since I do not want to create temporary files, I write the data to an in-memory byte array and return it as a ByteArrayInputStream in an octet-stream response. This works fine for Latin characters, but some of our content is in Chinese. In the downloaded .csv file these characters are not written properly: they show up only as question marks, even in a plain text view such as Notepad. We have already ruled out the way the data is stored as the source of the problem: the encoding in the database is correct, and the same data comes out fine when we export it as .json (where we can set the charset to UTF-8).
As far as I understand, a charset or encoding cannot be set for an octet stream. So how can we export/stream this content as a file download without creating an actual file?
Below is the code we currently use. We use the CSVPrinter class from the Apache Commons CSV library to produce the CSV text in a custom CSV streamer class:
@ApplicationScoped
public class JobRunDataCsvStreamer implements DataFormatStreamer<JobData> {

    @Override
    public ByteArrayInputStream streamDataToFormat(List<JobData> dataList) {
        try {
            ByteArrayOutputStream out = getCsvOutputStreamFor(dataList);
            return new ByteArrayInputStream(out.toByteArray());
        } catch (IOException e) {
            throw new RuntimeException("Failed to convert job data: " + e.getMessage());
        }
    }

    private ByteArrayOutputStream getCsvOutputStreamFor(List<JobData> dataList) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        CSVPrinter csvPrinter = new CSVPrinter(new PrintWriter(out), getHeaderFormat());
        for (JobData jobData : dataList) {
            csvPrinter.printRecord(extractStringRowData(jobData));
        }
        csvPrinter.flush();
        csvPrinter.close();
        return out;
    }

    private CSVFormat getHeaderFormat() {
        return CSVFormat.EXCEL
                .builder()
                .setDelimiter(";")
                .setHeader("ID", "Source term", "Target term")
                .build();
    }

    private List<String> extractStringRowData(JobData jobData) {
        return Arrays.asList(
                String.valueOf(jobData.getId()),
                jobData.getSourceTerm(),
                jobData.getTargetTerm()
        );
    }
}
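To illustrate what we suspect might be happening (this is only a guess, not something we have confirmed): the single-argument PrintWriter(OutputStream) constructor used above encodes with the JVM's default charset. If that default cannot represent Chinese characters, each one is replaced by a question mark before it ever reaches the response. The following sketch uses only the JDK; the class name CharsetDemo and the sample term are made up for the illustration, and ISO-8859-1 merely stands in for whatever the server's default charset might be.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Encodes the text through a PrintWriter that uses the given charset
    // and returns the raw bytes that ended up in the in-memory stream.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PrintWriter pw = new PrintWriter(out, false, charset)) {
            pw.print(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so the source
        // file itself does not depend on any particular encoding.
        String term = "\u672f\u8bed";

        // Stand-in for a non-UTF-8 default charset (assumption).
        byte[] latin1 = write(term, StandardCharsets.ISO_8859_1);
        byte[] utf8 = write(term, StandardCharsets.UTF_8);

        // Unmappable characters are replaced, so this prints "??".
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
        // Each of these characters takes three bytes in UTF-8, so this prints 6.
        System.out.println(utf8.length);
    }
}
```

If this is indeed the cause, the data would already be lost inside the byte array, independent of any response header.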
Here is the Quarkus API endpoint for the download:
@Path("/jobs/data")
public class JobDataResource {

    @Inject JobDataRepository jobDataRepository;
    @Inject JobDataCsvStreamer jobDataCsvStreamer;
    ...

    @GET
    @Path("/export/csv")
    @Produces(MediaType.APPLICATION_OCTET_STREAM)
    public Response getAllAsCsvExport() {
        List<JobData> jobData = jobDataRepository.getAll();
        ByteArrayInputStream stream = jobDataCsvStreamer.streamDataToFormat(jobData);
        return Response.ok(stream, MediaType.APPLICATION_OCTET_STREAM)
                .header("content-disposition", "attachment; filename = job-data.csv")
                .build();
    }
}
Screenshot of the result in the downloaded file, with the Chinese characters in the second column:
We tried setting headers and similar options for the encoding, but none of it worked. Is there a way to stream content that requires a specific encoding as a file download in a Java web service? We tried using a PrintWriter that writes to a file, which works but requires creating a local file on the server.
Edit: We tried using PrintWriter(out, false, StandardCharsets.UTF_8) so that the PrintWriter writes UTF-8 to a byte array output stream for the response. This yields a different result, but the content is still displayed incorrectly in both Excel and a plain text view:
Screenshot:
Code for endpoint:
@GET
@Path("/export/csv")
@Produces(MediaType.APPLICATION_OCTET_STREAM)
public Response getAllAsCsvExport() {
    List<JobData> jobData = jobRunDataRepository.getAll();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try {
        PrintWriter pw = new PrintWriter(out, false, StandardCharsets.UTF_8);
        pw.println(String.format("%s, %s, %s", "ID", "Source", "Target"));
        for (JobData item : jobData) {
            pw.println(String.format("%s, %s, %s",
                    String.valueOf(item.getId()),
                    String.valueOf(item.getSourceTerm()),
                    String.valueOf(item.getTargetTerm()))
            );
        }
        pw.flush();
        pw.close();
    } catch (Exception e) {
        throw new RuntimeException("Failed to convert job data: " + e.getMessage());
    }
    return Response.ok(out).build();
}
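For reference, this is roughly the kind of in-memory approach we are hoping to get working. It is a minimal sketch, not a verified fix: it writes the rows as UTF-8 into a byte array and prepends a byte order mark, on the assumption that Excel uses the BOM to detect UTF-8. The class Utf8CsvSketch and its methods are hypothetical names, rows are plain String arrays instead of JobData, and there is no quoting or escaping of field values, which CSVPrinter would normally handle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class Utf8CsvSketch {

    // Builds the CSV bytes entirely in memory, encoded as UTF-8 with a leading BOM.
    static byte[] toCsvBytes(List<String[]> rows) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            // U+FEFF is encoded as the bytes EF BB BF in UTF-8 (the BOM).
            writer.write('\uFEFF');
            for (String[] row : rows) {
                // No quoting or escaping here; this sketch assumes simple values.
                writer.write(String.join(";", row));
                writer.write("\r\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to convert job data", e);
        }
        return out.toByteArray();
    }

    // Wraps the bytes so they can be handed to a response as a stream.
    static ByteArrayInputStream toCsvStream(List<String[]> rows) {
        return new ByteArrayInputStream(toCsvBytes(rows));
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"ID", "Source term", "Target term"},
                new String[] {"1", "term", "\u672f\u8bed"});
        byte[] csv = toCsvBytes(rows);
        System.out.println(csv.length + " bytes written in memory");
    }
}
```

In the endpoint, the resulting stream would be returned the same way as in our first version, with the content-disposition header. Whether the response would additionally need a content type such as text/csv with a UTF-8 charset parameter is something we have not verified.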