I have a requirement to write a large amount of data (up to 1KB can arrive per second) to a number of different files (each file can grow to a few GiB).
In my application, one thread constantly produces the data and currently I am creating one thread per file to write the data (creating a new file depends on certain criteria on the input data that the producer is producing).
I am using a FileWriter wrapped in a BufferedWriter. Initially I tried writing with the default buffer size of 8KB. But since that makes the number of writes per second higher, CPU consumption rises rapidly and does not come back down.
So I have now increased the buffer size to 50KB, but this makes my application crash with an OutOfMemoryError.
When I profiled it, I could see that all the data is being held in char arrays created by the buffered writers (remember, there is one buffered writer per file, and I have around 400 such files).
Please suggest how to overcome the problem here. I would also like to know if there are any alternatives or better ways to implement the requirement.
I cannot decrease the buffer size below 50KB, as this makes my CPU go up to 100%.
Edit: Okay, I thought I did not need to add code, since I could make the requirement very clear (besides, there was nothing wrong with the code as such; it worked, and I was just looking for efficiency). But since my question was downvoted, I am including my code here.
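import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class DataWriter {
    private LinkedBlockingQueue<MyDataObject> dataQueue = new LinkedBlockingQueue<>();
    private ExecutorService singleThread = Executors.newSingleThreadExecutor();
    // volatile so the processor thread sees the update made by stopWriting()
    private volatile boolean isRunning = true;
    private Map<String, FileWriterThread> map = Collections.synchronizedMap(new HashMap<String, FileWriterThread>());

    public DataWriter() {
        singleThread.submit(new DataProcessor());
    }

    public void writeProducedData(MyDataObject object) {
        if (isRunning) {
            dataQueue.offer(object);
        }
    }

    public void stopWriting() {
        isRunning = false;
    }

    private class DataProcessor implements Runnable {
        @Override
        public void run() {
            try {
                while (isRunning) {
                    MyDataObject obj = dataQueue.take();
                    // create a writer thread the first time a map key is seen
                    FileWriterThread thread = map.get(obj.getMapKey());
                    if (thread == null) {
                        thread = new FileWriterThread();
                        map.put(obj.getMapKey(), thread);
                    }
                    thread.writeData(obj);
                }
            } catch (InterruptedException e) {
                // restore the interrupt flag and let the task end
                Thread.currentThread().interrupt();
            }
        }
    }
}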
FileWriterThread class:
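import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class FileWriterThread {
    private ExecutorService singleThread = Executors.newSingleThreadExecutor();
    private FileWriter fileWriter;
    private BufferedWriter bufferedWriter;
    private LinkedBlockingQueue<MyDataObject> dataQueue = new LinkedBlockingQueue<>();

    public FileWriterThread() {
        singleThread.submit(new DataProcessor());
    }

    public void writeData(MyDataObject obj) {
        // the writer is created lazily, before the first object is queued
        if (fileWriter == null) {
            createWriter(obj.getFileName());
        }
        dataQueue.offer(obj);
    }

    public void stopWriting() {
        // close the file writer and buffer writer gracefully
    }

    private void createWriter(String fileName) {
        try {
            fileWriter = new FileWriter(fileName, true);
            // the size argument is in chars; 50 * 1024 matches the 50KB buffer described above
            bufferedWriter = new BufferedWriter(fileWriter, 50 * 1024);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private class DataProcessor implements Runnable {
        @Override
        public void run() {
            try {
                // loop so that every queued object is written, not only the first
                while (true) {
                    MyDataObject obj = dataQueue.take();
                    bufferedWriter.write(obj.toString());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}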
Having all these threads buys you nothing -- the whole thing is IO-bound, not CPU-bound (or would be, if your CPU weren't struggling to juggle 400 threads).
In a single thread, just have your producer write to the appropriate writer:
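A minimal sketch of that single-threaded approach follows. The class name SingleThreadRouter and its write(file, text) method are hypothetical, not part of the question's code; in the question's terms, the file would come from obj.getFileName() and the text from obj.toString(). It assumes the default BufferedWriter buffer is acceptable once the 400 threads are gone.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a single thread owns every writer, so no locking is needed.
class SingleThreadRouter implements AutoCloseable {
    private final Map<Path, BufferedWriter> writers = new HashMap<>();

    // Called directly by the producer thread; opens the file in append mode on first use.
    void write(Path file, String text) throws IOException {
        BufferedWriter w = writers.get(file);
        if (w == null) {
            w = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            writers.put(file, w);
        }
        w.write(text);
    }

    // Flushes and closes every open writer.
    @Override
    public void close() throws IOException {
        for (BufferedWriter w : writers.values()) {
            w.close();
        }
        writers.clear();
    }
}
```

The plain HashMap is enough here because only the producer thread ever touches it. Nothing reaches disk until a buffer fills or the writer is closed, so close() (or an explicit flush) matters on shutdown.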
If you want to separate your concerns, you could have a few (not hundreds of) threads for writing and one for producing, separated by a Queue.
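One way that could look is sketched below, under a few assumptions: the class PooledFileWriters, its Item type, and the hash-based routing are all hypothetical choices for illustration. Each worker has its own bounded queue, and a file is always routed to the same worker, so writes to any one file stay in order and no writer is shared between threads.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a small, fixed number of writer threads fed by bounded queues.
class PooledFileWriters {
    // Hypothetical message type: the target file plus the text to append.
    private static final class Item {
        final Path file;
        final String text;
        Item(Path file, String text) { this.file = file; this.text = text; }
    }

    private static final Item STOP = new Item(null, null); // poison pill that ends a worker

    private final List<BlockingQueue<Item>> queues = new ArrayList<>();
    private final List<Thread> workers = new ArrayList<>();

    PooledFileWriters(int workerCount, int queueCapacity) {
        for (int i = 0; i < workerCount; i++) {
            BlockingQueue<Item> q = new ArrayBlockingQueue<>(queueCapacity);
            Thread t = new Thread(() -> drain(q), "file-writer-" + i);
            queues.add(q);
            workers.add(t);
            t.start();
        }
    }

    // Producer side: put() blocks while the chosen queue is full, which bounds memory use.
    void submit(Path file, String text) throws InterruptedException {
        int idx = Math.floorMod(file.hashCode(), queues.size());
        queues.get(idx).put(new Item(file, text));
    }

    // Worker side: each worker owns the writers for the files routed to it.
    private static void drain(BlockingQueue<Item> q) {
        Map<Path, BufferedWriter> writers = new HashMap<>();
        try {
            for (Item it = q.take(); it != STOP; it = q.take()) {
                BufferedWriter w = writers.get(it.file);
                if (w == null) {
                    w = Files.newBufferedWriter(it.file, StandardCharsets.UTF_8,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                    writers.put(it.file, w);
                }
                w.write(it.text);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // best-effort close; a failure on one file does not stop the others closing
            for (BufferedWriter w : writers.values()) {
                try { w.close(); } catch (IOException ignored) { }
            }
        }
    }

    // Sends one poison pill per worker, then waits for all workers to finish.
    void shutdown() throws InterruptedException {
        for (BlockingQueue<Item> q : queues) q.put(STOP);
        for (Thread t : workers) t.join();
    }
}
```

The bounded ArrayBlockingQueue is a deliberate choice over an unbounded LinkedBlockingQueue: if the disks fall behind, the producer blocks instead of the heap filling up. Error handling is deliberately thin here; a worker that hits an IOException simply stops, which real code would need to report back to the producer.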