How to write a number of large files with memory efficiency in Java 7


I need to write a large amount of data (up to 1KB per second) to a number of different files (each file can grow to a few GiB).

In my application, one thread constantly produces the data and currently I am creating one thread per file to write the data (creating a new file depends on certain criteria on the input data that the producer is producing).

I am using a FileWriter wrapped in a BufferedWriter. Initially I tried the default buffer size of 8KB, but that makes the number of writes per second high, and the CPU consumption climbs rapidly and does not come back down.

So I have now increased the buffer size to 50KB, but this makes my application crash with an OutOfMemoryError.

When I profiled it, I could see that all the data is stored in char arrays created by the buffered writers (remember, there is one buffered writer per file, and I have around 400 such files).

Please suggest how to overcome the problem here. I would also like to know if there are any alternatives or better ways to implement the requirement.

I cannot decrease the buffer size below 50KB, as that makes my CPU usage go up to 100%.

Edit: Okay, I thought I did not need to add code, as I could make the requirement clear without it (and the problem is not in the code itself; it works, I am just looking for efficiency). But since my question was downvoted, I am including my code here.

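import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class DataWriter {

      // Unbounded queue: it grows without limit if the consumer falls behind
      private LinkedBlockingQueue<MyDataObject> dataQueue = new LinkedBlockingQueue<>();
      private ExecutorService singleThread = Executors.newSingleThreadExecutor();
      // volatile so the processing thread sees the change made by stopWriting()
      private volatile boolean isRunning = true;
      private Map<String, FileWriterThread> map = Collections.synchronizedMap(new HashMap<String, FileWriterThread>());

      public DataWriter() {
         singleThread.submit(new DataProcessor());
      }

      public void writeProducedData(MyDataObject object) {
          if (isRunning) {
             dataQueue.offer(object);
          }
      }

      public void stopWriting() {
          isRunning = false;
      }

      private class DataProcessor implements Runnable {

         @Override
         public void run() {
            try {
               while (isRunning) {
                  MyDataObject obj = dataQueue.take();

                  // Create a writer the first time a key is seen. Checking the
                  // key itself for null could never register a writer.
                  FileWriterThread thread = map.get(obj.getMapKey());
                  if (thread == null) {
                     thread = new FileWriterThread();
                     map.put(obj.getMapKey(), thread);
                  }
                  thread.writeData(obj);
               }
            } catch (InterruptedException e) {
               // take() was interrupted: restore the flag and stop processing
               Thread.currentThread().interrupt();
            }
         }
      }
    }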

FileWriterThread class:

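import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class FileWriterThread {

       // One thread, one unbounded queue and one buffer per file
       private ExecutorService singleThread = Executors.newSingleThreadExecutor();
       private FileWriter fileWriter;
       private BufferedWriter bufferedWriter;
       private LinkedBlockingQueue<MyDataObject> dataQueue = new LinkedBlockingQueue<>();

    public FileWriterThread() {
      singleThread.submit(new DataProcessor());
    }

    public void writeData(MyDataObject obj) {
     if (fileWriter == null) {
       createWriter(obj.getFileName());
     }
     dataQueue.offer(obj);
    }

    public void stopWriting() {
      // close the file writer and buffer writer gracefully
    }

    private void createWriter(String fileName) {
      try {
        fileWriter = new FileWriter(fileName, true);
        // The size argument is in chars, not KB: 50 * 1024 chars is about
        // 100KB of heap per writer, since a Java char takes two bytes
        bufferedWriter = new BufferedWriter(fileWriter, 50 * 1024);
      } catch (IOException e) {
        e.printStackTrace(); // do not swallow the failure silently
      }
    }

    private class DataProcessor implements Runnable {

       @Override
       public void run() {
         try {
           // Keep draining the queue; a single take() would write only one object
           while (true) {
             MyDataObject obj = dataQueue.take();
             bufferedWriter.write(obj.toString());
           }
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt(); // restore the flag and exit
         } catch (IOException e) {
           e.printStackTrace();
         }
       }
    }
  }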

There is 1 best solution below


Having all these threads buys you nothing -- the whole thing is IO-bound, not CPU-bound (or would be, if your CPU weren't struggling to juggle 400 threads).

In a single thread, just have your producer write to the appropriate writer:

   Map<String, Writer> writers = ...;

   // Writer works with characters, so take a String rather than a byte[]
   private void handleOutput(String output, String key) throws IOException {
        writers.get(key).write(output);
   }
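A slightly fuller sketch of the single-threaded approach might look like the following. The class name `KeyedFileWriter`, the `.log` suffix, and the choice to open each writer lazily in append mode are assumptions made for illustration, not part of the question's code. It sticks to Java 7 syntax and APIs.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: one thread, one lazily opened writer per key. */
final class KeyedFileWriter implements Closeable {
    private final Path directory;
    private final int bufferChars;
    private final Map<String, BufferedWriter> writers = new HashMap<String, BufferedWriter>();

    KeyedFileWriter(Path directory, int bufferChars) {
        this.directory = directory;
        this.bufferChars = bufferChars;
    }

    /** Appends one line to the file for the given key. Not thread-safe by design. */
    void handleOutput(String output, String key) throws IOException {
        BufferedWriter writer = writers.get(key);
        if (writer == null) {
            // Assumed naming scheme: <key>.log inside the target directory
            Path file = directory.resolve(key + ".log");
            writer = new BufferedWriter(
                    new OutputStreamWriter(
                            Files.newOutputStream(file,
                                    StandardOpenOption.CREATE,
                                    StandardOpenOption.APPEND),
                            StandardCharsets.UTF_8),
                    bufferChars);
            writers.put(key, writer);
        }
        writer.write(output);
        writer.newLine();
    }

    /** Flushes and closes every open writer. */
    @Override
    public void close() throws IOException {
        for (BufferedWriter writer : writers.values()) {
            writer.close();
        }
        writers.clear();
    }
}
```

With 400 keys and the default 8192-char buffer, the buffers alone hold roughly 400 × 8192 × 2 bytes, or about 6.5MB, and there are no per-file threads or queues on top of that.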

If you want to separate your concerns, consider using a few (not hundreds of) writer threads and one producer thread, decoupled by a queue.
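One way to arrange a few writer threads behind queues is sketched below. Everything here is an assumption for illustration: the class name `PartitionedFileWriter`, the `.log` suffix, and the idea of hashing each key to a fixed worker so that a given file is only ever touched by one thread. The queues are bounded, so a slow disk blocks the producer instead of filling the heap.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: a small, fixed set of writer threads, each owning a share of the keys. */
final class PartitionedFileWriter {
    // Sentinel that tells a worker to stop; compared by identity
    private static final String[] STOP = new String[0];

    private final Path directory;
    private final List<BlockingQueue<String[]>> queues = new ArrayList<BlockingQueue<String[]>>();
    private final List<Thread> workers = new ArrayList<Thread>();

    PartitionedFileWriter(Path directory, int workerCount, int queueCapacity) {
        this.directory = directory;
        for (int i = 0; i < workerCount; i++) {
            final BlockingQueue<String[]> queue = new ArrayBlockingQueue<String[]>(queueCapacity);
            Thread worker = new Thread(new Runnable() {
                @Override
                public void run() {
                    drain(queue);
                }
            }, "file-writer-" + i);
            queues.add(queue);
            workers.add(worker);
            worker.start();
        }
    }

    /** Blocks while the target queue is full, which throttles the producer. */
    void write(String key, String line) throws InterruptedException {
        int index = (key.hashCode() & Integer.MAX_VALUE) % queues.size();
        queues.get(index).put(new String[] {key, line});
    }

    /** Asks every worker to finish its queue, then waits for all of them. */
    void shutdown() throws InterruptedException {
        for (BlockingQueue<String[]> queue : queues) {
            queue.put(STOP);
        }
        for (Thread worker : workers) {
            worker.join();
        }
    }

    private void drain(BlockingQueue<String[]> queue) {
        // Only this worker touches these writers, so no locking is needed
        Map<String, BufferedWriter> writers = new HashMap<String, BufferedWriter>();
        try {
            for (String[] item = queue.take(); item != STOP; item = queue.take()) {
                BufferedWriter writer = writers.get(item[0]);
                if (writer == null) {
                    writer = new BufferedWriter(new OutputStreamWriter(
                            Files.newOutputStream(directory.resolve(item[0] + ".log"),
                                    StandardOpenOption.CREATE, StandardOpenOption.APPEND),
                            StandardCharsets.UTF_8));
                    writers.put(item[0], writer);
                }
                writer.write(item[1]);
                writer.newLine();
            }
        } catch (IOException e) {
            // Simplification: the worker just stops; real code should report this to the producer
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            for (BufferedWriter writer : writers.values()) {
                try {
                    writer.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
```

Because every key maps to the same queue each time, lines for one file stay in the order they were produced. A worker that dies on an IOException leaves its queue undrained, so a production version would need to surface that failure rather than let the producer block.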