How to process more files in parallel in Java 8


My Java web application has a file-based integration. The sending system drops batches of XML files (for example, 10,000 at a time) into the opt/app/proceed/ folder on our production server. With the current configuration, the application handles only 200 files per run and processes them sequentially, which delays processing. I am trying to increase throughput by processing the files in parallel. The relevant code is below for reference.

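import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FileEx {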

   public static void main(String[] args) throws IOException {
       String fileDir = "C:\\Users\\inputfiles"; //contains more than 10000 files
       new FileEx().traverseFilesFromDir(new File(fileDir));
   }

   public void traverseFilesFromDir(File dir) throws IOException {
       List<File> files = new ArrayList<File>();
       if (dir == null || !dir.isDirectory()) {
           throw new IllegalArgumentException("Not a valid directory (value: " + dir + ").");
       }
       File[] acknFiles = dir.listFiles();
       if (acknFiles == null) { // listFiles() returns null if an I/O error occurs
           acknFiles = new File[0];
       }
       int fileCount = acknFiles.length;

       System.out.println("fileCount:::::::::" + fileCount);

       Arrays.sort(acknFiles, new Comparator<File>() {
           public int compare(File f1, File f2) {
               return Long.valueOf(f1.lastModified()).compareTo(f2.lastModified());
           }
       });

       int maxNoFiles = acknFiles.length <= 500 ? acknFiles.length : 500;
       System.out.println(acknFiles.length + " Ackn found and starting to process oldest " + maxNoFiles + " files.");

       for (int i = 0; i < maxNoFiles; i++) {
           files.add(acknFiles[i]);
       }

       int fileCount1 = (files == null ? 0 : files.size());

       if (fileCount1 > 0) {
           for (int i = 0; i < fileCount1; i++) {

               boolean success = true;// processFile(files.get(i));
               if (success) {
                   System.out.println("File Successfully processed.");
               }
           }
       }
   }
}

How should I change this code so that the files are processed in parallel? Any guidance would be appreciated.


There are 2 best solutions below

0
On

// From Java 8 onwards you can use a parallel stream.
// By default, parallel() executes on ForkJoinPool.commonPool().

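// Streams the lines of a single file in parallel.
// Your_FileBean is a placeholder for your own type with a constructor that takes one line (String).
try (Stream<String> lines = Files.lines(files.get(i).toPath())) {
    lines.parallel()
         .map(Your_FileBean::new)
         .forEach(bean -> { /* process Your_FileBean */ });
}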
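
Note that Files.lines() splits the work across the lines of one file. If the goal is to spread whole files across threads instead, one option is to call parallelStream() on the list of files itself. The following is only a minimal sketch: the class name ParallelFiles is made up, and processFile is a hypothetical stand-in for the processFile(File) call that is commented out in the question.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

class ParallelFiles {

    // Hypothetical stand-in for the question's processFile(File);
    // here it simply reports success for names ending in ".xml".
    static boolean processFile(File file) {
        return file.getName().endsWith(".xml");
    }

    // Processes each file on ForkJoinPool.commonPool() and
    // returns how many files were processed successfully.
    static long processAll(List<File> files) {
        return files.parallelStream()
                    .map(ParallelFiles::processFile)
                    .filter(ok -> ok)
                    .count();
    }

    public static void main(String[] args) {
        List<File> files = Arrays.asList(
                new File("a.xml"), new File("b.txt"), new File("c.xml"));
        System.out.println(processAll(files) + " file(s) processed successfully.");
    }
}
```

Because the common pool is shared by the whole JVM, long blocking I/O inside processFile may slow down other parallel streams in the same application.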
0
On

You can do this via Files.walk(), which returns a Stream<Path> object. If you want to process this stream in parallel you can experiment with parallel() or collect(Collectors.toList()).parallelStream(). Because Files.walk() evaluates lazily, parallel() alone may not make efficient use of all available cores.

Onto that you can apply sorting and filtering as needed. Your processing step could be realised through a forEachOrdered() at the end of the stream.

Here's an example:

Files.walk(Paths.get("/path/to/root/directory")) // create a stream of paths
    .collect(Collectors.toList()) // collect paths into a list so the work splits evenly across threads
    .parallelStream() // process this stream in multiple threads
    .filter(Files::isRegularFile) // filter out any non-files (such as directories)
    .map(Path::toFile) // convert Path to File object
    .sorted((a, b) -> Long.compare(a.lastModified(), b.lastModified())) // sort files by last-modified time, oldest first
    .limit(500) // limit processing to 500 files (optional)
    .forEachOrdered(f -> { // runs one element at a time, in sorted order; use forEach if order does not matter
        // do processing here
        System.out.println(f);
    });
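
If the per-file work should run concurrently rather than in sorted order, an alternative is to pick the oldest files first and then hand them to a fixed thread pool. This is a rough sketch, not a drop-in replacement: the class name OldestFirstBatch, the helper names, and the pool size are assumptions, and processFile is a hypothetical stand-in for the real processing step.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OldestFirstBatch {

    // Hypothetical stand-in for the real per-file processing.
    static boolean processFile(File file) {
        return file.getName().endsWith(".xml");
    }

    // Selects the oldest maxFiles entries; the sort runs once, on the calling thread.
    static List<File> oldest(File[] candidates, int maxFiles) {
        return Stream.of(candidates)
                .sorted(Comparator.comparingLong(File::lastModified))
                .limit(maxFiles)
                .collect(Collectors.toList());
    }

    // Runs processFile on a pool of `threads` workers and counts the successes.
    static int processBatch(List<File> batch, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (File f : batch) {
                Callable<Boolean> task = () -> processFile(f);
                results.add(pool.submit(task));
            }
            int succeeded = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) { // get() blocks until that file has been processed
                    succeeded++;
                }
            }
            return succeeded;
        } finally {
            pool.shutdown(); // stop accepting work; submitted tasks have already finished
        }
    }
}
```

A call such as processBatch(oldest(dir.listFiles(), 500), 8) would process the 500 oldest files on eight threads, assuming dir.listFiles() did not return null. A dedicated pool keeps the thread count explicit and leaves the common ForkJoinPool free for other work.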