How to read and process a big CSV file fast and keep memory usage low in Java?


Use case: a huge CSV file (~500 MB) that needs to be read very fast without loading it all into memory.

Idea that I used:

Read the CSV line by line and save the transformed data directly into the database. (At a later stage I read the data back from the database and send it to another service, but that is not relevant for now.)

  public void importData() {
    try (
      // local renamed so it does not shadow the injected "reader" dependency it is created from
      Reader dataReader = reader.readData();
      BufferedReader bufferedReader = new BufferedReader(dataReader)
    ) {
      DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");
      bufferedReader.readLine(); // skip the header row

      String line;
      while ((line = bufferedReader.readLine()) != null) {
        String[] parts = line.split(",");

        String partZero = parts[0];
        String partOne = parts[1];
        LocalDate date = !parts[2].isEmpty() ? LocalDate.parse(parts[2], formatter) : null;
        String partThree = parts[3];
        String partFour = parts[4];
        String partFive = parts.length >= 6 ? parts[5] : null;

        service.saveDog(DogEntry.builder()
          .breed(partZero)
          .originSystem(partOne)
          .date(date)
          .state(partThree)
          .center(partFour)
          .partFive(partFive)
          .build());
      }
    } catch (IOException e) {
      throw new DOGException(ErrorCodes.CODES, "Cannot read Dog data", e);
    }
  }

Service method

  public void saveDog(DogEntry entry) {
    LOGGER.info("Receiving Dog {}", entry.getBreed());

    final Dog dog = updateOrCreateDog(entry);
    dogRepository.save(dog);
  }

  private Dog updateOrCreateDog(final DogEntry entry) {
    Optional<Dog> existingDog = dogRepository.findByBreedAndOrigin(entry.getBreed(), entry.getOriginSystem());
    return existingDog.map(dog -> getUpdatedDog(dog, entry)).orElseGet(() -> createNewDog(entry));
  }

  private Dog getUpdatedDog(Dog existingDog, DogEntry entry) {
    existingDog.setBreed(entry.getBreed());
    existingDog.setOrigin(entry.getOriginSystem());
    existingDog.setStatus(entry.getState());
    existingDog.setCenter(entry.getCenter());
    return existingDog;
  }

  private Dog createNewDog(final DogEntry entry) {
    return Dog.builder()
      .breed(entry.getBreed())
      .origin(entry.getOriginSystem())
      .status(entry.getState())
      .center(entry.getCenter())
      .build();
  }

The problem is that I cannot read all the rows from the CSV into a list first, because that will cause an OutOfMemoryError.
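For illustration, this is roughly the approach I want to avoid (collecting everything into a list before saving). toDogEntry here is just a placeholder for the line parsing shown in importData above:

    // What I want to avoid: materializing every row in memory before saving.
    // toDogEntry is a placeholder for the line-parsing logic from importData.
    List<DogEntry> allEntries = new ArrayList<>();
    try (BufferedReader bufferedReader = new BufferedReader(reader.readData())) {
      bufferedReader.readLine(); // skip the header row
      String line;
      while ((line = bufferedReader.readLine()) != null) {
        allEntries.add(toDogEntry(line));
      }
    } catch (IOException e) {
      throw new DOGException(ErrorCodes.CODES, "Cannot read Dog data", e);
    }
    // With a ~500 MB file, allEntries holds millions of objects -> OutOfMemoryError
    allEntries.forEach(service::saveDog);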

Is there a faster way to read and process the CSV file, so that I do not run into a timeout?


1 Answer

W-S:

You can try the BufferedReader.lines() method. It looks something like the code below. Because lines() returns a lazy stream, the file is read line by line, which keeps memory usage low.

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Test {
    public static void main(String[] args) {
        Path path = Paths.get("customers-2000000.csv");
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            reader.lines()
                    .forEach(Test::createObject);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void createObject(String s) {
        System.out.println("s = " + s);
    }
}
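If it helps, here is roughly how the lines() approach could be wired into the parsing from your importData method. This is only a sketch: it reuses the service, DogEntry, DOGException and ErrorCodes types from the question, and it still assumes the data contains no quoted commas.

  public void importData(Path csvPath) {
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    try (BufferedReader bufferedReader = Files.newBufferedReader(csvPath)) {
      bufferedReader.lines()
          .skip(1)                       // skip the header row
          .map(line -> line.split(","))  // note: split(",") does not handle quoted commas
          .forEach(parts -> {
            LocalDate date = !parts[2].isEmpty() ? LocalDate.parse(parts[2], formatter) : null;
            service.saveDog(DogEntry.builder()
                .breed(parts[0])
                .originSystem(parts[1])
                .date(date)
                .state(parts[3])
                .center(parts[4])
                .partFive(parts.length >= 6 ? parts[5] : null)
                .build());
          });
    } catch (IOException e) {
      throw new DOGException(ErrorCodes.CODES, "Cannot read Dog data", e);
    }
  }

Because the stream is lazy, each line is parsed and saved before the next one is read, so only one row is held in memory at a time.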