How do I get scan summary results of ClamAV without string manipulation?

I have this method where I use ClamAV to scan a folder containing files:

public void scanDirectory(String directoryPath) {
    List<String> summary = new ArrayList<>();
    int exitCode = -1;
    Path path = Paths.get(directoryPath);
    int infectedFiles = 0;
    int encryptedFiles = 0;

    // Validate the directory path
    if (!path.toFile().isDirectory()) {
        throw new IllegalArgumentException("Invalid directory path: " + directoryPath);
    }

    try {
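        ProcessBuilder processBuilder = new ProcessBuilder("clamscan", "-r", "--alert-encrypted=yes", directoryPath);
        // Forward stderr to the JVM's stderr so an unread pipe cannot fill up and stall clamscan
        processBuilder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = processBuilder.start();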

        // Read the output of the scan; try-with-resources closes the reader
        try (var stdOutput = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            boolean summaryStarted = false;
            while ((line = stdOutput.readLine()) != null) {
                log.info(line);

                // Store the scan summary (every line after the header)
                if (summaryStarted) {
                    summary.add(line);
                    continue;
                }
                if (line.contains("----------- SCAN SUMMARY -----------")) {
                    summaryStarted = true;
                    continue;
                }

                // Per-file result lines end with " FOUND" when something was detected.
                // Checking the suffix avoids miscounting paths that merely contain "OK" or "FOUND".
                if (line.endsWith(" FOUND")) {
                    if (line.contains("Heuristics.Encrypted")) {
                        encryptedFiles++;
                    } else {
                        infectedFiles++;
                    }
                }
            }
        }

        // Wait for the process to finish and check the exit code
        exitCode = process.waitFor();
        log.info("exitCode: " + exitCode);
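    } catch (IOException | InterruptedException e) {
        if (e instanceof InterruptedException) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for callers
        }
        log.error("Exception occurred", e);
    }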

    switch (exitCode) {
        case 0:
            // NO VIRUS FOUND
            return;
        case 1:
            // DELETE THE DIRECTORY
            // RETURN ERROR
            FileUtils.deleteQuietly(path.toFile());
            if (infectedFiles == 0 || encryptedFiles > 0) {
                log.warn(encryptedFiles + " encrypted files detected!");
                throw new ResponseStatusException(HttpStatus.PRECONDITION_FAILED, encryptedFiles + " encrypted files detected!");
            } else {
                log.warn("Virus detected! " + summary);
                throw new ResponseStatusException(HttpStatus.UNPROCESSABLE_ENTITY, "Virus detected in " + infectedFiles + " files!");
            }
        default:
            if (buildInfo.isDevelopment()) return;
            throw new ResponseStatusException(HttpStatus.INTERNAL_SERVER_ERROR, "Security scan encountered an issue. Please try again later.");
    }
}

As you can see, to get the number of infected and encrypted files I have to do string matching on the scan output, which I'm not a big fan of: if ClamAV changes the way it logs the results, the code will break. I have a hard time believing there is no way to simply get at least the numbers that appear in the summary:

2024-01-22 09:18:03 UTC | INFO  | ClamAVScanner | ----------- SCAN SUMMARY -----------
2024-01-22T09:18:03.240870600Z 2024-01-22 09:18:03 UTC | INFO  | ClamAVScanner | Known viruses: 8682727
2024-01-22T09:18:03.240872900Z 2024-01-22 09:18:03 UTC | INFO  | ClamAVScanner | Engine version: 0.103.11
2024-01-22T09:18:03.240874600Z 2024-01-22 09:18:03 UTC | INFO  | ClamAVScanner | Scanned directories: 6
2024-01-22T09:18:03.241020900Z 2024-01-22 09:18:03 UTC | INFO  | ClamAVScanner | Scanned files: 10
2024-01-22T09:18:03.241048500Z 2024-01-22 09:18:03 UTC | INFO  | ClamAVScanner | Infected files: 3
2024-01-22T09:18:03.241056400Z 2024-01-22 09:18:03 UTC | INFO  | ClamAVScanner | Data scanned: 0.05 MB
2024-01-22T09:18:03.241118000Z 2024-01-22 09:18:03 UTC | INFO  | ClamAVScanner | Data read: 128.04 MB (ratio 0.00:1)
2024-01-22T09:18:03.241161300Z 2024-01-22 09:18:03 UTC | INFO  | ClamAVScanner | Time: 15.800 sec (0 m 15 s)
2024-01-22T09:18:03.241343100Z 2024-01-22 09:18:03 UTC | INFO  | ClamAVScanner | Start Date: 2024:01:22 09:17:47
2024-01-22T09:18:03.241371900Z 2024-01-22 09:18:03 UTC | INFO  | ClamAVScanner | End Date:   2024:01:22 09:18:03
2024-01-22T09:18:03.242732100Z 2024-01-22 09:18:03 UTC | INFO  | ClamAVScanner | exitCode: 1
2024-01-22T09:18:03.295782100Z 2024-01-22 09:18:03 UTC | WARN  | ClamAVScanner | 2 encrypted files detected!

Does anyone know how I can get the infected files count, scanned files count, etc. without having to do string manipulation?
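
For reference, the closest I can get so far is to keep the parsing in one place by turning the `Key: value` lines collected in `summary` into a map. This is only a sketch and still string handling: `ClamSummaryParser`, `parse` and `intValue` are hypothetical names of my own, not part of ClamAV, and it assumes the summary keeps the layout shown in the log above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of ClamAV): turns summary lines into key/value pairs. */
final class ClamSummaryParser {

    // "Key: value", where the key is everything before the first colon.
    private static final Pattern KEY_VALUE =
            Pattern.compile("^\\s*([^:]+?)\\s*:\\s*(.*?)\\s*$");

    private ClamSummaryParser() {}

    /** Collects every "Key: value" line in order; lines without a colon are skipped. */
    static Map<String, String> parse(List<String> summaryLines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : summaryLines) {
            Matcher m = KEY_VALUE.matcher(line);
            if (m.matches()) {
                result.put(m.group(1), m.group(2));
            }
        }
        return result;
    }

    /** Returns the value for key as an int, or fallback if it is missing or not a plain integer. */
    static int intValue(Map<String, String> summary, String key, int fallback) {
        String value = summary.get(key);
        if (value == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

With this, `ClamSummaryParser.intValue(ClamSummaryParser.parse(summary), "Infected files", -1)` would give the infected count, and a missing or reworded key shows up as the fallback value instead of a silently wrong counter.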
