How do I count the number of tweets for each label in a dataset and then split the data by label (Java)?


My code below splits the data into sentences, and each sentence is labeled with an emotion. I need to count the number of sentences for each label (emotion) and split the dataset by label.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class DataProcessor {

    public static void main(String[] args) throws FileNotFoundException {
        try (Scanner read = new Scanner(new File("E:\\blabla.txt"))) {
            read.useDelimiter("::"); // records are separated by "::"
            String tweet;
            while (read.hasNext()) {
                tweet = read.next();
                System.out.println(tweet + " " + "\n"); // just for debugging
            }
        }
    }
}

The output looks like this:

joy: Had a test today. But I still was good

Accepted answer:
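import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class DataProcessor {

    public static void main(String[] args) throws FileNotFoundException {
        // Maps each emotion label to the tweets carrying that label
        Map<String, List<String>> map = new HashMap<>();
        try (Scanner read = new Scanner(new File("E:\\blabla.txt"))) {
            read.useDelimiter("::");
            while (read.hasNext()) {
                String tweet = read.next();
                // "joy: Had a test today..." -> label "joy", text "Had a test today..."
                String[] split = tweet.split(":");
                if (split.length < 2) {
                    continue; // skip records with no "label:" prefix (e.g. trailing whitespace)
                }
                String key = split[0].trim(); // drop line breaks left between records
                map.computeIfAbsent(key, k -> new ArrayList<>()).add(split[1].trim());
            }
        }
    }
}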

The map now holds every emotion label together with its sentences. Let's call them tweets, since a single entry sometimes contains more than one sentence. To get the number of tweets for a label, use map.get("joy").size().
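A minimal sketch of counting every label at once, assuming a label-to-tweets map shaped like the one built in the answer. The class and method names (LabelCounts, countPerLabel, countFor) and the sample tweets are hypothetical. Note that map.get(label) returns null for a label that never occurs, so calling .size() on it would throw a NullPointerException; getOrDefault avoids that.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LabelCounts {

    // Builds label -> number of tweets from a label -> tweets map like the one in the answer
    public static Map<String, Integer> countPerLabel(Map<String, List<String>> map) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by label for predictable output
        map.forEach((label, tweets) -> counts.put(label, tweets.size()));
        return counts;
    }

    // Safe lookup: an unseen label counts as 0 instead of causing a NullPointerException
    public static int countFor(Map<String, List<String>> map, String label) {
        return map.getOrDefault(label, Collections.emptyList()).size();
    }

    public static void main(String[] args) {
        // Hypothetical sample data in the same shape as the answer's map
        Map<String, List<String>> map = new HashMap<>();
        map.computeIfAbsent("joy", k -> new ArrayList<>()).add("Had a test today. But I still was good");
        map.computeIfAbsent("joy", k -> new ArrayList<>()).add("Sunny weekend");
        map.computeIfAbsent("anger", k -> new ArrayList<>()).add("Stuck in traffic");

        System.out.println(countPerLabel(map));    // {anger=1, joy=2}
        System.out.println(countFor(map, "fear")); // 0
    }
}
```

Copying the counts into a TreeMap is only for sorted, predictable output; a HashMap works just as well if the order of labels doesn't matter.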

If a tweet can itself contain a colon, change tweet.split(":") to tweet.split(":", 2) so that only the first colon separates the label from the text.
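To illustrate why the limit matters, here is a small sketch using a hypothetical tweet that contains a time such as 10:30. The class and method names are made up for this example; both methods read split[1] the same way the answer does.

```java
public class SplitLimitDemo {

    // Text part as the answer reads it: split on every ':' and take element 1
    public static String textWithoutLimit(String record) {
        return record.split(":")[1].trim();
    }

    // Text part with limit 2: split only on the first ':', keeping later colons in the text
    public static String textWithLimit(String record) {
        return record.split(":", 2)[1].trim();
    }

    public static void main(String[] args) {
        // Hypothetical record whose text contains a colon
        String record = "joy: Meeting moved to 10:30";
        System.out.println(record.split(":").length);    // 3
        System.out.println(record.split(":", 2).length); // 2
        System.out.println(textWithoutLimit(record));    // Meeting moved to 10
        System.out.println(textWithLimit(record));       // Meeting moved to 10:30
    }
}
```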

To check the resulting map, you can print it with this code:

// Print each emotion label followed by its tweets
map.forEach((emotion, tweets) -> {
    System.out.println(emotion);
    tweets.forEach(System.out::println);
});
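The map already splits the data by label in memory. If you also want the split on disk, one option is to write each label's tweets to its own file. This is a sketch under assumptions that are not part of the answer: one file per label named <label>.txt inside a hypothetical output folder, one tweet per line, and labels that are safe to use as file names. The class and method names are hypothetical too.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitByLabel {

    // Writes each label's tweets to <outputDir>/<label>.txt, one tweet per line.
    // Assumption: labels contain no characters that are invalid in file names.
    public static void writePerLabel(Map<String, List<String>> map, Path outputDir) throws IOException {
        Files.createDirectories(outputDir);
        for (Map.Entry<String, List<String>> entry : map.entrySet()) {
            Path file = outputDir.resolve(entry.getKey() + ".txt");
            Files.write(file, entry.getValue()); // UTF-8; creates or overwrites the file
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data in the same shape as the answer's map
        Map<String, List<String>> map = new HashMap<>();
        map.computeIfAbsent("joy", k -> new ArrayList<>()).add("Had a test today. But I still was good");
        map.computeIfAbsent("sadness", k -> new ArrayList<>()).add("Missed the bus");

        Path outputDir = Paths.get("split-by-label"); // hypothetical output folder
        writePerLabel(map, outputDir);
        System.out.println(Files.readAllLines(outputDir.resolve("joy.txt")));
    }
}
```

Keeping the grouping (the map) separate from the writing step means you can swap the output format, for example one CSV per label, without touching the parsing loop.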