Friday, September 28, 2018

Java - counting specific words

https://codereview.stackexchange.com/questions/73486/counting-five-specific-words-in-a-file

Here's a better, more efficient and compact way:
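    // Needs: java.nio.file.Files, java.nio.file.Paths, java.util.Arrays,
    // java.util.HashMap, java.util.List, java.util.Map.
    // Files.readAllLines throws IOException, so the enclosing method must
    // handle or declare it.
    String path = "C:/Users/n/Desktop/Text.txt";

    // The list fixes the output order; the map holds one counter per target word.
    List<String> targetList = Arrays.asList("a", "the", "bird", "animal", "is");
    Map<String, Integer> counts = new HashMap<>(targetList.size());
    for (String word : targetList) {
        counts.put(word, 0);
    }

    // Count while reading: strip punctuation, lowercase, split on whitespace.
    for (String line : Files.readAllLines(Paths.get(path))) {
        for (String word : line.replaceAll("[!?.,]", "").toLowerCase().split("\\s+")) {
            Integer count = counts.get(word);
            // Words outside the target list are not in the map and are skipped.
            if (count != null) {
                counts.put(word, count + 1);
            }
        }
    }

    // Print the counts on one line, space-separated, in target-list order.
    System.out.print(counts.get(targetList.get(0)));
    for (int i = 1; i < targetList.size(); ++i) {
        String word = targetList.get(i);
        System.out.print(" " + counts.get(word));
    }
    System.out.println();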
The improvements and corrections:
  • It's good to define constants like the path high up in a file, where they are easy to see and easy to change without having to read into the details of the code
  • It's simpler to write paths with forward slashes
  • Use an interface type like List when declaring a list, rather than an implementation type like ArrayList
  • Since it seems you're only interested in a specific set of words:
    • I put them in a list for ordering
    • ... then initialized the map of counts to all 0 values
  • Instead of building a list of words, it's more efficient to do the counting at the same time as you read the words. This will save you both storage and processing time
  • When you do line.replaceAll("[!?.,]", ""), line itself is not modified, because strings in Java are immutable. The method returns a new string with the characters removed, and you have to assign or use that result
  • The same goes for the line.split("\\s+") statement you had. If you don't save the result of the operation, the call is completely pointless
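
The last three points can be illustrated with a small self-contained sketch. This is not the code from the original question: the class name StringImmutabilitySketch, the helper countTargets, and the sample sentences are all made up for illustration, and it counts over an in-memory list of lines instead of a file so that it runs anywhere. LinkedHashMap is used here, as a variation on the HashMap above, only so that the printed map keeps the order of the target list.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StringImmutabilitySketch {

    // Hypothetical helper: tallies only the target words, counting while
    // iterating over the lines instead of first collecting every word.
    static Map<String, Integer> countTargets(List<String> lines, List<String> targets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String target : targets) {
            counts.put(target, 0);
        }
        for (String line : lines) {
            // replaceAll and toLowerCase each return a NEW string,
            // so the result has to be assigned (or chained) to be of any use.
            String cleaned = line.replaceAll("[!?.,]", "").toLowerCase();
            for (String word : cleaned.split("\\s+")) {
                Integer count = counts.get(word);
                if (count != null) {
                    counts.put(word, count + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "The bird is an animal.";

        // Result discarded: line still contains the trailing period.
        line.replaceAll("[!?.,]", "");
        System.out.println(line);

        // Result kept: cleaned has the punctuation removed.
        String cleaned = line.replaceAll("[!?.,]", "");
        System.out.println(cleaned);

        // Made-up sample input standing in for the lines of a file.
        List<String> sample = Arrays.asList("The bird is a bird.", "Is the animal a bird?");
        List<String> targets = Arrays.asList("a", "the", "bird", "animal", "is");
        System.out.println(countTargets(sample, targets));
    }
}
```

To run the same counting against a real file, the sample list could be swapped for Files.readAllLines(Paths.get(path)), as in the answer's code above.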