I have a String of mixed data, some words and numbers. The numbers are either whole integers, ratios of integers, or a percent sign in front of a whole integer. I'm trying to store this information in a Map (could be another type of object, if that makes sense) for the duration of the run of the program (not to a database). Leaving aside the percent sign, the rest of the data is parsed ok. I can always expect the data to be in this exact form of variables with colons.
Correct output (tabs gives funny indent):
AB: 272/272 CD: 204/529 EFGH: 105 HIJKL: 105 MN: 0 OPQ: 0%
AB 272/272
HIJKL 105
CD 204/529
MN 0
EFGH 105
OPQ 0%
-----------
AB 272/272
CD 204/529
HIJKL 105/1
MN 0/1
EFGH 105/1
OPQ 0/1
The first print is with Map<String,String>
, the second is with Map<String,Ratio>
. If there's a better choice than my homemade ratio, I'll gladly use that.
clumsy code, yes, overusing static, just meant to be easy to copy/paste:
package regex;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import static java.lang.System.out;
class Ratio {
private int numerator;
private int denominator;
private Ratio() {
}
public Ratio(int numerator, int denominator) {
this.numerator = numerator;
this.denominator = denominator;
}
public int getNumerator() {
return numerator;
}
public int getDenominator() {
return denominator;
}
public String toString() {
return numerator + "/" + denominator;
}
}
public class Ratios {
private static String line = "AB: 272/272 CD: 204/529 EFGH: 105 HIJKL: 105 MN: 0 OPQ: 0%";
private static Map<String, String> rawMapStringToString = new HashMap<>();
private static Map<String, Ratio> mapStringToRatio = new HashMap<>();
public static void main(String[] args) {
out.println(line);
populateMap();
printMap(rawMapStringToString);
out.println("-----------");
ratios();
printMap(mapStringToRatio);
}
private static void populateMap() {
Pattern pattern = Pattern.compile("(\\w+): +(\\S+)");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
rawMapStringToString.put(matcher.group(1), matcher.group(2));
}
}
private static void printMap(Map<?, ?> m) {
for (Map.Entry<?, ?> e : m.entrySet()) {
String key = e.getKey().toString();
String val = e.getValue().toString();
out.println(key + "\t\t" + val);
}
}
private static void ratios() {
Pattern pattern = Pattern.compile("(\\d+)/(\\d+)");
Pattern p2 = Pattern.compile("(\\w+)");
Matcher m2;
int num, den;
Ratio ratio = null;
for (Map.Entry<String, String> e : rawMapStringToString.entrySet()) {
ratio = null;
num = 0;
den = 1;
Matcher matcher = pattern.matcher(e.getValue());
while (matcher.find()) {
num = Integer.parseInt(matcher.group(1));
den = Integer.parseInt(matcher.group(2));
ratio = new Ratio(num, den);
}
if (ratio == null) {
m2 = p2.matcher(e.getValue());
while (m2.find()) {
num = Integer.parseInt(m2.group());
den = 1;
ratio = new Ratio(num, den);
}
}
mapStringToRatio.put(e.getKey(), ratio);
}
}
}
I'm just looking for a good way to store this data. Of course, the percent can be represented as a ratio, x/y, just change the denominator to 100. Leaving that aside for now, is a Map a good choice?
The ratios
method, and overall regex, seems fragile, awkward, and difficult (for me) to follow, and yet I'm unsure how to improve the code. Keeping the Ratio
class pretty much untouched, how can I improve the ratios
method, which populates the mapStringToRatio
?
What you're going to do with the data is very important to help decide what kind of data structure to store it into. If you're just printing them, storing them would be a waste of time. But I'm pretty sure you're not just printing this data right?
A Map is fine as long as your keys don't repeat. Otherwise you'll be replacing existing values with new ones that have the same key. If you don't think that's a problem, than you can keep the Map.
One other possible solution is to store the key inside the Ratio itself. So your Ratio object would have a "name" member and then you can store your data in a List of Ratios.
I liked your Ratio object and I think there's not much more to add (or remove) from it. I do agree that Regexp are complex and hard to read and understand what the code is doing. But I also think that the solution you gave is nice and clean. To make the code simpler and more readable you could use a Pattern with named groups and put everything in one pattern only. I wrote the following code:
If a group doesn't exist it will return an empty String. That way you can test it using isEmpty:
One thing I would do is to take this logic into a separate class that would be easier to test. Have everything as static variables running from a main method is not recommended.
If you're looking for a different solution than Regexp you could use a StringTokenizer to separate them using the space/tab. And then break the string using split for a colon. Then check for % or / in the right string and process them differently.
Something like:
The disadvantage of this code is that if you add new types for values you'll end up with long if/else chain. It is also harder to test since you'll have many different branches in it. If you don't plan to add new value types then that would be fine.
If you plan to expand this a lot I would go with a more abstract approach, creating a RatioProcessor interface and different implementations for it like PercentageRatioProcessor and DivisionRatioProcessor. This interface would have a "canProcess" method and a "process" method that would return a boolean and a Ratio respectively. The boolean indicating if that's the correct processor to use and the object is the processed Ratio.