Using a Java BreakIterator, I am able to extract words from a string. However, in the following string, which uses parentheses to indicate that a word could be plural, each parenthesis is recognized as a separate word.
```java
import java.text.BreakIterator;
import java.util.Locale;

String test = "Please enter the number of dependent(s).";
BreakIterator iterator = BreakIterator.getWordInstance(Locale.US);
iterator.setText(test);
int start = iterator.first();
for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
    String segment = test.substring(start, end);
    if (!segment.trim().isEmpty()) System.out.println(segment); // skip the spaces between words
}
```
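To see what the iterator actually reports, the sketch below prints each pair of consecutive boundaries together with the segment between them. `WordBoundaries` is a hypothetical class name; everything else is the standard `java.text.BreakIterator` API. A word instance returns boundary offsets rather than words, so the whitespace runs between words are segments too, and `(`, `s` and `)` each end up between their own pair of boundaries.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical demo class: prints every [start, end) boundary pair and its segment.
public class WordBoundaries {
    public static void main(String[] args) {
        String test = "Please enter the number of dependent(s).";
        BreakIterator iterator = BreakIterator.getWordInstance(Locale.US);
        iterator.setText(test);
        // Each boundary is an offset into the text; consecutive pairs delimit one segment.
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            System.out.printf("[%d, %d) \"%s\"%n", start, end, test.substring(start, end));
        }
    }
}
```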
Output:
```
Please
enter
the
number
of
dependent
(
s
)
.
```
I would expect:
```
Please
enter
the
number
of
dependent(s)
.
```
Is it possible to use a custom BreakIterator implementation so that a word with an "optional plural" suffix, such as dependent(s), is treated as a single word?
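One possible approach, sketched here under assumptions: instead of writing break rules from scratch, subclass `java.text.BreakIterator` as a thin wrapper around the standard word instance and drop the boundaries that fall inside a letters, `(`, letters, `)` run. `OptionalPluralBreakIterator`, `isOptionalPlural` and `isLetters` are hypothetical names, and the merge rule (a letter-only word immediately followed by a parenthesized letter-only suffix) is an assumption about what counts as an "optional plural". The overridden methods (`first`, `last`, `next`, `next(int)`, `previous`, `following`, `current`, `getText`, `setText(CharacterIterator)`) are the abstract ones declared by `BreakIterator`; `preceding` and `isBoundary` fall back to the base-class implementations.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical wrapper: asks the JDK word iterator for its boundaries, then drops the
// three boundaries inside a letters + "(" + letters + ")" run such as "dependent(s)".
public class OptionalPluralBreakIterator extends BreakIterator {
    private final BreakIterator delegate;
    private CharacterIterator text = new StringCharacterIterator("");
    private int[] bounds = {0}; // merged boundary offsets, ascending
    private int pos = 0;        // index of the current boundary in bounds

    public OptionalPluralBreakIterator(Locale locale) {
        delegate = BreakIterator.getWordInstance(locale);
    }

    @Override
    public void setText(CharacterIterator newText) {
        text = newText;
        int begin = newText.getBeginIndex();
        StringBuilder sb = new StringBuilder();
        for (char c = newText.first(); c != CharacterIterator.DONE; c = newText.next()) {
            sb.append(c);
        }
        String s = sb.toString();

        // Boundaries as the standard word iterator sees them (0-based offsets into s).
        delegate.setText(s);
        List<Integer> raw = new ArrayList<>();
        for (int b = delegate.first(); b != DONE; b = delegate.next()) {
            raw.add(b);
        }

        // Copy the boundaries, skipping the ones inside each "word(suffix)" run.
        List<Integer> kept = new ArrayList<>();
        int i = 0;
        while (i < raw.size()) {
            kept.add(raw.get(i));
            boolean merge = i + 4 < raw.size() && isOptionalPlural(s, raw.subList(i, i + 5));
            i += merge ? 4 : 1; // after a merge, resume at the boundary following ")"
        }
        bounds = kept.stream().mapToInt(x -> x + begin).toArray(); // back to iterator indexes
        pos = 0;
    }

    // b holds five consecutive boundaries around: letters | "(" | letters | ")"
    private static boolean isOptionalPlural(String s, List<Integer> b) {
        return isLetters(s, b.get(0), b.get(1))
                && b.get(2) - b.get(1) == 1 && s.charAt(b.get(1)) == '('
                && isLetters(s, b.get(2), b.get(3))
                && b.get(4) - b.get(3) == 1 && s.charAt(b.get(3)) == ')';
    }

    private static boolean isLetters(String s, int from, int to) {
        if (from >= to) return false;
        for (int k = from; k < to; k++) {
            if (!Character.isLetter(s.charAt(k))) return false;
        }
        return true;
    }

    @Override public int first() { pos = 0; return bounds[pos]; }
    @Override public int last() { pos = bounds.length - 1; return bounds[pos]; }
    @Override public int current() { return bounds[pos]; }
    @Override public CharacterIterator getText() { return text; }
    @Override public int next() { return pos < bounds.length - 1 ? bounds[++pos] : DONE; }
    @Override public int previous() { return pos > 0 ? bounds[--pos] : DONE; }

    @Override
    public int next(int n) {
        int result = current();
        for (; n > 0 && result != DONE; n--) result = next();
        for (; n < 0 && result != DONE; n++) result = previous();
        return result;
    }

    @Override
    public int following(int offset) {
        for (int k = 0; k < bounds.length; k++) {
            if (bounds[k] > offset) { pos = k; return bounds[k]; }
        }
        pos = bounds.length - 1;
        return DONE;
    }

    public static void main(String[] args) {
        String test = "Please enter the number of dependent(s).";
        BreakIterator iterator = new OptionalPluralBreakIterator(Locale.US);
        iterator.setText(test); // inherited setText(String) wraps the string in a StringCharacterIterator
        int start = iterator.first();
        for (int end = iterator.next(); end != DONE; start = end, end = iterator.next()) {
            String segment = test.substring(start, end);
            if (!segment.trim().isEmpty()) System.out.println(segment); // skip whitespace
        }
    }
}
```

Computing all boundaries eagerly in `setText` keeps the navigation methods trivial at the cost of one `int[]` per text. If the delegate splits `dependent`, `(`, `s` and `)` as in the question, the loop in `main` should print `dependent(s)` as one segment; anything else (for example `dependent(es)` with a multi-letter suffix) is merged by the same letter-only check.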