I have a string fetched from a website that parses the content of a tweet. Here is the string:
```
"1\tI\t_\tPRP\tPRP\t_\t2\tnsubj\t_\t_\n2\tneed\t_\tVB\tVBP\t_\t0\tnull\t_\t_\n3\tmore\t_\tJJ\tJJR\t_\t4\tamod\t_\t_\n4\twords\t_\tNN\tNNS\t_\t2\tdobj\t_\t_\n5\tlike\t_\tIN\tIN\t_\t4\tprep\t_\t_\n6\tmarvel\t_\tNN\tNN\t_\t5\tpobj\t_\t_\n7\tor\t_\tCC\tCC\t_\t6\tcc\t_\t_\n8\tcat\t_\tNN\tNN\t_\t6\tconj\t_\t_\n9\tor\t_\tCC\tCC\t_\t6\tcc\t_\t_\n10\tpancake\t_\tNN\tNN\t_\t6\tconj\t_\t_\n11\tor\t_\tCC\tCC\t_\t10\tcc\t_\t_\n12\tfrance\t_\tNN\tNN\t_\t10\tconj\t_\t_", "text": "I need more words like marvel or cat or pancake or france"
```
I want to get every word that sits between "\t" and "\t_\tNN"; in other words, I want the nouns. The expected output is "words", "marvel", "cat", "pancake", "france".
I tried the code below:
```java
private void regex(String s){
    if(s.indexOf("error") >= 1){
        Toast.makeText(this, "Sorry the site failed again it's not my fault :(",
                Toast.LENGTH_SHORT).show();
    }
    else{
        Pattern pattern = Pattern.compile("\t(.*?)\t_\tNN");
        Matcher matcher = pattern.matcher(s);
        System.out.println(s);
        if (matcher.find()) {
            String result = matcher.group(1);
            System.out.println(result);
        }
    }
}
```
I am fairly sure the string I pass to Pattern.compile is wrong: the code does not work, and it does not seem to find the words I want.
Could anybody tell me how I should fix it?
P.S. About the "\t" sequences that look like tab characters: I printed the whole web page as the result, but when I read the result into a string, I suspect they become a backslash followed by a "t" rather than actual tab characters.
You can use the following:
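A minimal sketch, assuming (as the P.S. suspects) that the string contains the two characters backslash and `t` rather than real tabs. Each literal backslash must be escaped twice: once for the regex engine (`\\`) and once more for the Java string literal (`\\\\`). The capture group `[^\\]+` stops at the next backslash, so a match cannot run across columns or across the escaped `\n` line breaks. `find()` is called in a `while` loop because a single `if (matcher.find())` only returns the first noun. The class name `EscapedNouns` and the method `nouns` are hypothetical, and this is one possible pattern rather than the only one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedNouns {
    // Regex seen by the engine: \\t([^\\]+)\\t_\\tNN
    // i.e. literal backslash + 't', the word (no backslashes), then the
    // literal text backslash-t "_" backslash-t "NN".
    // Assumption: the tabs arrived as backslash + 't', not as real tabs.
    private static final Pattern NOUN =
            Pattern.compile("\\\\t([^\\\\]+)\\\\t_\\\\tNN");

    public static List<String> nouns(String s) {
        List<String> result = new ArrayList<>();
        Matcher matcher = NOUN.matcher(s);
        // Each find() resumes after the previous match; loop until none remain
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // In Java source "\\t" is a backslash followed by 't', mimicking
        // the escaped text returned by the site (shortened sample).
        String s = "1\\tI\\t_\\tPRP\\tPRP\\t_\\t2\\tnsubj\\t_\\t_\\n"
                 + "4\\twords\\t_\\tNN\\tNNS\\t_\\t2\\tdobj\\t_\\t_\\n"
                 + "12\\tfrance\\t_\\tNN\\tNN\\t_\\t10\\tconj\\t_\\t_";
        System.out.println(nouns(s)); // [words, france]
    }
}
```

Because the pattern only checks that the tag starts with `NN`, plural (`NNS`) and proper (`NNP`) nouns are included too, which is why "words" is returned.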
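Another option, also only a sketch, is to turn the escape sequences back into real characters first and then match on actual tabs. `String.replace(CharSequence, CharSequence)` replaces literally (no regex), so `"\\t"` in the code is the two-character sequence backslash + `t`. This assumes the text contains no other backslash escapes; an escaped backslash followed by `t`, for example, would be misread as a tab. `Unescaped`, `unescape` and `nouns` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Unescaped {
    // Real tab, the word (no tabs or newlines), then tab "_" tab "NN..."
    private static final Pattern NOUN = Pattern.compile("\t([^\t\n]+)\t_\tNN");

    // Converts backslash+'t' into a tab and backslash+'n' into a newline.
    // Assumption: no other escape sequences occur in the input.
    public static String unescape(String raw) {
        return raw.replace("\\t", "\t").replace("\\n", "\n");
    }

    public static List<String> nouns(String raw) {
        List<String> result = new ArrayList<>();
        Matcher matcher = NOUN.matcher(unescape(raw));
        // Collect every match, not just the first one
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Escaped sample as it might come back from the site (shortened)
        String raw = "6\\tmarvel\\t_\\tNN\\tNN\\t_\\t5\\tpobj\\t_\\t_\\n"
                   + "7\\tor\\t_\\tCC\\tCC\\t_\\t6\\tcc\\t_\\t_\\n"
                   + "8\\tcat\\t_\\tNN\\tNN\\t_\\t6\\tconj\\t_\\t_";
        System.out.println(nouns(raw)); // [marvel, cat]
    }
}
```

Compared with escaping the backslashes in the pattern, this keeps the regex readable and lets the rest of the code work with real tabs, at the cost of one extra pass over the string.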