How to extract all words between certain special characters from a string which has no spaces?


I have a string that is the result, fetched from a website, of parsing the content of a tweet. Here is the string:

"1\tI\t_\tPRP\tPRP\t_\t2\tnsubj\t_\t_\n2\tneed\t_\tVB\tVBP\t_\t0\tnull\t_\t_\n3\tmore\t_\tJJ\tJJR\t_\t4\tamod\t_\t_\n4\twords\t_\tNN\tNNS\t_\t2\tdobj\t_\t_\n5\tlike\t_\tIN\tIN\t_\t4\tprep\t_\t_\n6\tmarvel\t_\tNN\tNN\t_\t5\tpobj\t_\t_\n7\tor\t_\tCC\tCC\t_\t6\tcc\t_\t_\n8\tcat\t_\tNN\tNN\t_\t6\tconj\t_\t_\n9\tor\t_\tCC\tCC\t_\t6\tcc\t_\t_\n10\tpancake\t_\tNN\tNN\t_\t6\tconj\t_\t_\n11\tor\t_\tCC\tCC\t_\t10\tcc\t_\t_\n12\tfrance\t_\tNN\tNN\t_\t10\tconj\t_\t_", "text": "I need more words like marvel or cat or pancake or france"

I want to get all the words that are between "\t" and "\t_\tNN"; in other words, I want the nouns. The expected output is "words", "marvel", "cat", "pancake", "france".

I tried the code below:

private void regex(String s){
        if(s.indexOf("error") >= 1){
            Toast.makeText(this, "Sorry the site failed again it's not my fault :(",
                       Toast.LENGTH_SHORT).show();
        }
        else{
            Pattern pattern = Pattern.compile("\t(.*?)\t_\tNN");
            Matcher matcher = pattern.matcher(s);
            System.out.println(s);
            if (matcher.find()) {
                String result = matcher.group(1);
                System.out.println(result);
            }
        }

    }

I am sure I got the Pattern.compile string wrong. It's not working; it seems it can't find the words I want.

Could anybody tell me how I should fix it?

P.S. About the tab-like "\t" sequences: I printed the whole website response as the result, but when I get the result as a string, I guess they are just a backslash and a "t" rather than real tab characters.


Best answer:

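Since the string contains a literal backslash followed by "t" rather than real tab characters, each backslash has to be escaped once for the regex and once more for the Java string literal. You can use the following:

"\\\\t([^\\\\]*?)\\\\t_\\\\tNN"

Also note that if (matcher.find()) only returns the first match; use while (matcher.find()) to get all of them.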

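As a minimal sketch of how this pattern might be applied: the class name `NounExtractor` and the method `extractNouns` are hypothetical and not from the question, and the code assumes the input really contains a literal backslash followed by `t`. It loops over the matcher so that every match is collected, not just the first one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class NounExtractor {

    // The regex itself is \\t([^\\]*?)\\t_\\tNN : a literal backslash is written
    // as \\ in the regex, and each of those is doubled again in the Java literal.
    private static final Pattern NOUN =
            Pattern.compile("\\\\t([^\\\\]*?)\\\\t_\\\\tNN");

    // Returns every word that sits between "\t" and "\t_\tNN" (literal backslashes).
    static List<String> extractNouns(String s) {
        List<String> nouns = new ArrayList<>();
        Matcher matcher = NOUN.matcher(s);
        while (matcher.find()) {          // keep searching until no match is left
            nouns.add(matcher.group(1));  // group 1 is the word itself
        }
        return nouns;
    }

    public static void main(String[] args) {
        // Shortened sample in the same format as the question; "\\t" in Java
        // source is a backslash followed by 't', not a tab character.
        String sample = "3\\tmore\\t_\\tJJ\\tJJR\\t_\\t4\\tamod\\t_\\t_\\n"
                + "4\\twords\\t_\\tNN\\tNNS\\t_\\t2\\tdobj\\t_\\t_\\n"
                + "6\\tmarvel\\t_\\tNN\\tNN\\t_\\t5\\tpobj\\t_\\t_";
        System.out.println(extractNouns(sample)); // prints [words, marvel]
    }
}
```

If the input turns out to contain real tab characters instead, the equivalent pattern would be "\t([^\t]*?)\t_\tNN". The negated character class matters in both variants: with (.*?) the match can start at the first separator in the string and run across several fields before reaching "_\tNN".
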