Java Scanner to find a tag, then delimiters to write what's in that tag to a file


I'm writing a program that's intended to search the HTML of a website, find a specific tag, then write the contents of that tag to a file. For example, the HTML could look like this:

<div class="something" specific-tag="print this 1">some content</div>
<div class="something" not-the-right-tag="don't print this">some content</div>
<div class="something" specific-tag="print this 2">some content</div>
<div class="something" not-the-right-tag="don't print this">some content</div>
<div class="something" specific-tag="print this 3">some content</div>

The desired file output would look like this:

print this 1
print this 2
print this 3

I know how to use the Scanner class to find the specific tag (in this case "specific-tag"), and I know how to write to a file using delimiters (here the delimiter is the double quote, "). What I don't know is how to search for the tag, write everything between the delimiters that follow it to the file, then resume searching for the next tag, repeating until the end of the file.
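A minimal sketch of that loop using only the JDK: Scanner.findWithinHorizon with a regular expression finds the next occurrence of the tag and captures the text between the following pair of double quotes. Because the scanner continues from just past each match, repeating the call walks through the whole input until nothing more is found. The class name TagExtractor, the method extractValues, and the output file name output.txt are hypothetical. The pattern assumes the value is always wrapped in double quotes and accepts either : or = after the tag name. A regex like this is fragile on arbitrary real-world HTML.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class TagExtractor {
    // Matches the tag name, then captures everything between the next pair of
    // double quotes. Assumption: either ':' or '=' separates name and value.
    private static final Pattern TAG_VALUE =
            Pattern.compile("specific-tag\\s*[:=]\\s*\"([^\"]*)\"");

    // Finds every occurrence of the tag and returns the quoted values in order.
    public static List<String> extractValues(String html) {
        List<String> values = new ArrayList<>();
        try (Scanner scanner = new Scanner(html)) {
            // A horizon of 0 means "search the rest of the input without limit".
            // After a match the scanner resumes just past it, so the loop moves
            // on to the next tag until the input is exhausted.
            while (scanner.findWithinHorizon(TAG_VALUE, 0) != null) {
                values.add(scanner.match().group(1)); // text inside the quotes
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String html = String.join("\n",
                "<div class=\"something\" specific-tag:\"print this 1\">some content</div>",
                "<div class=\"something\" not-the-right-tag:\"don't print this\">some content</div>",
                "<div class=\"something\" specific-tag:\"print this 2\">some content</div>");
        List<String> values = extractValues(html);
        // Writes one value per line to a hypothetical output file.
        Files.write(Paths.get("output.txt"), values, StandardCharsets.UTF_8);
        values.forEach(System.out::println);
    }
}
```

Reading a real page works the same way if the Scanner is constructed from a file or stream instead of a String.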

Thoughts?

Best answer:

You really should use an HTML parsing library. A quick Google search turned up jsoup (http://jsoup.org/), which seems easy to use. Calling

Elements divs = doc.select("div[specific-tag]");

should return the matching divs, and you can then read each one's specific-tag attribute, for example with element.attr("specific-tag").
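jsoup is a third-party library. As a dependency-free stand-in, the same idea (select the divs that carry the attribute, then read the attribute) can be sketched with the JDK's XML parser and XPath. This is not jsoup. It assumes the markup is well-formed XML with attributes written as name="value", and most real web pages are not, which is exactly why a lenient HTML parser such as jsoup is the better tool. AttributeSelector and specificTagValues are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AttributeSelector {
    // Returns the value of every div's specific-tag attribute, in document order.
    // Assumption: the markup is well-formed XML (not arbitrary HTML).
    public static List<String> specificTagValues(String markup) {
        try {
            // Wrap in a single root so several sibling divs form one XML document.
            String xml = "<root>" + markup + "</root>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Roughly the XPath counterpart of the "div[specific-tag]" selector:
            // select the specific-tag attribute node of every div.
            NodeList attrs = (NodeList) xpath.evaluate(
                    "//div/@specific-tag", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < attrs.getLength(); i++) {
                values.add(attrs.item(i).getNodeValue()); // the attribute's value
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse markup as XML", e);
        }
    }
}
```

The returned list can then be written to a file one value per line, for example with Files.write.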