I'm writing a program that's intended to search the HTML of a website, find a specific tag, then write the contents of that tag to a file. For example, the HTML could look like this:
<div class="something" specific-tag:"print this 1">some content</div>
<div class="something" not-the-right-tag:"don't print this">some content</div>
<div class="something" specific-tag:"print this 2">some content</div>
<div class="something" not-the-right-tag:"don't print this">some content</div>
<div class="something" specific-tag:"print this 3">some content</div>
The desired file output would look like this:
print this 1
print this 2
print this 3
I know how to use the Scanner class to find the specific tag (in this case "specific-tag"), and I know how to write to a file using delimiters (the delimiter in this case being "). What I don't know is how to search for a tag, write everything between the delimiters after that tag to a file, then resume searching for the next tag and repeat until the end of the file.
Thoughts?
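For reference, the search-capture-repeat loop described above could be sketched with the Scanner class alone. This is only an illustration under some assumptions: the class name `TagValueExtractor`, the method `extractValues`, the regular expression, and the output file name `output.txt` are all hypothetical, and the pattern accepts either `:` or `=` before the quoted value because the sample HTML uses `:`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class TagValueExtractor {
    // Hypothetical pattern: the name "specific-tag" (not preceded by a word
    // character or hyphen), then ':' or '=', then a double-quoted value.
    // Group 1 captures everything between the two quote delimiters.
    private static final Pattern SPECIFIC_TAG =
            Pattern.compile("(?<![\\w-])specific-tag[:=]\"([^\"]*)\"");

    static List<String> extractValues(String html) {
        List<String> values = new ArrayList<>();
        try (Scanner scanner = new Scanner(html)) {
            // A horizon of 0 means "search without a bound". Each call resumes
            // after the previous match, so the loop runs until the input ends.
            while (scanner.findWithinHorizon(SPECIFIC_TAG, 0) != null) {
                values.add(scanner.match().group(1));
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String html =
                "<div class=\"something\" specific-tag:\"print this 1\">some content</div>\n"
              + "<div class=\"something\" not-the-right-tag:\"don't print this\">some content</div>\n"
              + "<div class=\"something\" specific-tag:\"print this 2\">some content</div>\n";
        // Writes one captured value per line; "output.txt" is a placeholder name.
        Files.write(Paths.get("output.txt"), extractValues(html));
    }
}
```

Pattern matching like this is fragile on real-world HTML (escaped quotes, single-quoted values, attributes split across lines), so treat it as a sketch of the loop rather than a robust parser.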
You really should use some kind of HTML parsing library. A quick Google search turned up jsoup (http://jsoup.org/), which seems easy to use. Querying the parsed document for div elements should yield the divs, and then you can extract the specific-tag attribute from each one.
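A minimal sketch of that suggestion, assuming jsoup is on the classpath and that the real page writes the attribute in standard form (`specific-tag="..."` with `=`, not the `:` shown in the question's sample). The class name `JsoupAttributeSketch` and the method `extractValues` are made up for illustration; `Jsoup.parse`, `select`, and `attr` are the jsoup calls being relied on.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class JsoupAttributeSketch {
    static List<String> extractValues(String html) {
        // Parse the HTML string into a document tree.
        Document doc = Jsoup.parse(html);
        List<String> values = new ArrayList<>();
        // "div[specific-tag]" selects only divs that carry the attribute,
        // so divs with other attributes are skipped automatically.
        for (Element div : doc.select("div[specific-tag]")) {
            values.add(div.attr("specific-tag"));
        }
        return values;
    }
}
```

To read from a live site instead of a string, `Jsoup.connect(url).get()` returns a `Document` that can be queried the same way, and the resulting list can be written out one value per line with `Files.write`.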