I'm having trouble understanding how to download only part of an HTML page. I tried the traditional way, using the URL::openStream method and a BufferedReader, but I'm not sure whether this approach forces me to download the whole page.
The problem is: I have a fairly large HTML page and need to parse two numbers from it, which update at least once a second. The approach above detects changes only once every 2-3 seconds, and I wonder whether there is a way to make it faster. So I thought fetching only part of the page might help.
Fetch HTML part in java
Asked by Vlad Doronin

I wrote a helper to read the URL content. The parser for the elements lives in another class.
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;

public class HTMLReaderHelper {
    private final URL currentURL;

    HTMLReaderHelper(URL url){
        currentURL = url;
    }

    // Returns null if the connection cannot be opened.
    public CharIterator charIterator(){
        try {
            return new CharIterator();
        } catch(IOException ex){
            return null;
        }
    }

    public StringIterator stringIterator(){
        return new StringIterator();
    }

    // Reads the response one byte at a time. Each byte is treated as one char,
    // which is only correct for single-byte encodings such as ASCII or ISO-8859-1.
    class CharIterator implements Iterator<Character>{
        private final InputStream urlStream;
        private final Queue<Character> buffer = new ArrayDeque<>();
        private boolean isValid = true;

        private CharIterator() throws IOException {
            urlStream = currentURL.openStream();
        }

        @Override
        public boolean hasNext() {
            // A buffered character means next() will succeed without reading again.
            if(!buffer.isEmpty()){
                return true;
            }
            if(!isValid){
                return false;
            }
            try {
                // read() returns an int; -1 means end of stream and must not be buffered.
                int b = urlStream.read();
                if(b == -1){
                    markInvalid();
                    return false;
                }
                buffer.add((char) b);
                return true;
            } catch (IOException ex) {
                markInvalid();
                return false;
            }
        }

        // Returns null at end of stream or after an I/O error.
        @Override
        public Character next() {
            return hasNext() ? buffer.remove() : null;
        }

        // Marks the iterator as exhausted and releases the connection.
        private void markInvalid(){
            isValid = false;
            try {
                urlStream.close();
            } catch (IOException ignored) {
                // nothing useful can be done if closing fails
            }
        }
    }

    // Yields the non-empty lines of the response, without line terminators.
    class StringIterator implements Iterator<String>{
        private final CharIterator charPointer;
        private String pending; // line read ahead by hasNext()

        private StringIterator(){
            charPointer = charIterator();
        }

        @Override
        public boolean hasNext() {
            if(pending == null){
                pending = readLine();
            }
            return pending != null;
        }

        // Returns null when no more lines are available.
        @Override
        public String next() {
            String line = hasNext() ? pending : null;
            pending = null;
            return line;
        }

        private String readLine(){
            if(charPointer == null){
                return null; // the connection could not be opened
            }
            // Skip line terminators left over from the previous line (and blank lines).
            Character currentChar = charPointer.next();
            while (currentChar != null && (currentChar == '\n' || currentChar == '\r')){
                currentChar = charPointer.next();
            }
            if(currentChar == null){
                return null;
            }
            StringBuilder sb = new StringBuilder();
            // Compare char values, not Character references, and stop at end of stream
            // so that a last line without a trailing newline does not loop forever.
            while (currentChar != null && currentChar != '\n' && currentChar != '\r'){
                sb.append(currentChar.charValue());
                currentChar = charPointer.next();
            }
            return sb.toString();
        }
    }
}
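Reading the stream incrementally like this means you can stop parsing as soon as the values you need have been seen, instead of processing the whole page. Whether the remaining bytes are still transferred depends on the connection and the server. Below is a minimal sketch of that idea, assuming the numbers can be located with a regular expression. The class name `EarlyStopScanner`, the method `findFirst` and the sample markup are hypothetical and not part of the helper above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EarlyStopScanner {

    // Reads line by line and returns capture group 1 of the first match.
    // Lines after the matching one are never parsed. The caller owns
    // (and should close) the underlying reader.
    public static Optional<String> findFirst(Reader source, Pattern pattern) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher m = pattern.matcher(line);
            if (m.find()) {
                return Optional.of(m.group(1));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical markup; a real page would be read through
        // new InputStreamReader(url.openStream(), StandardCharsets.UTF_8).
        String page = "<html>\n<span id=\"price\">42.5</span>\n<p>rest of page</p>\n</html>";
        Pattern price = Pattern.compile("id=\"price\">([0-9.]+)<");
        System.out.println(findFirst(new StringReader(page), price).orElse("not found"));
    }
}
```

If the two numbers sit near the top of the document, this avoids parsing most of the page on every poll. It does not change how often the server publishes new values.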
I think you should look at how the page itself fetches the data (SSE or WebSocket) and try to subscribe to that service directly. If that is impossible, try a more efficient XML parser. I recommend https://vtd-xml.sourceforge.io/; it can be ~10x faster than the DOM parser that comes with the JDK.
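If the page does get its updates over SSE, the stream is plain text: each event carries one or more `data:` lines and is terminated by a blank line. Below is a minimal sketch of reading such a stream, assuming you only care about the `data` field. The class `SseDataReader` and the sample payloads are hypothetical, and a real client would read from the HTTP response body rather than a `StringReader`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SseDataReader {

    // Collects the payload of every complete event. Multiple "data:" lines
    // in one event are joined with '\n'; a blank line ends the event.
    // Comments (lines starting with ':') and other fields are ignored.
    public static List<String> readEvents(Reader source) throws IOException {
        List<String> events = new ArrayList<>();
        StringBuilder data = new StringBuilder();
        boolean hasData = false;
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                // Blank line: dispatch the event collected so far, if any.
                if (hasData) {
                    events.add(data.toString());
                }
                data.setLength(0);
                hasData = false;
            } else if (line.startsWith("data:")) {
                String value = line.substring(5);
                if (value.startsWith(" ")) {
                    value = value.substring(1); // one optional leading space
                }
                if (hasData) {
                    data.append('\n');
                }
                data.append(value);
                hasData = true;
            }
        }
        // An event that is not terminated by a blank line is discarded.
        return events;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stream content for illustration only.
        String stream = ": keep-alive\ndata: 1.5\n\ndata: 2.5\n\n";
        System.out.println(readEvents(new StringReader(stream)));
    }
}
```

With a subscription like this the server pushes each change as it happens, so there is no need to re-download and re-parse the page to detect it.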
Also be careful with BufferedReader.readLine(), as there is a hidden allocation cost for the strings that you don't really need (this is fairly advanced territory, since you have to think about CPU memory bandwidth, L1 cache misses, etc.).
Example using the library I mentioned: