Let's suppose I have XML documents that contain links to other documents of the same type, which can in turn link to further documents. As a starting point I have a list of documents to read and analyse. I have written the following algorithm to read and analyse those documents:
private static List<String> documentNames = new ArrayList<String>();
main(...) {
    // add names to the documentNames ArrayList above
    for (String documentName : documentNames) {
        readDocument(documentName);
    }
}
The function readDocument looks like this:
private static CopyOnWriteArrayList<String> visitURL(String documentName) {
    CopyOnWriteArrayList<String> visitedDocs = new CopyOnWriteArrayList<String>(); // visited ref URLs
    if (!visitedDocs.contains(documentName)) {
        analyseAndWriteOnDisk(documentName); // saves the analysed document on disk
        CopyOnWriteArrayList<String> tmp = visitURL(documentName);
        visitedDocs.addAll(tmp);
    } else {
        System.out.println(documentName + " - I have seen it !");
    }
    return visitedDocs;
}
It works, but after the program runs I find duplicate files (files with the same content). I shouldn't have them, since the if-condition in visitURL is supposed to prevent that. My question is: what doesn't work here? I suppose something is wrong with how I manipulate the visitedDocs list. How can I get, on every recursive call, the current version of the list of already visited files?
To be as precise as I can: I have a recursive function that operates on some collection X:
recursion(CollectionType X) {
    someoperations(X)
    recursion(X)
}
and X must always be up to date.
Each time you call visitURL, you're creating a new instance of visitedDocs. So it's empty at the beginning of every call, and at the end it contains only the current call's tmp. According to the JavaDocs, you need to call the constructor like this:
CopyOnWriteArrayList<String> visitedDocs = new CopyOnWriteArrayList<String>(documentNames); // pass the collection you want to copy here; otherwise you're instantiating an empty list
Then you'll need to set your documentNames equal to the returned visitedDocs.
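Another way to keep the collection current on every recursive call, as the question asks, is to create it once outside the recursion and pass the same instance down. The following is only a minimal sketch under stated assumptions: the class DocumentCrawler, the links map (a stand-in for parsing the XML and extracting its links) and the analyse method (a stand-in for analyseAndWriteOnDisk) are hypothetical and not part of the original code. It uses a Set rather than a list, and needs Java 9 or later for List.of and Map.of.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DocumentCrawler {

    // Hypothetical stand-in for XML parsing: maps each document name
    // to the names of the documents it links to.
    private final Map<String, List<String>> links;

    // Records every document passed to analyse(), in order.
    private final List<String> analysed = new ArrayList<>();

    DocumentCrawler(Map<String, List<String>> links) {
        this.links = links;
    }

    // Visits documentName and everything reachable from it. The same
    // `visited` set is handed to every recursive call, so each call sees
    // what all earlier calls have already processed.
    void visit(String documentName, Set<String> visited) {
        // Set.add returns false if the element was already present.
        if (!visited.add(documentName)) {
            return;
        }
        analyse(documentName);
        for (String linked : links.getOrDefault(documentName, List.of())) {
            visit(linked, visited);
        }
    }

    // Hypothetical stand-in for analyseAndWriteOnDisk.
    private void analyse(String documentName) {
        analysed.add(documentName);
    }

    List<String> getAnalysed() {
        return analysed;
    }

    public static void main(String[] args) {
        // Sample link graph with cycles; the names are made up.
        Map<String, List<String>> links = Map.of(
                "a.xml", List.of("b.xml", "c.xml"),
                "b.xml", List.of("c.xml", "a.xml"),
                "c.xml", List.of("a.xml"));

        DocumentCrawler crawler = new DocumentCrawler(links);
        Set<String> visited = new LinkedHashSet<>(); // created once, shared by all calls
        for (String start : List.of("a.xml", "b.xml")) {
            crawler.visit(start, visited);
        }
        System.out.println(crawler.getAnalysed()); // prints [a.xml, b.xml, c.xml]
    }
}
```

Because visited is created once in main and never re-created inside visit, each document is analysed at most once even though the sample graph contains cycles and b.xml appears both as a starting document and as a link target.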