I have two huge XML files (4-5 GB each). The XML format is as follows:
<root>
  <item>
    <id/>
    <elements/>
    <elements/>
    <elements/>
  </item>
</root>
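Whatever sorting strategy is used, a file of this size has to be read as a stream rather than as a whole tree. Below is a minimal sketch using Python's standard `xml.etree.ElementTree.iterparse`. It assumes every `<item>` has exactly one `<id>` child whose text is the key. It reduces each item to a compact `(id, digest)` record, where the digest is a SHA-1 of the item's serialized XML, so that "modified" can later be detected by comparing digests. The function name `iter_items` is hypothetical. Under this scheme, purely cosmetic differences such as changed whitespace inside an item would also count as a modification.

```python
import hashlib
import io
import xml.etree.ElementTree as ET


def iter_items(source):
    """Hypothetical helper: yield (id_text, digest) for each <item>, streaming.

    Assumes each <item> has one <id> child holding the key.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of <root>
    for event, elem in context:
        if event == "end" and elem.tag == "item":
            item_id = elem.findtext("id", default="")
            # Ignore the whitespace after </item> so the digest does not
            # depend on how the parser happened to chunk the input.
            tail, elem.tail = elem.tail, None
            digest = hashlib.sha1(ET.tostring(elem)).hexdigest()
            elem.tail = tail
            yield item_id, digest
            # Discard processed items so memory use stays bounded.
            root.clear()
```

Usage: `for item_id, digest in iter_items("old.xml"): ...` (file name illustrative). The records are small enough to sort externally, or even in memory, far more cheaply than the raw XML.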
I need to find out which <item> elements have been added or modified. To do this, I plan to sort both files and then compare them. I have the following two approaches in mind for the sort:
1. Convert the XML files to another format and perform an external sort.
2. Sort using XSLT, although I am not sure whether that is feasible for files this large.
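For the first approach, a minimal external merge sort sketch in Python follows. It assumes the XML has already been reduced to `(id, digest)` string pairs and that ids contain no tab or newline characters. The sketch sorts fixed-size chunks in memory, writes each sorted run to a temporary file, and then k-way merges the runs with `heapq.merge`. The names `external_sort`, `_write_run`, `_read_run` and the `chunk_size` knob are hypothetical. Ids are compared as plain strings, so `"10"` sorts before `"2"`. That is harmless as long as both files are sorted the same way.

```python
import heapq
import os
import tempfile
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (item id, content digest)


def _write_run(chunk: List[Record]) -> str:
    """Sort one in-memory chunk and write it to a temp file, one record per line."""
    chunk.sort()
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for item_id, digest in chunk:
            f.write(f"{item_id}\t{digest}\n")
    return path


def _read_run(path: str) -> Iterator[Record]:
    """Stream a sorted run back from disk."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            item_id, digest = line.rstrip("\n").split("\t")
            yield item_id, digest


def external_sort(records: Iterable[Record], chunk_size: int = 1_000_000) -> Iterator[Record]:
    """Sorted runs on disk, then a k-way merge; at most chunk_size records in memory."""
    it = iter(records)
    paths = []
    try:
        while True:
            chunk = list(islice(it, chunk_size))
            if not chunk:
                break
            paths.append(_write_run(chunk))
        yield from heapq.merge(*(_read_run(p) for p in paths))
    finally:
        # Clean up the temporary run files.
        for p in paths:
            os.remove(p)
```

The XSLT route, by contrast, is typically limited by the processor building the whole source tree in memory before `xsl:sort` can run, which is the constraint this approach avoids.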
Which of these two approaches is feasible for this problem? Or is there a better way to tackle it?
EDIT: I cannot load an entire file into memory, so using "diff" or "bdiff" is not an option.
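Because a whole-file `diff` is ruled out, the comparison after sorting can itself be streaming. Here is a minimal sketch of a merge-join over two inputs sorted by id. It assumes each input is an iterable of `(id, digest)` string pairs, that ids are unique within a file, and that both files were sorted with the same string ordering. It holds only one record per file in memory at a time. The name `diff_sorted` and the status labels are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (item id, content digest)


def diff_sorted(old: Iterable[Record], new: Iterable[Record]) -> Iterator[Tuple[str, str]]:
    """Merge-join two id-sorted streams; yield (status, id) for every difference."""
    old_it, new_it = iter(old), iter(new)
    o = next(old_it, None)
    n = next(new_it, None)
    while o is not None or n is not None:
        if n is None or (o is not None and o[0] < n[0]):
            # id only in the old file
            yield "removed", o[0]
            o = next(old_it, None)
        elif o is None or n[0] < o[0]:
            # id only in the new file
            yield "added", n[0]
            n = next(new_it, None)
        else:
            # same id in both: compare content digests
            if o[1] != n[1]:
                yield "modified", n[0]
            o = next(old_it, None)
            n = next(new_it, None)
```

For example, an old stream `[("1","a"), ("2","b"), ("4","d")]` against a new stream `[("2","B"), ("3","c"), ("4","d")]` reports `1` as removed, `2` as modified and `3` as added.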