I have a large XML file and below is an extract from it:
...
<LexicalEntry id="Ait~ifAq_1">
<Lemma partOfSpeech="n" writtenForm="اِتِّفاق"/>
<Sense id="Ait~ifAq_1_tawaAfuq_n1AR" synset="tawaAfuq_n1AR"/>
<WordForm formType="root" writtenForm="وفق"/>
</LexicalEntry>
<LexicalEntry id="tawaA&um__1">
<Lemma partOfSpeech="n" writtenForm="تَوَاؤُم"/>
<Sense id="tawaA&um__1_AinosijaAm_n1AR" synset="AinosijaAm_n1AR"/>
<WordForm formType="root" writtenForm="وأم"/>
</LexicalEntry>
<LexicalEntry id="tanaAgum_2">
<Lemma partOfSpeech="n" writtenForm="تناغُم"/>
<Sense id="tanaAgum_2_AinosijaAm_n1AR" synset="AinosijaAm_n1AR"/>
<WordForm formType="root" writtenForm="نغم"/>
</LexicalEntry>
<Synset baseConcept="3" id="tawaAfuq_n1AR">
<SynsetRelations>
<SynsetRelation relType="hyponym" targets="AinosijaAm_n1AR"/>
<SynsetRelation relType="hyponym" targets="AinosijaAm_n1AR"/>
<SynsetRelation relType="hypernym" targets="ext_noun_NP_420"/>
</SynsetRelations>
<MonolingualExternalRefs>
<MonolingualExternalRef externalReference="13971065-n" externalSystem="PWN30"/>
</MonolingualExternalRefs>
</Synset>
...
I want to extract specific information from it. Given a writtenForm, whether from <Lemma> or <WordForm>, the program should take the synset value from the <Sense> in the same <LexicalEntry> and find the <Synset> whose id equals that value. It should then list all the relations of that Synset: for each relation it displays the relType, then goes back to the <LexicalEntry> elements, finds those whose <Sense> has a synset value equal to the relation's targets value, and displays their writtenForm.
I think it's a little complicated, but the result should look like this:
اِتِّفاق hyponym تَوَاؤُم, اِنْسِجام
One possible solution is to use a stream reader because of memory consumption, but I don't know how to proceed to get what I want. Please help me.
The SAX parser is different from the DOM parser. It looks only at the current item; it can't see later items until they become the current item. Because it never holds the whole document in memory, it is one of the options you can use when the XML file is extremely big. Besides it, there are many other parsers out there. To name a few: SAX parser, DOM parser, JDOM parser, DOM4J parser, StAX parser. You can find tutorials for all of them here.
In my opinion, after learning it, go straight to DOM4J or JDOM for a commercial product.
The logic of the SAX parser is that you have a MyHandler class which extends DefaultHandler and overrides (@Override) some of its methods:
XML file:
Handler class:
Main class:
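A minimal sketch of how such a handler and main class might look for the lookup described in the question. It is not a definitive implementation. The names SynsetRelationLookup, relationsFor and SAMPLE are made up for illustration, the <Root> wrapper element is hypothetical, and the sample data is a small ASCII stand-in for the real file. The sketch assumes each targets attribute holds a single synset id. Note also that a literal & inside an id, as in the extract, has to be written as &amp; in the file, otherwise any XML parser will reject it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SynsetRelationLookup {

    // Small ASCII stand-in for the real file; <Root> is a hypothetical wrapper element.
    static final String SAMPLE = "<Root>"
        + "<LexicalEntry id='e1'><Lemma partOfSpeech='n' writtenForm='agreement'/>"
        + "<Sense id='s1' synset='syn_agree'/><WordForm formType='root' writtenForm='agr'/></LexicalEntry>"
        + "<LexicalEntry id='e2'><Lemma partOfSpeech='n' writtenForm='harmony'/>"
        + "<Sense id='s2' synset='syn_harmony'/></LexicalEntry>"
        + "<LexicalEntry id='e3'><Lemma partOfSpeech='n' writtenForm='concord'/>"
        + "<Sense id='s3' synset='syn_harmony'/></LexicalEntry>"
        + "<Synset id='syn_agree'><SynsetRelations>"
        + "<SynsetRelation relType='hyponym' targets='syn_harmony'/>"
        + "<SynsetRelation relType='hypernym' targets='ext_1'/>"
        + "</SynsetRelations></Synset></Root>";

    /** Builds three lookup tables in a single SAX pass over the document. */
    static class MyHandler extends DefaultHandler {
        // writtenForm (from Lemma or WordForm) -> synset ids of the same LexicalEntry
        final Map<String, Set<String>> synsetsByForm = new HashMap<>();
        // synset id -> Lemma written forms whose Sense points at that synset
        final Map<String, Set<String>> lemmasBySynset = new HashMap<>();
        // synset id -> its relations as {relType, targets} pairs
        final Map<String, List<String[]>> relationsBySynset = new HashMap<>();

        private String lemma;                                    // Lemma of the current entry
        private final List<String> forms = new ArrayList<>();   // all written forms of the entry
        private final List<String> senses = new ArrayList<>();  // synset ids of the entry
        private String currentSynset;                            // id of the enclosing Synset

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            switch (qName) {
                case "LexicalEntry": lemma = null; forms.clear(); senses.clear(); break;
                case "Lemma": lemma = atts.getValue("writtenForm"); forms.add(lemma); break;
                case "WordForm": forms.add(atts.getValue("writtenForm")); break;
                case "Sense": senses.add(atts.getValue("synset")); break;
                case "Synset": currentSynset = atts.getValue("id"); break;
                case "SynsetRelation":
                    relationsBySynset.computeIfAbsent(currentSynset, k -> new ArrayList<>())
                        .add(new String[] {atts.getValue("relType"), atts.getValue("targets")});
                    break;
                default: break;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Child order inside an entry varies, so record it only once it is complete.
            if (!qName.equals("LexicalEntry")) return;
            for (String form : forms) {
                if (form != null) {
                    synsetsByForm.computeIfAbsent(form, k -> new LinkedHashSet<>()).addAll(senses);
                }
            }
            if (lemma == null) return;
            for (String synset : senses) {
                lemmasBySynset.computeIfAbsent(synset, k -> new LinkedHashSet<>()).add(lemma);
            }
        }
    }

    /** Returns lines of the form "writtenForm relType lemma1, lemma2". */
    static List<String> relationsFor(InputSource source, String writtenForm) {
        MyHandler h = new MyHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, h);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the XML input", e);
        }
        Set<String> lines = new LinkedHashSet<>(); // drops repeated relations
        for (String synset : h.synsetsByForm.getOrDefault(writtenForm, Collections.emptySet())) {
            for (String[] rel : h.relationsBySynset.getOrDefault(synset, Collections.emptyList())) {
                Set<String> lemmas = h.lemmasBySynset.getOrDefault(rel[1], Collections.emptySet());
                // Fall back to the raw target id when no LexicalEntry refers to it.
                String shown = lemmas.isEmpty() ? rel[1] : String.join(", ", lemmas);
                lines.add(writtenForm + " " + rel[0] + " " + shown);
            }
        }
        return new ArrayList<>(lines);
    }

    public static void main(String[] args) {
        InputSource source = new InputSource(new StringReader(SAMPLE));
        relationsFor(source, "agreement").forEach(System.out::println);
    }
}
```

On the sample data this prints "agreement hyponym harmony, concord" and "agreement hypernym ext_1". For the real file, the InputSource could wrap a FileInputStream instead of a StringReader. The three maps still grow with the number of entries, so this uses far less memory than a DOM tree but is not constant-memory.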