I have a large XML file and below is an extract from it:
...
<LexicalEntry id="Ait~ifAq_1">
<Lemma partOfSpeech="n" writtenForm="اِتِّفاق"/>
<Sense id="Ait~ifAq_1_tawaAfuq_n1AR" synset="tawaAfuq_n1AR"/>
<WordForm formType="root" writtenForm="وفق"/>
</LexicalEntry>
<LexicalEntry id="tawaA&um__1">
<Lemma partOfSpeech="n" writtenForm="تَوَاؤُم"/>
<Sense id="tawaA&um__1_AinosijaAm_n1AR" synset="AinosijaAm_n1AR"/>
<WordForm formType="root" writtenForm="وأم"/>
</LexicalEntry>
<LexicalEntry id="tanaAgum_2">
<Lemma partOfSpeech="n" writtenForm="تناغُم"/>
<Sense id="tanaAgum_2_AinosijaAm_n1AR" synset="AinosijaAm_n1AR"/>
<WordForm formType="root" writtenForm="نغم"/>
</LexicalEntry>
<Synset baseConcept="3" id="tawaAfuq_n1AR">
<SynsetRelations>
<SynsetRelation relType="hyponym" targets="AinosijaAm_n1AR"/>
<SynsetRelation relType="hyponym" targets="AinosijaAm_n1AR"/>
<SynsetRelation relType="hypernym" targets="ext_noun_NP_420"/>
</SynsetRelations>
<MonolingualExternalRefs>
<MonolingualExternalRef externalReference="13971065-n" externalSystem="PWN30"/>
</MonolingualExternalRefs>
</Synset>
...
I want to extract specific information from it. For a given writtenForm, whether from <Lemma> or <WordForm>, the programme takes the value of synset from the <Sense> of that writtenForm (same <LexicalEntry>) and searches for every <Synset> whose id has the same value as that synset. After that, the programme gives us all the relations of that Synset, i.e. it displays the value of relType, then returns to the <LexicalEntry> elements, looks for each <Sense> whose synset has the same value as targets, and displays its writtenForm.
I think it's a little bit complicated but the result should be like this:
اِتِّفاق hyponym تَوَاؤُم, اِنْسِجام
One possible solution is to use a stream reader because of memory consumption, but I don't know how to proceed to get what I want. Please help me.
The SAX parser is different from the DOM parser. It looks only at the current item; it can't see future items until they become the current item. It is one of the many parsers you can use when an XML file is extremely big. There are many alternatives to it. To name a few:
SAX PARSER
DOM PARSER
JDOM PARSER
DOM4J PARSER
STAX PARSER
You can find tutorials for all of them here.
In my opinion, after learning it, go straight to using DOM4J or JDOM for a commercial product.
The logic of the SAX parser is that you have a MyHandler class which extends DefaultHandler and overrides (@Override) some of its methods:
XML FILE:
Handler class:
Main Class :
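As a rough illustration of that handler logic applied to the lexicon extract in the question, here is a minimal sketch. The class name LexiconHandler, its three maps, and the parse and relationsFor helpers are hypothetical names chosen for this example; only DefaultHandler, SAXParserFactory and Attributes come from the JDK. It assumes the real file has a single root element, that each targets attribute holds exactly one synset id, and that the three lookup tables fit in memory even though the full document would not. Note also that a raw & inside an attribute value (as in the id tawaA&um__1) is not well-formed XML, so a conforming parser will reject it unless it is written as &amp; in the actual file.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: builds small lookup tables in one streaming pass.
class LexiconHandler extends DefaultHandler {
    // writtenForm (Lemma or WordForm) -> synset ids of the same LexicalEntry
    final Map<String, Set<String>> formToSynsets = new LinkedHashMap<>();
    // synset id -> lemma writtenForms whose Sense points at that synset
    final Map<String, Set<String>> synsetToLemmas = new LinkedHashMap<>();
    // synset id -> relations stored as [relType, target]; the Set drops duplicates
    final Map<String, Set<List<String>>> relations = new LinkedHashMap<>();

    // State for the LexicalEntry / Synset currently being read
    private final List<String> entryForms = new ArrayList<>();
    private final List<String> entrySynsets = new ArrayList<>();
    private String entryLemma;
    private String currentSynset;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        switch (qName) {
            case "LexicalEntry":
                entryForms.clear();
                entrySynsets.clear();
                entryLemma = null;
                break;
            case "Lemma":
                entryLemma = atts.getValue("writtenForm");
                entryForms.add(entryLemma);
                break;
            case "WordForm":
                entryForms.add(atts.getValue("writtenForm"));
                break;
            case "Sense":
                entrySynsets.add(atts.getValue("synset"));
                break;
            case "Synset":
                currentSynset = atts.getValue("id");
                break;
            case "SynsetRelation":
                // Assumption: "targets" holds a single synset id
                relations.computeIfAbsent(currentSynset, k -> new LinkedHashSet<>())
                         .add(List.of(atts.getValue("relType"), atts.getValue("targets")));
                break;
            default:
                break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // SAX only sees the current element, so the entry is committed once it closes
        if (!qName.equals("LexicalEntry")) return;
        for (String synset : entrySynsets) {
            for (String form : entryForms) {
                formToSynsets.computeIfAbsent(form, k -> new LinkedHashSet<>()).add(synset);
            }
            if (entryLemma != null) {
                synsetToLemmas.computeIfAbsent(synset, k -> new LinkedHashSet<>()).add(entryLemma);
            }
        }
    }

    // Builds lines such as "<writtenForm> <relType> <lemma>, <lemma>" after parsing
    List<String> relationsFor(String writtenForm) {
        List<String> lines = new ArrayList<>();
        for (String synset : formToSynsets.getOrDefault(writtenForm, Set.of())) {
            for (List<String> rel : relations.getOrDefault(synset, Set.of())) {
                Set<String> lemmas = synsetToLemmas.get(rel.get(1));
                if (lemmas != null) { // skip targets that have no LexicalEntry in the file
                    lines.add(writtenForm + " " + rel.get(0) + " " + String.join(", ", lemmas));
                }
            }
        }
        return lines;
    }

    static LexiconHandler parse(InputStream in) throws Exception {
        LexiconHandler handler = new LexiconHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler;
    }

    // Usage: java LexiconHandler <file.xml> <writtenForm>
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            parse(in).relationsFor(args[1]).forEach(System.out::println);
        }
    }
}
```

Because the lookups happen only after the whole file has been read, the order of <LexicalEntry> and <Synset> elements in the file does not matter. If even these tables are too large, a two-pass variant (first pass for the synsets of the requested writtenForm, second pass for the matching lemmas) would trade time for memory.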