IndriUI Index not building


I am trying to build an index using IndriUI. I created a parameter file and a stopword list for building the index. When I click Build Index, the UI keeps running for a long time and the index is never built.


After that, the UI hangs and never finishes.


Here is my input.txt file:

<DOC>
<DOCNO>
@switcheery
</DOCNO>
<TEXT>
Lol?"@elsidi01: "@switcheery: God bless that man that loves to see me happy......"#I"
</TEXT>
<DOCNO>
@Roseefly
</DOCNO>
<TEXT>
42% of Irish People have a Medical Card/Doctor Only Card. ##I have to admit we are a great little country #budget15 #healthcare
</TEXT>
<DOCNO>
@FammySaulkner
</DOCNO>
<TEXT>
@dthompsonRTS11 @Kirkpatrick_29 gosh dev you read my mind #I??crossfit
</TEXT>
<DOCNO>
@codesilence
</DOCNO>
<TEXT>
data mine the heart..for ??    #nsa  #i
</TEXT>
<DOCNO>
@ulidovmj
</DOCNO>
<TEXT>
Now That's What I Call Club Hits 2014: http://t.co/kd2xE5GZhq #nowalbum #album #ukcharts #uscharts #trending #i... http://t.co/tGe9wH6M0e
</TEXT>
<DOCNO>
@ulidovmj
</DOCNO>
<TEXT>
Now That's What I Call Club Hits 2014: http://t.co/kd2xE5GZhq #nowalbum #album #ukcharts #uscharts #trending #i... http://t.co/BmMMpLHcVA
</TEXT>
<DOCNO>
@ulidovmj
</DOCNO>
<TEXT>
Now That's What I Call Club Hits 2014: http://t.co/kd2xE5GZhq #nowalbum #album #ukcharts #uscharts #trending #i... http://t.co/GyuzOVA68T
</TEXT>
<DOCNO>
@ulidovmj
</DOCNO>
<TEXT>
Now That's What I Call Club Hits 2014: http://t.co/kd2xE5GZhq #nowalbum #album #ukcharts #uscharts #trending #i... http://t.co/sCw5U1DXMy
</TEXT>
<DOCNO>
@ulidovmj
</DOCNO>
<TEXT>
Now That's What I Call Club Hits 2014: http://t.co/kd2xE5GZhq #nowalbum #album #ukcharts #uscharts #trending #i... http://t.co/JwhqJoSN1T
</TEXT>
<DOCNO>
@SandySchmitz3
</DOCNO>
<TEXT>
Having kids is the biggest leap of faith a person can make. 2 create new lives & hope they spread goodness throughout the world. #I WISH
</TEXT>
<DOCNO>
@my_15minutes
</DOCNO>
<TEXT>
wubba lubba dub dub means I'm in great pain, please help me by winning the #I'dbemortyfied contest on @TheMarySue
</TEXT>
<DOCNO>
@darren1966h
</DOCNO>
<TEXT>
I managed to finish the Cheshire welcomes you! assignment! Try it for yourself! http://t.co/NYCrn7DQTu #GameInsight #iPad #i...
</TEXT>
<DOCNO>
@GomitasYnutella
</DOCNO>
<TEXT>
Set de fotos: dee-lirious: #i regret every day of my life i didn’t love you http://t.co/Z48py9uOOC
</TEXT>
<DOCNO>
@PernelleBdt
</DOCNO>
<TEXT>
"Un seul être vous manque et tout est dépeuplé." 
Ma plus belle étoile, mon plus beau souvenir.. 3 ans déjà.. #I #14102011  #memories ??
</TEXT>
<DOCNO>
@news8martha
</DOCNO>
<TEXT>
The 2.7 inches of rain that's fallen in La Crosse would translate to 27 inches of snow!
#I'll top complaining now!
</TEXT>
</DOC>

Here is my stopwords.txt:

<parameters>
<stopper> 
<word>happy</word>
<word>wondeful</word>
<word>sad</word>
<word>cute</word>
</stopper>
</parameters>
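The stopwords.txt above uses the <parameters><stopper><word> layout that IndriBuildIndex parameter files use for stopwords, while the answer below reports success with a plain list of one word per line. A minimal sketch of that conversion, assuming Python 3.9+ and a well-formed XML input; stopper_to_plain_list and write_plain_stopwords are hypothetical helper names, not part of Indri:

```python
import xml.etree.ElementTree as ET


def stopper_to_plain_list(xml_text: str) -> list[str]:
    """Collect the text of every <word> element found under any <stopper>."""
    root = ET.fromstring(xml_text)
    words = []
    # iter() includes the root itself and walks all descendants, so this
    # works whether <stopper> is the root or nested inside <parameters>.
    for stopper in root.iter("stopper"):
        for word in stopper.iter("word"):
            if word.text and word.text.strip():
                words.append(word.text.strip())
    return words


def write_plain_stopwords(words: list[str], path: str) -> None:
    """Write one stopword per line, with no tags (path is a placeholder)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(w + "\n" for w in words)


if __name__ == "__main__":
    sample = """<parameters>
<stopper>
<word>happy</word>
<word>wondeful</word>
<word>sad</word>
<word>cute</word>
</stopper>
</parameters>"""
    print(stopper_to_plain_list(sample))
```

Running it prints ['happy', 'wondeful', 'sad', 'cute']. Note that wondeful is kept exactly as spelled in the original file, so it would not match the word wonderful in the corpus.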

Am I missing something? I am new to IR and have no idea what the parameter file is for. I created one, but I am not sure where it is used.
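For reference, the command-line IndriBuildIndex tool reads an XML parameter file that names the index location, the corpus file and its format (class trectext for the input above), and optionally a stopword list. Whether and how IndriUI consumes such a file is not covered here, so treat this as a sketch of the command-line format only; build_index_params, the paths, and the 200m memory value are hypothetical placeholders:

```python
import xml.etree.ElementTree as ET


def build_index_params(index_dir, corpus_path, stopwords, memory="200m"):
    """Return an IndriBuildIndex-style parameter file as a string.

    All argument values are hypothetical; adjust them to your setup.
    """
    root = ET.Element("parameters")
    ET.SubElement(root, "index").text = index_dir      # where the index is written
    ET.SubElement(root, "memory").text = memory        # RAM budget while indexing
    corpus = ET.SubElement(root, "corpus")
    ET.SubElement(corpus, "path").text = corpus_path   # the TRECTEXT input file
    ET.SubElement(corpus, "class").text = "trectext"   # how Indri should parse it
    stopper = ET.SubElement(root, "stopper")
    for w in stopwords:
        ET.SubElement(stopper, "word").text = w        # one <word> per stopword
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_index_params("/path/to/index", "/path/to/input.txt", ["happy", "sad"]))
```

The generated file would typically be saved (for example as params.xml) and passed to IndriBuildIndex as its argument.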


Answer:


For the stopword list, I simply wrote one word per line, without any tags. Also, I think the correct TRECTEXT format puts each document inside its own <DOC></DOC> pair, with the <DOCNO> and <TEXT> tags inside it. For example:

<DOC>
<DOCNO>
@switcheery
</DOCNO>
<TEXT>
Lol?"@elsidi01: "@switcheery: God bless that man that loves to see me happy......"#I"
</TEXT>
</DOC>
<DOC>
<DOCNO>
@Roseefly
</DOCNO>
<TEXT>
42% of Irish People have a Medical Card/Doctor Only Card. ##I have to admit we are a great little country #budget15 #healthcare
</TEXT>
</DOC>
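Rewrapping the original input.txt by hand is tedious with this many records. A minimal Python sketch, assuming every <DOCNO> is immediately followed by its <TEXT> block exactly as in the question; rewrap_trectext is a hypothetical helper, and the regex is a shortcut for this one file rather than a real TRECTEXT parser:

```python
import re

# One <DOCNO>...</DOCNO> followed by its <TEXT>...</TEXT> block.
# DOTALL lets a TEXT body span several lines (some tweets in input.txt do).
PAIR = re.compile(
    r"<DOCNO>\s*(.*?)\s*</DOCNO>\s*<TEXT>\s*(.*?)\s*</TEXT>",
    re.DOTALL,
)


def rewrap_trectext(raw: str) -> str:
    """Re-emit every DOCNO/TEXT pair inside its own <DOC>...</DOC> block.

    Any outer <DOC>/</DOC> tags in the input are ignored, because only the
    DOCNO/TEXT pairs are extracted.
    """
    docs = [
        f"<DOC>\n<DOCNO>\n{docno}\n</DOCNO>\n<TEXT>\n{text}\n</TEXT>\n</DOC>"
        for docno, text in PAIR.findall(raw)
    ]
    return "\n".join(docs) + "\n" if docs else ""


if __name__ == "__main__":
    broken = """<DOC>
<DOCNO>
@switcheery
</DOCNO>
<TEXT>
Lol?"@elsidi01: "@switcheery: God bless that man that loves to see me happy......"#I"
</TEXT>
<DOCNO>
@Roseefly
</DOCNO>
<TEXT>
42% of Irish People have a Medical Card/Doctor Only Card. ##I have to admit we are a great little country #budget15 #healthcare
</TEXT>
</DOC>"""
    print(rewrap_trectext(broken))
```

To fix the real file, read input.txt, pass its contents to rewrap_trectext, and write the result to a new file before indexing it.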