PROLOG Make a list of words from big text file


I'm new to Prolog. I have a dictionary of 660000 lines (a plain text file) with one word per line, like this:

hello
everyone
thanks
for
help

I have to put these words into a list because I need to check whether a word entered by the user is in this text file. So I should end up with something like this:

[hello, everyone, thanks, for, help]

I tried many solutions, but (I think because of the huge number of words) the program fails with Fatal Error: global stack overflow. Would the following work: is there a way to read a block of words from the file and, if the user's word is not in that block, clear the stack and read the next block, and so on until the end of the file, to avoid the stack overflow? I'm using GNU-Prolog, not SWI-Prolog, so I can't use predicates like read_line_to_string/2. Thank you for your help!

Best answer

Once you load the list of words from the file, how are you going to use it? Could it be easier to add the words as a table of facts? So, if your file is:

this
is
a
file

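and you turn every line into a fact (word(this). for the first line, and so on) and load the result, then you have a table of facts and you can query: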

| ?- word(X).

X = this ? ;

X = (is) ? ;

X = a ? ;

X = file

(1 ms) yes

This might resolve your memory problem, too. Who knows....
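To make the idea concrete, here is a minimal sketch of such a table together with a lookup predicate. It uses the words from the question; the predicate name known_word/1 is made up for this illustration, and the sketch assumes the facts have already been written to a file that you consult.

```prolog
% Hypothetical facts file: one word/1 fact per dictionary line.
word(hello).
word(everyone).
word(thanks).
word(for).
word(help).

% known_word(+Atom): true if Atom is in the word/1 table.
% The atom/1 check makes the predicate a pure membership test
% instead of enumerating the table when called with a variable.
known_word(Atom) :-
    atom(Atom),
    word(Atom).
```

With this loaded, the goal known_word(help) should succeed and known_word(goodbye) should fail, and no list of words is ever built.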

But you still need to figure out how to read a line from input. SWI-Prolog's website shows an implementation of "read line to codes"; why not use it? I copied it, compiled it, and it just works:

% Read one line as a list of character codes, or end_of_file at the end.
pl_read_line_to_codes(Stream, Codes) :-
    get_code(Stream, C0),
    (   C0 == -1                      % -1 means end of file
    ->  Codes0 = end_of_file
    ;   read_1line_to_codes(C0, Stream, Codes0)
    ),
    Codes = Codes0.

read_1line_to_codes(-1, _, []) :- !.  % end of file ends the line
read_1line_to_codes(10, _, []) :- !.  % newline ends the line
read_1line_to_codes(13, Stream, L) :- % skip carriage returns
    !,
    get_code(Stream, C2),
    read_1line_to_codes(C2, Stream, L).
read_1line_to_codes(C, Stream, [C|T]) :-
    get_code(Stream, C2),
    read_1line_to_codes(C2, Stream, T).

Here is how I read the first line to an atom:

| ?- open(foo, read, In),
     pl_read_line_to_codes(In, Line),
     close(In),
     atom_codes(A, Line).

A = this
In = '$stream'(3)
Line = [116,104,105,115]

(1 ms) yes

Now, if you take the example code under the docs for read_line_to_codes/2, where it says "Backtrack over lines in a file", and just use that, replacing read_line_to_codes/2 with your definition of pl_read_line_to_codes/2:

% On backtracking, yield each line of the stream as an atom.
stream_line(In, Line) :-
    repeat,
    (   pl_read_line_to_codes(In, Line0),
        Line0 \== end_of_file
    ->  atom_codes(Line, Line0)
    ;   !,                            % end of file: stop repeating
        fail
    ).

it already works:

| ?- open(foo, read, In),
     \+ ( stream_line(In, W),
          \+ assertz(myword(W))
        ),
     close(In).

In = '$stream'(4)

yes
| ?- myword(X).

X = this ? ;

X = (is) ? ;

X = a ? ;

X = file

(1 ms) yes
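If all you need is to check a single word, you may not have to store anything. The sketch below reuses the two predicates above (repeated here so the block loads on its own) and scans the file line by line, stopping at the first match. The name word_in_file/2 is made up for this illustration, and I have not timed it on a file of the size in the question.

```prolog
% Line reader, as defined earlier in this answer.
pl_read_line_to_codes(Stream, Codes) :-
    get_code(Stream, C0),
    (   C0 == -1
    ->  Codes = end_of_file
    ;   read_1line_to_codes(C0, Stream, Codes)
    ).

read_1line_to_codes(-1, _, []) :- !.
read_1line_to_codes(10, _, []) :- !.
read_1line_to_codes(13, Stream, L) :-
    !,
    get_code(Stream, C2),
    read_1line_to_codes(C2, Stream, L).
read_1line_to_codes(C, Stream, [C|T]) :-
    get_code(Stream, C2),
    read_1line_to_codes(C2, Stream, T).

% On backtracking, yield each line of the stream as an atom.
stream_line(In, Line) :-
    repeat,
    (   pl_read_line_to_codes(In, Line0),
        Line0 \== end_of_file
    ->  atom_codes(Line, Line0)
    ;   !,
        fail
    ).

% word_in_file(+File, +Word): hypothetical helper, true if some line
% of File is exactly Word. With Word bound, atom_codes/2 inside
% stream_line/2 acts as a comparison, so lines that do not match
% are skipped by backtracking into repeat/0.
word_in_file(File, Word) :-
    atom(Word),
    open(File, read, In),
    (   stream_line(In, Word)
    ->  Found = yes
    ;   Found = no
    ),
    close(In),                        % close the stream in both cases
    Found == yes.
```

With the four-line file foo from above, word_in_file(foo, file) should succeed and word_in_file(foo, nothere) should fail. Each call rereads the file from the start, so the table of facts is probably the better choice if you check many words in one session.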

Does it work for 660000 words? You should try it out. When I tried with ~700K words, GNU-Prolog complained that:

Fatal Error: Atom table full (max atom: 32768, environment variable used: MAX_ATOM)

I fixed this by starting GNU-Prolog like this:

$ MAX_ATOM=999999 gprolog

Reading all ~700K words takes about 2.5 minutes on my computer, so pretty slow.