I wonder why this is not terminating in GNU Smalltalk:
s := Set new. s add: s
In theory, s should be simply a set containing an empty set. But executing that just loops forever, blowing up the heap.
Interestingly,
((s := Set with: 4 with: 5 with: 6) add: s) size. terminates and evaluates to 4.
An introduction
A
Setis a kind ofHashedCollectionspecifically designed for fast membership check. Internally aSethas aHashTable, a sparce array with many empty slots to minimize the number of collisions. When we#add:an element to theSet, anindexto theHashTableis calculated as(hash \\ size) + 1, where#\\is the mod operation andsizeis the length of the table. If the slot atindexis already occupied, theindexis incremented until a free slot is found, and the element is stored there (N.B.,+ 1because Smalltalk arrays are1-based.) [cf. How does Set actually work]Our case
Now, let's see what happens with the code of the question:
As described above, in step 2,
add: swill compute:where
pis the initial number of slots ofs's inner table (a prime number, set to5or7when the set is first created, and increased later as needed.)So far, so good. But there might be some
Issues
Where? Depending on the Smalltalk dialect, there might be an issue with
#printOn:or with the nextadd:ition of an element.The printing issue
A problem that might happen with
#printOn:is infinite recursion. When printings, one would want to print its elements as well, and in our case such an approach would recursively try to printsin the process, creating an endless circularity.To prevent this from happening,
Pharouses aLimitedWriteStreamthat will stop writing after a certain number of iterations, breaking the recursion, should it exist.I haven't checked it myself, but this is the issue that seems to be happening in GNU Smalltalk (according to the question.)
Note that it is not enough to print only a max number of elements in the
Set. In fact, our set only contains one element (itself), which is enough to create the recursion.The addition issue
As observed by @aka.nice in his comment, care must also be taken when adding
sa second time. Why? Because, as we have seen in the introduction above, the messageadd: swill need to computes hash …. And how iss hashdefined? This is an interesting question. Smalltalkers are routinely confronted with the problem of implementing a good#hashin some classes. Sincesis a collection, it is tempting to take thehashof its elements into account for the final result, right? Pharo takes this approach, look:which swallows the bait. Why? Because the set
shas 1 element (itself), so the conditionself size <= 10istrue, and the method will try to recursively computes hashagain, leading to, oh oh, Stack Overflow.Conclusions
Be careful when implementing
Collection >> #printOn:. Even if the collection doesn't contain itself, a recursion might arise if there is an indirect reference from one of the elements back to the collection containing it.Be careful when implementing
Collection >> #hash(same reasons)Be careful when adding a
Collectionto itself. More generally, be careful when a collection contains an element with a (possibly indirect) reference to it, because enumerating such a collection may be tricky.In Mathematics (the Science), a Set cannot be an element of itself (this is explicitly forbidden by the Axioms of Set Theory). So, re-think your model before violating this rule that comes from an extremely mature and evolved scientific body of knowledge.