What determines which strings are interned and when?

>>> s1 = "spam"
>>> s2 = "spam"
>>> s1 is s2
True
>>> q = 'asdalksdjfla;ksdjf;laksdjfals;kdfjasl;fjasdf'
>>> r = 'asdalksdjfla;ksdjf;laksdjfals;kdfjasl;fjasdf'
>>> q is r
False

How many characters must a string have before s1 is s2 gives False? Where is the limit? That is, how long does a string have to be before Python starts making separate copies of it?


4
On BEST ANSWER

String interning is implementation-specific and shouldn't be relied upon. Use equality testing (==) if you want to check whether two strings have the same value.
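To make the point concrete, here is a minimal sketch that compares two strings built at run time. The helper name same_value is made up for illustration. Only the == result is guaranteed by the language; the result of is depends on the interpreter, so it is printed rather than relied upon.

```python
def same_value(a: str, b: str) -> bool:
    """Compare strings by value, which is what the language guarantees."""
    return a == b


# Build both strings at run time so they are not shared compile-time constants.
left = "".join(["sp", "am"])
right = "".join(["sp", "am"])

print(same_value(left, right))  # equality compares the characters
print(left is right)            # identity: implementation-dependent, do not rely on it
```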

6
On

If you want, for some bizarre reason, to force the comparison to be true, then use the intern function (a builtin in Python 2; it is sys.intern in Python 3):

>>> a = intern('12345678012345678901234567890qazwsxedcrfvtgbyhnujmikolp')
>>> b = intern('12345678012345678901234567890qazwsxedcrfvtgbyhnujmikolp')
>>> a is b
True
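For readers on Python 3, the same idea can be sketched with sys.intern. The variable names are illustrative, and the strings are assembled by slicing and concatenation so that they start out as separately built objects rather than one shared literal.

```python
import sys

text = "12345678012345678901234567890qazwsxedcrfvtgbyhnujmikolp"

# Rebuild the same value twice at run time, then intern each copy.
a = sys.intern(text[:10] + text[10:])
b = sys.intern(text[:20] + text[20:])

# sys.intern returns the single interned object for equal strings.
print(a is b)
```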
0
On

Here is part of a comment about interned strings from the CPython 2.5.0 source file (stringobject.h):

/* ... ... This is generally restricted to strings that **"look like" Python identifiers**, although the intern() builtin can be used to force interning of any string ... ... */

Accordingly, string literals that contain only underscores, digits, or letters will be interned. In your example, q and r contain ';', so they will not be interned.
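As a rough illustration of the "looks like an identifier" rule, the check below tests whether every character is an ASCII letter, a digit, or an underscore. The function looks_like_identifier is a hypothetical helper written for this answer, not CPython's actual code, and it only approximates what the interpreter does.

```python
import string

# Characters treated as "name characters" in this sketch (an assumption).
NAME_CHARS = set(string.ascii_letters + string.digits + "_")


def looks_like_identifier(s: str) -> bool:
    """Return True if s contains only letters, digits, and underscores."""
    return all(ch in NAME_CHARS for ch in s)


print(looks_like_identifier("spam"))
print(looks_like_identifier("asdalksdjfla;ksdjf;laksdjfals;kdfjasl;fjasdf"))
```

The first call accepts "spam", while the second rejects the long string because of its semicolons, which matches the behaviour seen in the question.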