Referential Arrays in Python

1.6k Views Asked by At

I have started learning Data Structures and Algorithms. Please help me with my doubt.

  1. Is it ok to say that lists, tuples, and dictionaries are a type of referential array?

  2. I was going through the example in the book where it was written that we want a medical information system to keep track of the patients currently assigned to beds in a certain hospital. If we assume that the hospital has 200 beds, and conveniently that those beds are numbered from 0 to 199, we might consider using an array-based structure to maintain the names of the patients currently assigned to those beds. For example, in Python, we might use a list of names, such as:

[ Rene , Joseph , Janet , Jonas , Helen , Virginia , ... ]

To represent such a list with an array, Python must adhere to the requirement that each cell of the array use the same number of bytes. Yet the elements are strings, and strings naturally have different lengths. Python could attempt to reserve enough space for each cell to hold the maximum length string (not just of currently stored strings, but of any string we might ever want to store), but that would be wasteful. Instead, Python represents a list or tuple instance using an internal storage mechanism of an array of object references. At the lowest level, what is stored is a consecutive sequence of memory addresses at which the elements of the sequence reside. A high-level diagram of such a list is shown in Figure

https://i.stack.imgur.com/7FffP.jpg

What I understood is that python stores the memory address of "Rene" or "Joseph". But memory addresses will also change with respect to number of characters in the name like each Unicode takes 2 bytes of space.

Now it was also written that "Although the relative size of the individual elements may vary, the number of bits used to store the memory address of each element is fixed (e.g., 64-bits per address i.e 8 bytes). What if the character is very long and it's not able to store the memory address in 64 bits?

1

There are 1 best solutions below

0
On

Is it ok to say that lists, tuples, and dictionaries are a type of referential array?

Lists are (though it might matter that they have dynamic resizing built in), but tuples and dictionaries have a lot more going on behind the scenes.

What I understood is that python stores the memory address of "Rene" or "Joseph". But memory addresses will also change with respect to number of characters in the name like each Unicode takes 2 bytes of space.

The memory address is an integer describing where to find the data, and the data itself is a string of bytes like b'Rene'. I'm not sure what CPython uses to figure out where a data structure ends, but common approaches include a null terminator (when given an address, all bytes that you find are part of the data structure till you hit a null byte), ensuring every object is the same size so that you can skip ahead (not feasible for strings), or having the first byte(s) describe the length of the rest of the object. The address itself has a fixed size, e.g., 64 bits.

In the following illustration, the "address" used in the referential array would be 3, and you would know that the last byte of the string happened at address 6 by walking till you found the null byte (the other approaches I mentioned work similarly). The important thing is that 3 is the only number the referential array needs to store to be able to reference b'Rene'.

|3|4|5|6|7 |
|R|e|n|e|\0|

What if the character is very long and it's not able to store the memory address in 64 bits?

This was actually the reason 32-bit systems couldn't use more than 4GB of RAM -- they only had 32-bit identifiers to access the RAM (without additional techniques like assigning each process a RAM offset so that each process has a 32-bit memory space). Modern 64-bit systems allow 16 exabytes of RAM to be indexed without needing any other special techniques -- one of the primary motivations for the 32 -> 64 bit migration.

In any case though, memory limits are a real thing, and if we start needing to address more memory than that we'll need to use more complicated strategies than fixed-width integer references, or we'll need to jump up again and use bigger integers.