Why does the Linux kernel use circular doubly linked lists to store the lists of processes?

2.5k Views Asked by At

The Linux kernel stores the lists of processes in a circular doubly linked lists, called the task list. What is the reason behind it? Why was circular doubly linked lists used? What is the advantage of using this data structure? What were the creators trying to achieve by using this data structure?

4

There are 4 best solutions below

0
On BEST ANSWER

Flexibility, so if you know for example what you're searching for is probably behind you you can use the list_for_each_entry_reverse macro instead of the usual forward one.

"you use linked lists when iterating over the whole list is important and the dynamic addition and removal of elements is required ... Using this type of linked list provides the greatest flexibility"

and no duplication of code

"In the old days, there were multiple implementations of linked lists in the kernel. A single, powerful linked list implementation was needed to remove duplicate code. During the 2.1 kernel development series, the official kernel linked-list implementation was introduced."

Source: Robert Love. "Linux Kernel Development" (3rd Edition). p.87-94

2
On

In most cases, the task list is not used at all. The kernel reaches task structures through e.g. a pointer in the thread_info structure (and it gets to the latter through the stack pointer) or through current_task, a global pointer to (ahem) the current task.

It is only when the task list needs to be traversed (a rare and inherently inefficient operation because the task list is of variable length, so it can grow to hundreds of thousands or even millions of entries on a large machine) that the linked list is used. In that case, it is convenient to start at any point in the list and keep going until you get back to the place where you started from (i.e. a circular list). Why it is doubly linked, I don't know for sure, but I imagine the link only adds a few bytes to an already large structure and the flexibility of traversing it either way was worth that. EDIT: actually, as @thrig points out in another answer, the reason is that there is one implementation of linked lists in the kernel that everybody uses (no weird bugs because somebody used their own) and that happens to be a doubly-linked list implementation, precisely because some users of it will want the flexibility.

0
On

The reason to have a list of objects (such as processes) some form is that sometimes the kernel needs to enumerate all these objects, i.e. to go through each of them in turn. This means there must be a way to find all the objects of this type. If objects can be created and removed one at a time then a linked list is the simplest solution.

The list needs to be doubly linked in order to support removing an object. When removing an object, the code needs to update all the pointers that point to this object. Therefore the object needs to contain a pointer to all the other objects that point to it (or at least there needs to be a chain of pointers starting at the object itself). With a singly-linked list, to remove B from A→B→C, there's no way to discover the A whose pointer needs to be updated, short of going through all objects until the right one is found. With a doubly-linked list, to remove B from A↔B↔C, you follow the pointer from B to A and change A's pointer to B to point to C instead, and likewise with C.

0
On

As Gilles points out, sometimes the kernel needs to loop through all the processes (for example, when handling kill(-1, sig), which sends sig to every process for which the calling process has permission to send signals, except for process 1 — see kill(2)).  I haven’t looked at this code in a long time; but, in the early versions of Unix, the process table was an array — some fixed number of contiguous proc structs (sometimes called “process slots”).  For example, the look-at-all-processes loop might have looked something like this:

for (i = 0; i < PROC_MAX; i++)          // This would probably really have started with i = 1
{                                       // so as not to look at proc[0], i.e., PID 1.
    if (proc[i].flags & IS_AN_ACTIVE_PROCESS)
        do something with the process
}

so it would have had to look at PROC_MAX (potentially thousands) process slots just to find ones that had an actual process in them — possibly a much smaller number.  Putting the processes into a linked list allows the kernel to do something with each process without have to search through a cavernous process table.

Also, if everything is held together with linked lists, it becomes feasible to have a dynamically sized process table.  If/when the initial default static process table gets full, just allocate some more (non-contiguous) memory and link them together.


P.S. I believe that there are probably multiple process lists:

  • an overall one,
  • one for processes that are runnable (in a “run” state),
  • one for processes that are actually currently running on (at least) one processor,
  • one for processes that are waiting for an event (e.g., completion of an I/O request),
  • etc.

In the Dark Ages (30 years ago), whenever a user pressed Enter (called Return back then), the kernel might have had to search through the 1000-entry-long process table to find the process(es) waiting for input from that tty.