Why NPTL threading in Linux still assignee unique PID to each thread?

409 Views Asked by At

I am reading pthread man and seeing following:

With NPTL, all of the threads in a process are placed in the same thread group; all members of a thread group share the same PID.

My current architecture is running on NPTL 2.17 and when I run htop that is showing threads I see that all PIDs are unique. But why? I am expecting some of them (e.g. chrome) sharing same PID with each other?

enter image description here

4

There are 4 best solutions below

0
On

The Linux kernel does have the concept of POSIX pids (explorable in /proc/*) but it calls them thread group ids in the kernel source and it refers to its internal thread ids as pids (explorable in /proc/*/task/*).

I believe this is rooted in Linux's original treatment of threads as "just processes" that happen to share address spaces and a bunch of other stuff with each other.

Your user tool is likely propagating this perhaps confusing Linux kernel terminology.

0
On

Because kernel-level threads are no more than processes with the (nearly) same address space.

This was "solved" by the linux kernel development by renaming them the processes to "threads", the "pid"-s to "tid"-s, and the old processes became "thread groups".

However, the sad truth is that if you create the thread on Linux (clone()), it will create a process - only using the (nearly) same memory segments.

That means 1:1 thread model. It means that all the threads are actually kernel-level threads, meaning that they are essentially processes in the same address space.

Some other alternatives would be:

  • 1:M thread model. It means that the kernel doesn't know about threads, it is the task of the user-space libraries to make an "in-process multitasking" to run appearantly multi-threaded.
  • N:M thread model. This is best, unfortunately some opinion favorize still 1:1. It would mean that we have both user- and kernel-level threads and some optimization algorithm decides, what to run and where.

Once Linux had an N:M model (ngpt), but it was removed on a yet another fallback. It was that Linux kernel calls are inherently synchronous (blocking). Resulting that some kernel-cooperation had been needed even for user-space synchronization. Nobody wanted to do that.

So is it.

P.s. to create a well-performant app, you should actually avoid to create a lot of threads at once. You need to use a thread pool with well-thought locking protocols. If you don't minimize the usage of the thread creations/joins, your app will be slow and ineffective, it doesn't matter if it is N:M or not.

0
On

See man gettid:

gettid() returns the caller's thread ID (TID). In a single-threaded process, the thread ID is equal to the process ID (PID, as returned by getpid(2)). In a multithreaded process, all threads have the same PID, but each one has a unique TID. For further details, see the discussion of CLONE_THREAD in clone(2).

What htop shows is TID, not PID. You can toggle display of the threads on/off with H key.

You can also enable PPID column in htop and that shows the PID / TID of the main thread for threads.

2
On

Google's documentation for Chromium (which probably operates similarly to Chrome when it comes to these concepts) states that they use a "multi-process architecture". Your quote from pthread's man page states that all of the threads in a single process are placed under the same PID, which would not apply to Chrome's architecture.