The core RCU APIs in the Linux kernel apply to all clients in the kernel, which means that all readers of RCU-protected data are treated equally, even if they are accessing totally unrelated data structures. Calls like synchronize_rcu() therefore need to wait for all readers, including ones that touch entirely unrelated data structures under the hood.
Why has the Linux kernel never added support for per-data-object RCU? Am I missing anything here? I think the implication of the current RCU APIs is that if there are a lot of clients in the kernel, the overall performance of RCU may suffer, since they all share a global view.
No, that is the wrong implication. The RCU implementation in the Linux kernel scales perfectly well with the number of "clients".
You want to "replace" the single "lock object" used for RCU with multiple lock objects, so that protection of different data could use different lock objects. But the RCU implementation does NOT use any lock object at all!
Because of that, the RCU implementation is quite complex and relies on internal details of the Linux kernel (e.g. the scheduler), but it is worth it. For example, rcu_read_lock() and rcu_read_unlock() work much faster than any sort of spin_lock(), because they cause no memory contention with other cores.
Actually, a "lock object" is used for the sleepable version of RCU (SRCU). See e.g. that article.
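
To make the "lock object" idea concrete, here is a minimal userspace sketch of what a per-object read-side domain could look like. It is a toy model, not kernel code: struct toy_domain and every toy_* function are hypothetical names made up for illustration, and the single shared counter is an assumption chosen for brevity. In the kernel, the per-domain API is SRCU (srcu_read_lock(), srcu_read_unlock() and synchronize_srcu() on a struct srcu_struct), which keeps per-CPU counters inside the domain rather than one shared counter.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy domain: one "lock object" per protected data structure. */
struct toy_domain {
    atomic_int readers; /* readers currently inside a read-side section */
};

static void toy_domain_init(struct toy_domain *d)
{
    atomic_init(&d->readers, 0);
}

static void toy_read_lock(struct toy_domain *d)
{
    /* Every reader of this domain writes the same memory location.
     * This shared write is the cross-core contention that the core
     * RCU read side avoids by not having a lock object at all. */
    atomic_fetch_add(&d->readers, 1);
}

static void toy_read_unlock(struct toy_domain *d)
{
    atomic_fetch_sub(&d->readers, 1);
}

/* Non-blocking stand-in for a grace-period wait: returns true once no
 * reader of THIS domain is inside a read-side section. Readers of
 * other domains are never consulted. */
static bool toy_domain_quiescent(struct toy_domain *d)
{
    return atomic_load(&d->readers) == 0;
}
```

With two domains a and b, a reader that has called toy_read_lock(&b) leaves toy_domain_quiescent(&a) true, so an updater of a would not have to wait for it. The price is that readers of the same domain now contend on its counter, which is the trade-off described above.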