Why does Ceph calculate the PG ID by object hash rather than with the CRUSH algorithm?

Ceph uses the CRUSH algorithm for the PG->OSD mapping, and it copes well with OSD nodes being added or removed.

But for the obj->PG mapping, Ceph still uses a traditional hash: pgid = hash(obj_name) % pg_num. This approach may cause massive data migration if we change the number of PGs, and may even reduce the availability of the system.
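To make the concern concrete, here is a minimal sketch that applies the formula above to some made-up object names and counts how many of them change PG when pg_num changes. SHA-256 is only a stand-in hash chosen for this illustration (it is not meant to be the hash Ceph uses), and `pg_for` and `moved_fraction` are hypothetical helper names.

```python
import hashlib

def pg_for(obj_name: str, pg_num: int) -> int:
    """pgid = hash(obj_name) % pg_num, with SHA-256 as a stand-in hash."""
    digest = hashlib.sha256(obj_name.encode()).digest()
    return int.from_bytes(digest[:8], "big") % pg_num

def moved_fraction(names, old_pg_num: int, new_pg_num: int) -> float:
    """Fraction of objects whose PG differs between the two PG counts."""
    moved = sum(
        1 for name in names
        if pg_for(name, old_pg_num) != pg_for(name, new_pg_num)
    )
    return moved / len(names)

# Made-up object names, purely for illustration.
names = [f"obj-{i}" for i in range(10_000)]

print("8 -> 9 PGs: ", moved_fraction(names, 8, 9))
print("8 -> 16 PGs:", moved_fraction(names, 8, 16))
```

With a roughly uniform hash, going from 8 to 9 PGs should move close to 8/9 of the objects, while doubling from 8 to 16 should move about half of them.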

Why doesn't Ceph use the CRUSH algorithm (say, straw2) for the obj->PG mapping, which could keep the amount of data migration close to optimal when the number of PGs changes?
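For comparison, here is a rough sketch of what a straw2-like choice for obj->PG could look like. It is plain rendezvous (highest-random-weight) hashing with equal weights, used as a simplified stand-in for straw2 and not Ceph's actual implementation. The names `score` and `pick_pg` are hypothetical, and SHA-256 is again just a convenient hash for the example.

```python
import hashlib

def score(obj_name: str, pg: int) -> int:
    """Pseudo-random draw for one (object, PG) pair."""
    digest = hashlib.sha256(f"{obj_name}/{pg}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def pick_pg(obj_name: str, pg_num: int) -> int:
    """Every PG draws a score for the object; the highest draw wins."""
    return max(range(pg_num), key=lambda pg: score(obj_name, pg))

# Made-up object names, purely for illustration.
names = [f"obj-{i}" for i in range(5_000)]

before = {name: pick_pg(name, 8) for name in names}
after = {name: pick_pg(name, 9) for name in names}
moved = [name for name in names if before[name] != after[name]]

print("fraction moved, 8 -> 9 PGs:", len(moved) / len(names))
print("all moved objects went to the new PG:",
      all(after[name] == 8 for name in moved))
```

In this sketch only the objects won by the new PG move, which should be roughly 1/9 of them. The price is that each lookup scores every PG, so it costs O(pg_num) instead of the O(1) of a modulo.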

There is 1 best solution below

I think these are different scenarios, and CRUSH is not a silver bullet.

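  1. PG->OSD is a one-to-many function (one PG maps to a set of OSDs), while obj->PG is a one-to-one function (one object maps to exactly one PG).
  2. OSDs are added and removed fairly frequently, while the number of PGs is considered fairly stable.
  3. The set of OSDs behind a PG can be partially unavailable, while the PGs themselves, being logical groupings, cannot.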

This is my perception; criticism or discussion is welcome.