Ceph uses the CRUSH algorithm for PG->OSD mapping, and it works well when OSD nodes are added or removed.
But for obj->PG mapping, Ceph still uses a traditional hash: pgid = hash(obj_name) % pg_num.
This approach may lead to massive data migration if we change the number of PGs, and may even reduce the availability of the system while the data is moving.
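To make the migration concern concrete, here is a minimal Python sketch that counts how many objects change PG under the plain modulo formula quoted above. It models only the formula as written in this post, not Ceph's actual implementation; the SHA-256 based `stable_hash`, the object names, and the PG counts are all assumptions chosen for illustration.

```python
import hashlib


def stable_hash(name: str) -> int:
    """Deterministic 64-bit hash of an object name (illustrative stand-in)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pg_of(name: str, pg_num: int) -> int:
    """Plain modulo placement: pgid = hash(obj_name) % pg_num."""
    return stable_hash(name) % pg_num


def moved_fraction(names, old_pg_num: int, new_pg_num: int) -> float:
    """Fraction of objects whose PG id differs between the two PG counts."""
    moved = sum(
        1 for n in names if pg_of(n, old_pg_num) != pg_of(n, new_pg_num)
    )
    return moved / len(names)


# Hypothetical object names, used only for the experiment.
names = [f"obj-{i}" for i in range(20000)]

frac_plus_one = moved_fraction(names, 128, 129)  # add a single PG
frac_double = moved_fraction(names, 128, 256)    # double the PG count

print(f"128 -> 129 PGs: {frac_plus_one:.1%} of objects change PG")
print(f"128 -> 256 PGs: {frac_double:.1%} of objects change PG")
```

With a plain modulo, adding one PG should remap nearly all objects, while doubling the count should remap about half of them, since an object stays put only when its hash gives the same remainder for both moduli.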
Why doesn't Ceph use the CRUSH algorithm (say, straw2) for obj->PG mapping, which could keep data migration close to the optimal amount when the number of PGs changes?
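For comparison, here is a rough sketch of what a straw2-style obj->PG selection could look like. It is a simplification under stated assumptions: every PG has equal weight (straw2 proper divides the log draw by each item's weight), the hash is SHA-256 rather than the hash CRUSH uses, and `draw` and `pick_pg` are hypothetical names, not Ceph functions.

```python
import hashlib
import math


def draw(name: str, pg: int) -> float:
    """Straw2-style draw: ln(u) for a pseudo-random u in (0, 1].

    Equal weights are assumed, so the division by weight is omitted.
    """
    digest = hashlib.sha256(f"{name}:{pg}".encode("utf-8")).digest()
    u = (int.from_bytes(digest[:8], "big") + 1) / 2**64
    return math.log(u)


def pick_pg(name: str, pg_num: int) -> int:
    """Choose the PG with the largest draw for this object."""
    return max(range(pg_num), key=lambda pg: draw(name, pg))


# Hypothetical object names and PG counts, for illustration only.
names = [f"obj-{i}" for i in range(5000)]
old_pg_num, new_pg_num = 16, 17

before = {n: pick_pg(n, old_pg_num) for n in names}
after = {n: pick_pg(n, new_pg_num) for n in names}

moved = [n for n in names if before[n] != after[n]]
moved_fraction = len(moved) / len(names)
only_to_new_pg = all(after[n] == new_pg_num - 1 for n in moved)

print(f"{old_pg_num} -> {new_pg_num} PGs: {moved_fraction:.1%} of objects move")
print(f"every moved object lands on the new PG: {only_to_new_pg}")
```

In this sketch, adding one PG moves only about 1/17 of the objects, and all of them go to the new PG. The trade-off visible in `pick_pg` is that each lookup evaluates a draw for every PG, so the cost per object grows linearly with pg_num, whereas the modulo formula is constant time.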
I think the two mappings are different scenarios, and CRUSH is not a silver bullet.
This is only my perception; criticism and discussion are welcome.