I want to build a role-based access control (RBAC) system using Keycloak's authorization framework.
We run a payment platform with more than 200 permissions and around 300,000 merchants.
Each merchant has the capacity to define up to 10 roles. Assuming all merchants create the maximum 10 roles, this would result in a cumulative total of 3 million roles within Keycloak.
Approach 1: Creating a policy for each role
Example: Suppose I have permissions P1, P2, ..., P200. As a merchant, I create a role R1 with permissions P1 and P2.
- Create role R1 in Keycloak
- Create policy R1_POLICY with role R1
- Attach policy R1_POLICY to permissions P1 and P2.
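To make the bookkeeping concrete, the three steps above can be sketched as a small in-memory model. This is not the Keycloak API: `Approach1Model`, `create_merchant_role` and `is_granted` are hypothetical names, and the model assumes a permission is granted when any one of its attached policies matches (an affirmative decision strategy).

```python
from collections import defaultdict


class Approach1Model:
    """In-memory stand-in for Approach 1 (one policy per role); not the Keycloak API."""

    def __init__(self):
        self.roles = set()
        self.policy_role = {}                         # policy name -> role name
        self.permission_policies = defaultdict(set)   # permission -> attached policy names

    def create_merchant_role(self, role, permissions):
        # Step 1: create the role.
        self.roles.add(role)
        # Step 2: create a policy that matches exactly this role.
        policy = f"{role}_POLICY"
        self.policy_role[policy] = role
        # Step 3: attach the policy to every permission the role grants.
        for permission in permissions:
            self.permission_policies[permission].add(policy)
        return policy

    def is_granted(self, user_roles, permission):
        # Assumed affirmative strategy: one matching policy is enough,
        # but in the worst case every attached policy is inspected.
        return any(
            self.policy_role[policy] in user_roles
            for policy in self.permission_policies[permission]
        )


approach1 = Approach1Model()
approach1.create_merchant_role("R1", ["P1", "P2"])

# Simulate other merchants whose roles also include P1.
for i in range(2, 1001):
    approach1.create_merchant_role(f"R{i}", ["P1"])

print(len(approach1.policy_role))                # one policy per role
print(len(approach1.permission_policies["P1"]))  # policies to evaluate for P1
```

In this model the number of policies attached to a popular permission grows with the number of roles that include it, which is the growth the problem below is about.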
Problem:
With a large user base (300k merchants), this approach produces a massive number of roles and policies (up to 3 million of each). This causes UI performance problems on the permission detail page in Keycloak. The evaluation API also struggles under load, because it has to evaluate every policy attached to a permission, and that becomes a bottleneck for the whole system.
Approach 2: Creating a policy for each permission
Description: In this approach, each permission has its own policy. When a merchant creates a role, the role is created in Keycloak and then attached to the policy of each permission it grants.
Problem:
When creating a role with a specific permission, we need to fetch the policy associated with that permission, fetch all roles connected to that policy (up to 3 million), and then update the policy with the newly created role added. This reduces the overall number of policies, but in the worst case a single policy ends up with 3 million roles attached. That makes every role creation expensive and is likely to slow down policy evaluation for a user's resources and scopes.
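The fetch-and-update cycle described above can be sketched with a similar in-memory model. Again this is not the Keycloak API: `Approach2Model` and `roles_transferred` are hypothetical, and the model assumes each update reads the policy's full role list and writes the full list back.

```python
class Approach2Model:
    """In-memory stand-in for Approach 2 (one policy per permission); not the Keycloak API."""

    def __init__(self, permissions):
        # One policy per permission, each holding the roles that grant it.
        self.policy_roles = {f"{p}_POLICY": [] for p in permissions}
        # Total role entries written back across all updates (assumed full rewrite).
        self.roles_transferred = 0

    def create_merchant_role(self, role, permissions):
        for permission in permissions:
            name = f"{permission}_POLICY"
            current = list(self.policy_roles[name])  # fetch the policy with all its roles
            current.append(role)                     # add the newly created role
            self.policy_roles[name] = current        # write the whole list back
            self.roles_transferred += len(current)


approach2 = Approach2Model(["P1", "P2"])
for i in range(1, 1001):
    approach2.create_merchant_role(f"R{i}", ["P1"])

print(len(approach2.policy_roles))               # 2 policies, one per permission
print(len(approach2.policy_roles["P1_POLICY"]))  # 1000 roles on a single policy
print(approach2.roles_transferred)               # 500500 entries rewritten in total
```

Under that full-rewrite assumption the policy count stays fixed at the number of permissions, but the work per role creation grows with the number of roles already attached, so the total cost is quadratic in the number of roles.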
I am new to Keycloak. Is my analysis correct? If neither approach fits this use case, can someone suggest a scalable way to implement RBAC with Keycloak?