Multi tenancy
On a SaaS data analytics platform, it is common for multiple tenants, such as organizations, customers, or business units, to share the same database infrastructure while maintaining logical separation of their data. This allows different users to securely access their own data within the same platform.
Depending on the requirements, there are different ways to implement multi-tenancy. Below is a guide on how to implement them with ClickHouse Cloud.
Shared table
In this approach, data from all tenants is stored in a single shared table, with a field (or set of fields) used to identify each tenant’s data. To maximize performance, this field should be included in the primary key. To ensure that users can only access data belonging to their respective tenants we use role-based access control, implemented through row policies.
We recommend this approach as this is the simplest to manage, particularly when all tenants share the same data schema and data volumes are moderate (< TBs)
By consolidating all tenant data into a single table, storage efficiency is improved through optimized data compression and reduced metadata overhead. Additionally, schema updates are simplified since all data is centrally managed.
This method is particularly effective for handling a large number of tenants (potentially millions).
However, alternative approaches may be more suitable if tenants have different data schemas or are expected to diverge over time.
In cases where there is a significant gap in data volume between tenants, smaller tenants may experience unnecessary query performance impacts. Note, this issue is largely mitigated by including the tenant field in the primary key.
Example
This is an example of a shared table multi-tenancy model implementation.
First, let's create a shared table with a field tenant_id included in the primary key.
Let's insert fake data.
Then let's create two users user_1 and user_2.
We create row policies that restricts user_1 and user_2 to access only their tenants' data.
Then GRANT SELECT privileges on the shared table using a common role.
Now you can connect as user_1 and run a simple select. Only rows from the first tenant are returned.
Separate tables
In this approach, each tenant’s data is stored in a separate table within the same database, eliminating the need for a specific field to identify tenants. User access is enforced using a GRANT statement, ensuring that each user can access only tables containing their tenants' data.
Using separate tables is a good choice when tenants have different data schemas.
For scenarios involving a few tenants with very large datasets where query performance is critical, this approach may outperform a shared table model. Since there is no need to filter out other tenants’ data, queries can be more efficient. Additionally, primary keys can be further optimized, as there is no need to include an extra field (such as a tenant ID) in the primary key.
Note this approach doesn't scale for 1000s of tenants. See usage limits.
Example
This is an example of a separate tables multi-tenancy model implementation.
First, let's create two tables, one for events from tenant_1 and one for the events from tenant_2.
Let's insert fake data.
Then let's create two users user_1 and user_2.
Then GRANT SELECT privileges on the corresponding table.
Now you can connect as user_1 and run a simple select from the table corresponding to this user. Only rows from the first tenant are returned.
Separate databases
Each tenant’s data is stored in a separate database within the same ClickHouse service.
This approach is useful if each tenant requires a large number of tables and possibly materialized views, and has different data schema. However, it may become challenging to manage if the number of tenants is large.
The implementation is similar to the separate tables approach, but instead of granting privileges at the table level, privileges are granted at the database level.
Note this approach doesn't scale for 1000s of tenants. See usage limits.
Example
This is an example of a separate databases multi-tenancy model implementation.
First, let's create two databases, one for tenant_1 and one for tenant_2.
Let's insert fake data.
Then let's create two users user_1 and user_2.
Then GRANT SELECT privileges on the corresponding table.
Now you can connect as user_1 and run a simple select on the events table of the appropriate database. Only rows from the first tenant are returned.
Compute-compute separation
The three approaches described above can also be further isolated by using Warehouses. Data is shared through a common object storage but each tenant can have its own compute service thanks to compute-compute separation with different CPU/Memory ratio.
User management is similar to the approaches described previously, since all services in a warehouse share access controls.
Note the number of child services in a warehouse is limited to a small number. See Warehouse limitations.
Separate Cloud service
The most radical approach is to use a different ClickHouse service per tenant.
This less common method would be a solution if tenants data are required to be stored in different regions - for legal, security or proximity reasons.
A user account must be created on each service where the user can access their respective tenant’s data.
This approach is harder to manage and bring overhead with each service, as they each requires their own infrastructure to run. Services can be managed via the ClickHouse Cloud API with orchestration also possible via the official Terraform provider.
Example
This is an example of a separate service multi-tenancy model implementation. Note the example shows the creation of tables and users on one ClickHouse service, the same will have to be replicated on all services.
First, let's create the table events
Let's insert fake data.
Then let's create two users user_1
Then GRANT SELECT privileges on the corresponding table.
Now you can connect as user_1 on the service for tenant 1 and run a simple select. Only rows from the first tenant are returned.