[NEW] Multiple DB supports in cluster mode #1319

Open
madolson opened this issue Nov 19, 2024 · 11 comments

@madolson
Member

madolson commented Nov 19, 2024

When moving from standalone to cluster mode, there are two API changes that end users need to consider: cross-slot commands, and the move from multiple DBs to a single database. The cross-slot restriction is needed so that Valkey clusters can scale, but there is no similar requirement for DBs. The decision to support only one database was an optimization, mainly to simplify the key-to-slot mapping.

This feature was not added in Redis since the old core team considered using multiple databases to be an anti-pattern compared to using prefixes. For example, instead of using database 1 and 0, you could have all keys prefixed with 0:: and 1:: and then build ACLs on top of that.
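A minimal sketch of the prefix approach, assuming the `0::` and `1::` prefixes from the example above. The helper names, the user name, and the password are hypothetical; the `~<pattern>` key-pattern rule is the one `ACL SETUSER` accepts.

```python
def prefixed_key(db: int, key: str) -> str:
    """Emulate "database" `db` by prefixing the key, e.g. 1::user:42."""
    return f"{db}::{key}"


def acl_rule_for_db(user: str, db: int, password: str) -> str:
    """Build an ACL SETUSER command that limits `user` to one prefix.

    `user` and `password` are placeholders; +@all is used only to keep
    the example short.
    """
    return f"ACL SETUSER {user} on >{password} ~{db}::* +@all"


print(prefixed_key(1, "user:42"))
print(acl_rule_for_db("app1", 1, "example-password"))
```

The ACL only restricts which keys a user can touch; the keys of every emulated database still live in the same keyspace.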

The prefix approach works, but it has some drawbacks. One common workload is loading a fresh dataset into a secondary database, performing a SWAPDB operation, and then asynchronously flushing the old data.
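The load, swap, and flush workflow can be sketched with a toy in-memory model. `ToyServer` and `reload_dataset` are hypothetical names for illustration only; they stand in for a standalone server and mimic what SWAPDB and FLUSHDB ASYNC do, without talking to a real server.

```python
class ToyServer:
    """In-memory stand-in for a standalone server with numbered DBs."""

    def __init__(self, num_dbs: int = 16) -> None:
        self.dbs = [dict() for _ in range(num_dbs)]

    def swapdb(self, a: int, b: int) -> None:
        # Like SWAPDB: exchange the two keyspaces without copying any keys.
        self.dbs[a], self.dbs[b] = self.dbs[b], self.dbs[a]

    def flushdb(self, db: int) -> None:
        # Stand-in for FLUSHDB ASYNC: detach the old dict so it can be freed.
        self.dbs[db] = {}


def reload_dataset(server: ToyServer, fresh: dict, live: int = 0, staging: int = 1) -> None:
    server.dbs[staging].update(fresh)  # 1. load the new data off to the side
    server.swapdb(live, staging)       # 2. make it live in one step
    server.flushdb(staging)            # 3. drop the old data


server = ToyServer()
server.dbs[0]["greeting"] = "old"
reload_dataset(server, {"greeting": "new", "extra": "value"})
print(server.dbs[0])
print(server.dbs[1])
```

With key prefixes alone there is no single operation that plays the role of step 2; the data would have to be renamed key by key, or every client would have to switch prefixes.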

We will have some technical difficulty with implementing multiple databases now with the introduction of dict per slot, since we would have to duplicate all of the structure for each dictionary.
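To make the duplication concrete, here is a rough sketch, not Valkey's actual data structures. `key_slot` follows the cluster rule of CRC16 (XMODEM) over the key or its hash tag, modulo 16384. `PerSlotKeyspace` is a hypothetical layout that keeps one dictionary per (db, slot) pair, which is one way to read "duplicate the structure".

```python
import binascii

NUM_SLOTS = 16384


def key_slot(key: bytes) -> int:
    """Cluster slot: CRC16 (XMODEM) of the key, or of its hash tag if present."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag between the braces counts
            key = key[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is CRC16/XMODEM.
    return binascii.crc_hqx(key, 0) % NUM_SLOTS


class PerSlotKeyspace:
    """Hypothetical layout: one dict per (db, slot) pair, created lazily."""

    def __init__(self) -> None:
        self.tables = {}  # (db, slot) -> dict of key -> value

    def set(self, db: int, key: bytes, value: str) -> None:
        self.tables.setdefault((db, key_slot(key)), {})[key] = value

    def get(self, db: int, key: bytes):
        return self.tables.get((db, key_slot(key)), {}).get(key)


ks = PerSlotKeyspace()
ks.set(0, b"foo", "in db 0")
ks.set(1, b"foo", "in db 1")
print(ks.get(0, b"foo"), "|", ks.get(1, b"foo"), "|", len(ks.tables), "tables")
```

The same key in two databases maps to the same slot but needs two separate tables, so the number of tables grows with the number of databases unless they are allocated lazily as in this sketch.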

@wuranxx

wuranxx commented Nov 19, 2024

Does this issue expect to support multiple databases in cluster mode?

I think this is a valuable feature. In production environments, many customers are accustomed to using the multi-DB feature. When migrating from standalone mode to a clustered setup, they expect the provider to offer this capability.

@madolson changed the title from "[NEW] Multiple DB supports" to "[NEW] Multiple DB supports in cluster mode" on Nov 19, 2024
@madolson
Member Author

> Does this issue expect to support multiple databases in cluster mode?

Yes, thanks for commenting. I created this as a placeholder to follow up on, so I haven't fully added the details yet.

@hpatro
Contributor

hpatro commented Nov 20, 2024

> I think this is a valuable feature. In production environments, many customers are accustomed to using the multi-DB feature. When migrating from standalone mode to a clustered setup, they expect the provider to offer this capability.

@wuranxx Do they use it as a multi tenant setup? I think we should also consider supporting first party ACL support for multiple DBs.

@murphyjacob4
Contributor

> the ability to flush or swap DBs

Although not currently implemented, we could use DBs as an abstraction to allow users to control and monitor individual workloads that are isolated from one another but collocated on the same Valkey cluster. I think it is a common use case to have the same cluster hosting data for many workloads/microservices. Maybe databases are an easier avenue to solving this use case than just prefixes?

  • It should be possible for the engine to collect per-DB stats (e.g. number of keys, memory footprint, etc). A command like DBINFO <db_num> could make it easy for users to monitor their server when they have many applications/use cases.
  • We could support certain configurations on a per DB level. E.g. perhaps you could configure maxmemory per DB with eviction to support isolation between workloads? Or maybe you might want to enable certain settings on one DB but not another?
    • In a cluster context, you could lock a DB to be hashed to a single slot (e.g. its DB number/ID). For workloads that require cross-key commands, this would avoid having to manage hash tags in the client (logically, it does the same thing, with some syntactic sugar).
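A toy sketch of two of the ideas in this list: per-DB stats and pinning a DB to one slot. Everything here is hypothetical, including the `dbinfo` helper (modeled on the suggested `DBINFO <db_num>`), the byte accounting, and the choice of `db_num % 16384` as the pinned slot.

```python
from dataclasses import dataclass, field

NUM_SLOTS = 16384


@dataclass
class ToyDb:
    data: dict = field(default_factory=dict)
    approx_bytes: int = 0  # crude footprint: key and value lengths only

    def set(self, key: str, value: str) -> None:
        old = self.data.get(key)
        if old is None:
            self.approx_bytes += len(key)   # new key: count the key once
        else:
            self.approx_bytes -= len(old)   # overwrite: drop the old value
        self.approx_bytes += len(value)
        self.data[key] = value


def dbinfo(dbs: dict, db_num: int) -> dict:
    """Hypothetical DBINFO reply: stats for a single database."""
    db = dbs[db_num]
    return {"db": db_num, "keys": len(db.data), "approx_bytes": db.approx_bytes}


def pinned_slot(db_num: int) -> int:
    """Assumed rule: every key in a pinned DB lands in the DB's own slot."""
    return db_num % NUM_SLOTS


dbs = {0: ToyDb(), 1: ToyDb()}
dbs[0].set("a", "xx")
dbs[0].set("a", "yyyy")
dbs[1].set("b", "z")
print(dbinfo(dbs, 0), dbinfo(dbs, 1), pinned_slot(3))
```

Pinning by DB number has the same effect as putting one hash tag on every key in that DB, which is why all of its keys could then take part in cross-key commands.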

Sidenote: if we want to reverse direction on databases, I would like to float the idea of replacing the set number of DBs and the concept of a DB number (0-15) with a map from DB name to DB so users could create as many DBs as they want and name them how they please. By default, maybe DBs 0-15 are there, but perhaps you could DBCREATE <db_name> for additional DBs.

@wuranxx

wuranxx commented Nov 20, 2024

> > I think this is a valuable feature. In production environments, many customers are accustomed to using the multi-DB feature. When migrating from standalone mode to a clustered setup, they expect the provider to offer this capability.
>
> @wuranxx Do they use it as a multi tenant setup? I think we should also consider supporting first party ACL support for multiple DBs.

Since Redis/Valkey has never implemented ACL control for databases, customers have not raised related requirements.

I believe that adding ACL support for databases would be a much larger requirement, and it would mean reconsidering the role of databases within Valkey. Since databases have traditionally been regarded as an anti-pattern, there has been relatively little discussion on this topic.

@hpatro
Contributor

hpatro commented Nov 20, 2024

> Although not currently implemented, we could use DBs as an abstraction to allow users to control and monitor individual workloads that are isolated from one another but collocated on the same Valkey cluster. I think it is a common use case to have the same cluster hosting data for many workloads/microservices. Maybe databases are an easier avenue to solving this use case than just prefixes?

I was suggesting the use case that @murphyjacob4 has called out: different types of workloads on a single cluster using multiple DBs. This would warrant separate ACL rules for different workloads.

@zuiderkwast
Contributor

+1 on this feature. It would remove one of the few differences between cluster and standalone.

@zuiderkwast
Contributor

ACL for DB numbers sounds good too but this is orthogonal to this feature I believe. No dependencies between the two.

@hpatro
Contributor

hpatro commented Nov 21, 2024

> ACL for DB numbers sounds good too but this is orthogonal to this feature I believe. No dependencies between the two.

I brought it up because the stance had always been that we don't want to support multiple DBs and, hence, that ACL support for multiple DBs doesn't need to be built. Let me file a separate issue to discuss it.

@madolson
Member Author

> Sidenote: if we want to reverse direction on databases, I would like to float the idea of replacing the set number of DBs and the concept of a DB number (0-15) with a map from DB name to DB so users could create as many DBs as they want and name them how they please. By default, maybe DBs 0-15 are there, but perhaps you could DBCREATE <db_name> for additional DBs.

I had thoughts on this long ago somewhere in Redis, but they are probably lost to time. I really like this idea though. I have mostly been calling them namespaces, but we could call them databases as well (I'm still going to call them namespaces here though). By default, everything is placed into the default "0" namespace. There is no explicit "create a namespace" step; you can simply call SELECT my_namespace and it will be created. We would add a new ACL rule for namespaces that controls which ones you can select into, like $my_namespace. I agree with the idea of having each namespace configurable for things like eviction policy, but maybe also for defaults like TTL, or for triggers (valkey-io/valkey-rfc#9).
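A minimal sketch of the implicit-creation idea, under stated assumptions. `NamespaceServer`, `allow`, and `select` are hypothetical names, and the ACL is modeled as a plain set of allowed namespace names per user rather than by parsing the proposed `$my_namespace` rule.

```python
class NamespaceServer:
    DEFAULT = "0"  # everything starts in the default "0" namespace

    def __init__(self) -> None:
        self.namespaces = {self.DEFAULT: {}}
        self.acl = {}  # user -> set of namespace names the user may select

    def allow(self, user: str, *names: str) -> None:
        """Stand-in for granting $<name> rules to a user."""
        self.acl.setdefault(user, set()).update(names)

    def select(self, user: str, name: str) -> dict:
        """Like SELECT <name>: check the ACL, then create on first use."""
        if name not in self.acl.get(user, set()):
            raise PermissionError(f"{user} may not select {name!r}")
        return self.namespaces.setdefault(name, {})


server = NamespaceServer()
server.allow("app1", "my_namespace")
ns = server.select("app1", "my_namespace")  # created implicitly here
ns["greeting"] = "hello"
print(sorted(server.namespaces))
```

Checking the ACL before creating the namespace means a denied SELECT leaves no empty namespace behind, which is one possible design choice rather than something settled in this thread.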

For cluster mode, I think it makes more sense to have namespaces be a cluster-wide context instead of having them exist on a single shard. Cluster mode is inherently built to scale, and constraining something to just one shard seems like a poor way to scale.

One of the reasons I want to differentiate namespaces is that I think people already have assumptions about DBs that I don't really want to change.

@roshkhatri
Member

+1 to these features. Adding multiple DB support would make it possible to migrate from standalone to cluster mode. I also like the idea of having DBs as namespaces along with ACL support. This might make customers' lives easier by letting them maintain one cluster for different microservices/workloads, separated by ACLs.

6 participants