0 comments · 14 replies
-
@sn0wcat @stefan-ettl this might be of interest to you
-
@kbData how about creating a concept for this, maybe together with @sn0wcat and @stefan-ettl (since you already mentioned them)? This would definitely not be priority work, imho, as other issues are more urgent and need to be addressed by the dev team. But a (conceptual) contribution to this topic would be very welcome: it would be a good starting point for dealing with it in the future, or, even better, for finding more interested people to work on it.
-
@mspiekermann we will provide a proposal. cc: @kbData
-
### Motivation for Multitenancy

As @kbData stated, data space participants who are not ready to operate their own instance would use "Connector as a Service" offerings that operate the service for them. At the moment, a participant agent is represented by its own instance of the EDC Runtime; there is no differentiation between the two. This way of thinking works well for "self-hosting" companies that operate one or two instances, but it becomes significantly more complicated in all "Connector as a Service" scenarios (which also include integration into existing multitenant products). To provide this service at an affordable price (see also the calculation provided by @kbData), multitenancy support in the EDC Runtime would be appreciated. Besides price, operational concerns such as updates, upgrades, and security checks are significantly easier to handle than in multi-instance scenarios.

### Multitenant EDC as a Service

In a multitenant scenario, a customer would use a "managed EDC endpoint" (identified through the URL, self-description, catalog, etc.) which is isolated from the other tenants through authentication checks while using the same infrastructure. For this, the EDC would need to be "tenant-aware", i.e. it should be able to decode the current tenant information from the authentication token and manage the state of a request depending on it (e.g. present the catalog of the tenant identified by the authentication information). If we had, say, 1000 participants whose EDC endpoints are represented with tenant-specific URLs, we would only have to scale the number of running instances according to load.

### Multi-Instance EDC as a Service

Another alternative is to offer EDC as a service in a multi-instance scenario: every EDC participant gets its own managed instance of the EDC runtime with its own persistence. For the said 1000 participants purchasing the software, 1000 DB instances + 1000 runtime instances (plus gateways, certificates, etc.) would have to be spawned. Even though such a system can be automated, it produces significantly more cost, heat, entropy, climate-change-affecting gases ;) and overall more management overhead than the other scenario.

### Scheduling EDC Instances On Demand

@MoritzKeppler mentioned the idea that EDC instances could be started on demand and terminated as needed. However, it is not quite clear how this would work with the long-running workflows of EDC, especially since the EDC assumes that it is always running during workflow execution.

We are still ready to look into the multitenancy concept, as we are convinced that it is the best way to operate "EDC as a Service". However, I am getting the feeling that this is conceptually against the philosophy of the EDC core development team.

cc: @mspiekermann, @MoritzKeppler, @stefan-ettl, @paullatzelsperger, @alexandrudanciu and @kbData
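To make "tenant-aware" concrete, here is a minimal, purely illustrative Java sketch. None of the names (`TenantCatalogService`, `resolveTenant`, the `tenant=<id>` token format) come from the EDC codebase; a real implementation would verify a signed JWT and read a tenant claim instead of parsing a plain string.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of tenant-aware request handling: one logical
// catalog per tenant, all tenants served by the same shared instance.
public class TenantCatalogService {

    private final Map<String, List<String>> catalogsByTenant;

    public TenantCatalogService(Map<String, List<String>> catalogsByTenant) {
        this.catalogsByTenant = catalogsByTenant;
    }

    // Decode the tenant id from an authentication token. Here the "token"
    // is a trivial "tenant=<id>" string purely for illustration.
    public Optional<String> resolveTenant(String token) {
        if (token != null && token.startsWith("tenant=")) {
            var id = token.substring("tenant=".length());
            if (catalogsByTenant.containsKey(id)) {
                return Optional.of(id);
            }
        }
        return Optional.empty();
    }

    // Serve the catalog of the tenant identified by the token; unknown
    // tenants are rejected rather than falling through to another tenant.
    public List<String> catalogFor(String token) {
        return resolveTenant(token)
                .map(catalogsByTenant::get)
                .orElseThrow(() -> new IllegalArgumentException("unknown tenant"));
    }
}
```

The point of the sketch is the isolation property: every request is scoped to exactly one tenant derived from its credentials, while the infrastructure (process, database, gateway) is shared.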
-
Let me state upfront a few things:
Let's start with the design principles that underlie the EDC, as we must be clear on those at the outset.

### EDC Design Principles: Participant Agents and Identity

A participant agent is a software system that performs a specific operation or role in a dataspace. Currently, the following are participant agent types:
There will be more in the future. A fundamental design principle of EDC is that a participant agent is associated with one, and only one, identity. We often refer to an "EDC runtime" as an instantiation of a participant agent. Hence, a runtime is associated with one participant identity.

There are several key nuances that follow from the above design. First, process boundaries are purposely not specified by this architecture. They are a deployment design decision, and the EDC can support many diverse deployment topologies. For example, a participant agent could be deployed in its own process. However, it does not follow that a participant agent must be deployed in its own process. Multiple runtimes, each configured as a distinct participant agent, may be deployed within the same process "space." There are many ways to do that, which we can cover in detail.

The EDC design supports a myriad of operational requirements, and I have not seen any that would require us to rethink our approach. To state this slightly differently: you can likely already do what you want by adopting the "EDC as a platform" approach and its architecture to create a service offering that fits your requirements.

### EDC as a Service Deployment Possibilities

If you would like to deploy and manage EDC as a service for multiple organizations in a "dense" operational environment, I see at least two possibilities: a container-first approach that relies on the Kubernetes ecosystem, and a bespoke implementation. If I were designing for this type of operational environment, my personal preference would be to leverage the Kubernetes ecosystem, since the infrastructure it provides addresses many of the challenges of such a complex environment. However, your requirements or preferences may differ.

### The Container-First Approach

This approach is quite simple: run each EDC as a ReplicaSet and configure Kubernetes to perform routing, isolation, and other requirements.
The ReplicaSet can be scheduled on shared compute infrastructure. There is no need to run multiple gateways: map URLs to different Kubernetes ingress points. Similarly, you don't necessarily need separate "heavyweight database instances"; you could opt for lighter-weight alternatives instead.
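As a sketch of the container-first idea, the per-participant unit could look like the following Kubernetes manifest. All names (`edc-tenant-a`, the image, the host) are placeholders, the matching `Service` is omitted for brevity, and a real deployment would add secrets, resource limits, and network policies:

```yaml
# One Deployment (managing a ReplicaSet) per participant, scheduled
# onto shared nodes. Names and hosts are illustrative placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: edc-tenant-a
  labels:
    app: edc-tenant-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: edc-tenant-a
  template:
    metadata:
      labels:
        app: edc-tenant-a
    spec:
      containers:
        - name: edc-runtime
          image: example.org/edc-runtime:latest   # placeholder image
          ports:
            - containerPort: 8181
---
# A single shared ingress controller maps each tenant URL to the right
# backing service, instead of running one gateway per participant.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: edc-tenant-a
spec:
  rules:
    - host: tenant-a.edc.example.org   # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: edc-tenant-a
                port:
                  number: 8181
```

With this shape, onboarding a new participant is a matter of stamping out another Deployment/Service/Ingress triple from a template, while scheduling, routing, and isolation remain Kubernetes' job.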
One advantage of this approach is that it opens the possibility of leveraging Kubernetes' rich management and DevOps ecosystem instead of rolling your own. Another advantage concerns data sovereignty: data does not have to be "collocated" in the same process space, as the Kubernetes ecosystem can implement isolation at various levels (e.g. containers, network). Finally, it should be noted that Kubernetes-based systems can also be made to run in constrained and cost-efficient environments.

### Roll Your Own

If the previous approach is not an option, you can recreate some (but not all) of the capabilities the Kubernetes ecosystem already provides with EDC extensions. Returning to EDC design principles, I mentioned that it is possible to instantiate multiple EDC participant agents ("runtimes") in the same process space. The EDC JUnit launcher does exactly this. The EDC contains a lightweight core (no unnecessary dependencies, no application frameworks, etc.) that is memory-efficient and compact. Leverage that and build a multiplexer launcher by doing the following.

1. **Create a launcher.** The launcher is responsible for loading multiple runtimes and managing their lifecycle. A launcher could support a static configuration mechanism, or a dynamic one where runtimes are created and destroyed based on external events.
2. **Create a multiplexing-aware**
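The launcher step above can be sketched as follows. This is a purely hypothetical Java illustration of the lifecycle-management idea; `EmbeddedRuntime` is a stand-in class and does not reflect the actual EDC runtime API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an in-process EDC runtime; the real EDC
// bootstrapping API looks different.
class EmbeddedRuntime {
    private final String participantId;
    private boolean running;

    EmbeddedRuntime(String participantId) {
        this.participantId = participantId;
    }

    void boot()         { running = true; }
    void shutdown()     { running = false; }
    boolean isRunning() { return running; }
    String participantId() { return participantId; }
}

// The launcher owns one runtime per participant identity, so a single
// process can host many participant agents, matching the one-identity-
// per-runtime design principle described above.
public class MultiplexerLauncher {
    private final Map<String, EmbeddedRuntime> runtimes = new ConcurrentHashMap<>();

    // Could be driven statically from configuration, or dynamically from
    // external events (e.g. a new customer being onboarded).
    public EmbeddedRuntime start(String participantId) {
        return runtimes.computeIfAbsent(participantId, id -> {
            var runtime = new EmbeddedRuntime(id);
            runtime.boot();
            return runtime;
        });
    }

    public void stop(String participantId) {
        var runtime = runtimes.remove(participantId);
        if (runtime != null) {
            runtime.shutdown();
        }
    }

    public int runningCount() {
        return runtimes.size();
    }
}
```

Starting an already-running participant is idempotent here (`computeIfAbsent`), which keeps the launcher safe to drive from repeated external events.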
-
So, to summarize: "Connector as a Service" scenarios are not in the focus of the EDC platform, so there is no need for any contributions to EDC which would go into this direction, and the recommendation is to develop an external management system or a derived product of the EDC and direct the engineering efforts there?
-
First, thanks to @sn0wcat for clarifying and visualizing the requirements that make this discussion possible, and thanks to @jimmarino for the detailed explanation and recommendation. To add my perspective, I would like to highlight the part about "providing a platform that can enable those scenarios". It is important for me to clarify this, as it frames your statement "so there is no need for any contributions to EDC which would go into this direction". If initiatives that use the EDC (in the various operating models) encounter limitations that require changes to the architecture of the EDC and the platform approach described above, that is exactly the contribution we hope for, and we want to work together on a solution within the OSS project.
-
Hello!
As far as I know, there were plans to support a concept like "Connector as a Service" (let's call it CaaS) for those clients who can't or don't want to run their own instance of the Connector. This is especially true for SMEs (which may not even have an IT department). Companies operating CaaS for SMEs will handle multiple customers, potentially thousands of them (according, for example, to the CATENA-X KPI goals).
It is economically too expensive to host one Connector instance per customer. The resource costs alone would exceed an estimated 1k $/year, too expensive for an SME, and operational costs come on top.
We would like to have a multitenancy feature for the EDC connector, meaning a single EDC instance can serve multiple customers, with data kept separated between customers (tenant separation concept). Let us discuss this idea here.
With best regards
Kiryl