Helm chart for monolithic and read-write deployment mode #4832

Open
rubenvw-ngdata opened this issue Apr 25, 2023 · 19 comments

@rubenvw-ngdata

Is your feature request related to a problem? Please describe.

There is currently only a helm chart available for the full microservices deployment mode of Grafana Mimir. This is pretty exhaustive and results in a lot of pods. Ideally there would be an alternative to this.

Describe the solution you'd like

A separate helm chart, or a deployment mode setting in the existing chart, to select the deployment mode (possibly in a similar way to what is available for Loki).
Ideally the alternative deployment solution also supports multi-AZ (where we are running one instance in each AZ).

Describe alternatives you've considered

The only alternative now is to run a minimalistic version of the mimir-distributed helm chart

Additional context

See also previous ticket on grafana/helm-charts: grafana/helm-charts#1189

@rubenvw-ngdata
Author

I had some time to work on this, so I tried to implement this functionality myself (but I failed to get it fully working).

See PR #4858 (I know it is not ready, but sharing it, so you can help me on it)

@dimitarvdimitrov
Contributor

Thank you for the proposal and the draft PR. I appreciate the time spent. We've been experimenting with the two deployment modes and would like to explore them further as alternatives to microservices mode (maybe even "at scale"). We're not quite there yet, but these deployment modes are also not being deprecated soon.

However, there are some considerations we have to take into account before adding different deployment modes to the helm chart. A couple that come to mind now:

  • How do we reduce code duplication in the helm chart? Helm isn't super friendly to functions and reducing code duplication. Making named templates more complex comes with a readability tradeoff. With time we will have to add features to multiple deployment modes, which will slow down contributions.
  • How do we test these deployment modes? Currently we commit some golden records, install a handful of configurations for smoke tests, and run OPA policies. We should probably invest in making sure these tests exist in some form for non-microservices deployments; this is unclear at present and needs some thought.
  • How do we document them? Do we need to change some of the existing docs for the helm chart to make sure they aren't outdated? Or maybe we need new docs dedicated to the different deployment modes
  • Do we provide a zero-downtime migration path between the different deployment modes?
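
To make the testing question a bit more concrete, here is a minimal sketch of what a golden-record check could look like for an extra deployment mode. It is only an illustration in Python, not the chart's actual test tooling: the `check_golden` helper and the file layout are hypothetical, and how the manifests get rendered (for example with `helm template`) is left outside the sketch.

```python
import difflib
from pathlib import Path


def check_golden(rendered: str, golden_path: Path, update: bool = False) -> str:
    """Compare rendered manifests with a committed golden file (hypothetical helper).

    Returns an empty string when both match, otherwise a unified diff.
    With update=True the golden file is (re)written instead of compared.
    """
    if update:
        # Regenerate the golden record, e.g. after an intentional chart change.
        golden_path.parent.mkdir(parents=True, exist_ok=True)
        golden_path.write_text(rendered, encoding="utf-8")
        return ""

    expected = golden_path.read_text(encoding="utf-8")
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        rendered.splitlines(keepends=True),
        fromfile=str(golden_path),
        tofile="rendered",
    )
    return "".join(diff)
```

A non-empty return value would fail the check and show exactly which rendered lines drifted from the committed record, so each additional deployment mode would need its own set of golden files.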

Most of these aren't trivial to answer and there will probably be divided opinions. At the same time we, at Grafana Labs, don't have much visibility into how much read-write or monolithic deployment modes will be used or how much they can scale.

As much as I hate to say it, keeping this functionality in a fork will be more pragmatic as it stands. You can publish the forked chart under a different name and we can track how much usage it gets. With time we can revisit and incorporate the changes in the mimir-distributed chart and share the maintenance efforts.

@rubenvw-ngdata
Author

Hi @dimitarvdimitrov ,

Thanks for your answer. I'm a bit disappointed though that you propose to leave it on a fork branch.

The most important reason to use mimir for us (and I don't think we are alone) is to make prometheus HA. With the microservices configuration this comes at a high maintenance level with a very fine grained configuration.

I understand that there are various things that you should think about when embedding it into the product; that's also why this is just a draft.

Have you been able to check the error message I was facing with the monolithic setup? I'm willing to continue, rename the chart and maintain the fork for the time being, but I could use a bit of help debugging the issues that I'm facing (I don't know much about the mimir internals).

@dimitarvdimitrov
Contributor

> The most important reason to use mimir for us (and I don't think we are alone) is to make prometheus HA. With the microservices configuration this comes at a high maintenance level with a very fine grained configuration.

With the helm chart we are aiming to make this configuration less of a hassle. The defaults in the chart should work for most users. In addition to that monolithic and read-write deployments have the same configuration options as microservices. However, I can see how scaling up/out a microservices deployment is more complicated than scaling a monolithic deployment.

I left a comment on the draft PR wrt the "connection refused" error. I'm happy to help with answers when I can.

@WoodyWoodsta

WoodyWoodsta commented Aug 21, 2023

To add my two cents, since Grafana Loki already has the "read-write" mode and the helm chart for it, I was sort of expecting to be able to deploy Mimir in the same way if it contains the same component architecture (which it does). So I'm wondering whether the considerations listed above are the equivalents of what has already been done in Grafana Loki?

@davinkevin
Contributor

Monolithic mode is a very important (strategic?) deployment model IMO, because it makes it possible to start simple and then increase the complexity if the product fits our needs.

ATM, without the monolithic mode, I don't see myself deploying mimir or tempo in clusters I manage "just for evaluation purposes"… and so I start to look at other tools, even if I already run loki & grafana.

As a user, I don't expect any SLA or validation from this chart flavour, just a parameter to deploy it in "target=all".
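
For context, Mimir picks the components a process runs through its `-target` flag, and `all` is the monolithic set. A rough sketch of how one deployment-mode parameter could translate into per-workload arguments follows. The mode names, the `targets_for_mode` and `container_args` helpers, the workload naming, and the config path are all hypothetical and not taken from the mimir-distributed chart; `read`, `write`, and `backend` are the targets Mimir documents for its experimental read-write mode.

```python
from typing import Dict, List

# Hypothetical mapping from a chart-level "deployment mode" value to the
# Mimir -target value each workload would be started with.
MODE_TARGETS: Dict[str, List[str]] = {
    "monolithic": ["all"],
    "read-write": ["read", "write", "backend"],
}


def targets_for_mode(mode: str) -> List[str]:
    """Return the -target values for a deployment mode (hypothetical helper)."""
    try:
        return list(MODE_TARGETS[mode])
    except KeyError:
        raise ValueError(f"unsupported deployment mode: {mode!r}") from None


def container_args(
    mode: str, config_path: str = "/etc/mimir/mimir.yaml"
) -> Dict[str, List[str]]:
    """Build one argument list per workload; the config path is an assumption."""
    return {
        f"mimir-{target}": [f"-target={target}", f"-config.file={config_path}"]
        for target in targets_for_mode(mode)
    }
```

With this shape, monolithic mode yields a single workload started with `-target=all`, while read-write mode yields three workloads that share the same configuration file.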

@rubenvw-ngdata
Author

@davinkevin If you want to try out mimir in monolithic deployment mode, you can use our fork at https://github.com/NGDATA/mimir.
Currently we only do internal releases, so if you want to use it, you will have to take care of the release process yourself.

The more the fork is used, the more likely it is that this gets embedded in the product.

@mhoyer

mhoyer commented Oct 28, 2023

I like the idea of providing one or more less complex helm chart solutions for mimir. Why? Because we also tried to deploy the current mimir-distributed one and it was really tough to walk through the values.yaml. Sure, the chart probably would have run out of the box, but a) we had to apply some modifications and b) my inner nerd wants to know what I am deploying. And here I didn't even look into the templates.

The complex mimir-distributed helm chart definitely has its use case for larger production deployments. Still, the simpler rollout methods are valuable too: for beginners, but also for scenarios with lower performance requirements.

As the almost 4k-line values.yaml is already overwhelming, I suggest really splitting it up into separate helm charts before adding even more complexity to the existing one (with a deployment mode setting). This makes your lives as maintainers easier, and those of the consumers too, because they can decide upfront which level of chart sophistication to start with. In fact, they only have to deal with a less complex values.yaml and may understand how the templates work (in case of an issue).

Regarding the sharing of common template functions, you could follow an approach similar to Bitnami's, with a mimir-common helm chart. See https://github.com/bitnami/charts/tree/main/bitnami/common

@davinkevin
Contributor

@rubenvw-ngdata is the fork still maintained?

@rubenvw-ngdata
Author

It is, we are using it without issues. We do not follow all changes that happen on main immediately though. If there is something that is not working for you, let me know.

@Ca-moes

Ca-moes commented Aug 12, 2024

Having a monolithic deployment for the helm chart would be awesome for the meta-monitoring chart

@lieberlois

Is there any update on this? I really don't understand the decision to have the simplescalable variant for loki but not for mimir 😓

@rorynickolls-skyral

rorynickolls-skyral commented Sep 13, 2024

This would be a useful feature where Mimir needs to be deployed for testing. We currently test our observability stack in CI and Mimir, even in a minimal distributed setup, consumes a lot of resources.

Loki can easily just run in SingleBinary mode for tests, and I had assumed the two would be configurable in the same way.

@gclawes

gclawes commented Sep 25, 2024

> This would be a useful feature where Mimir needs to be deployed for testing. We currently test our observability stack in CI and Mimir, even in a minimal distributed setup, consumes a lot of resources.
>
> Loki can easily just run in SingleBinary mode for tests, and I had assumed the two would be configurable in the same way.

Seconded. This would also be useful in small-footprint/homelab deployments.

@kmdlcp

kmdlcp commented Oct 3, 2024

Thirded. If that is even a word.

This is really a must have.

@Robsta86

Robsta86 commented Oct 9, 2024

Fourthed, if that’s even a word. ;-)

I am surprised to see Mimir still only has the microservices mode available to use via the helm chart, unlike the other deployment options that Loki has. Would like to run mimir simple scalable in my lab for testing.

@ravenolf

It would definitely be a useful option especially for smaller clusters or just for experimenting with Mimir before a migration from another Prometheus-like tool

@bennesp

bennesp commented Dec 9, 2024

> It would definitely be a useful option especially for smaller clusters or just for experimenting with Mimir before a migration from another Prometheus-like tool

This is definitely my use case too, I want to explore Mimir (with OtelCol) as a replacement for Prometheus, but the only way to do it right now is by using the microservices helm chart that is not really convenient.

@fiddeb

fiddeb commented Dec 20, 2024

+1 on the idea of monolithic deployments in the Helm chart for Mimir.

Currently, creating a test environment for an LGTM stack is challenging without relying on Prometheus. Even Grafana’s own docker-otel-lgtm project uses Prometheus, highlighting the gap in Mimir’s usability for such scenarios.

In my own lab environment that I am building, I am forced to use Prometheus instead of Mimir to conserve resources when working with tools like Rancher Desktop. This setup creates a road bump for me and prevents me from experimenting with tenants for metrics, which is a key feature I would like to explore.

A monolithic Helm chart for Mimir would enable a significantly broader range of use cases. It would make Mimir more accessible for teaching observability concepts, onboarding new team members, and running sandbox environments for development teams.

This also aligns with the deployment modes available for Loki, providing a consistent and simplified developer experience across the LGTM stack. I hope this feature is reconsidered, as it would be an enabler for both education and development in compact or resource-limited environments.
