make max_connections configurable via the charm #359

Closed
nishant-dash opened this issue Feb 15, 2024 · 24 comments
Labels: bug (Something isn't working)
@nishant-dash

Steps to reproduce

  1. Deploy Landscape 23.03 LTS in an environment with a lot of Landscape clients
  2. Relate postgresql to it

In this setup, Landscape opens a large number of connections, and we need to raise max connections beyond the default of 100.
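
For reference, a quick sketch of how to confirm the server-side limit (the host is taken from the log below; the user is a placeholder):

psql "host=10.130.13.84 port=5432 user=<user>" -c "SHOW max_connections;"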

Expected behavior

N/A

Actual behavior

N/A

Versions

Operating system: Ubuntu 22.04.3 LTS (jammy)

Juju CLI: 2.9.46

Juju agent: 2.9.46

Charm revision: 351

LXD: N/A

Log output

Juju debug log: N/A

Additional context

Landscape logs

Feb 15 16:27:12 pingserver-2 CRIT
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/storm/exceptions.py", line 165, in wrap_exceptions
    yield
  File "/usr/lib/python3/dist-packages/storm/databases/postgres.py", line 438, in raw_connect
    return self._raw_connect()
  File "/usr/lib/python3/dist-packages/storm/databases/postgres.py", line 419, in _raw_connect
    raw_connection = ConnectionWrapper(psycopg2.connect(self._dsn), self)
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "10.130.13.84", port 5432 failed: FATAL:  remaining connection slots are reserved for non-replication superuser connections

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 244, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 260, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/usr/lib/python3/dist-packages/storm/twisted/transact.py", line 78, in _wrap
    result = function(*args, **kwargs)
  File "/opt/REDACTED/landscape/REDACTED/landscape/pingserver/pingserver.py", line 122, in check_for_messages
    for name, account_store in get_account_stores():
  File "/opt/REDACTED/landscape/REDACTED/landscape/model/account/store.py", line 21, in get_account_stores
    yield (name, zstorm.get(name))
  File "/usr/lib/python3/dist-packages/storm/zope/zstorm.py", line 179, in get
    return self.create(name, default_uri)
  File "/usr/lib/python3/dist-packages/storm/zope/zstorm.py", line 155, in create
    store = Store(database)
  File "/usr/lib/python3/dist-packages/storm/store.py", line 83, in __init__
    self._connection = database.connect(self._event)
  File "/usr/lib/python3/dist-packages/storm/database.py", line 584, in connect
    return self.connection_factory(self, event)
  File "/usr/lib/python3/dist-packages/storm/database.py", line 270, in __init__
    self._raw_connection = self._database.raw_connect()
  File "/usr/lib/python3/dist-packages/storm/databases/postgres.py", line 437, in raw_connect
    with wrap_exceptions(self):
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/lib/python3/dist-packages/storm/exceptions.py", line 184, in wrap_exceptions
    six.raise_from(wrapped.with_traceback(tb), e)
  File "<string>", line 3, in raise_from
  File "/usr/lib/python3/dist-packages/storm/exceptions.py", line 165, in wrap_exceptions
    yield
  File "/usr/lib/python3/dist-packages/storm/databases/postgres.py", line 438, in raw_connect
    return self._raw_connect()
  File "/usr/lib/python3/dist-packages/storm/databases/postgres.py", line 419, in _raw_connect
    raw_connection = ConnectionWrapper(psycopg2.connect(self._dsn), self)
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
storm.database.OperationalError: connection to server at "REDACTED", port 5432 failed: FATAL:  remaining connection slots are reserved for non-replication superuser connections
nishant-dash added the bug (Something isn't working) label on Feb 15, 2024
@nishant-dash

This is an important config for operators, and the fact that it was removed is quite strange... (this option existed in the older postgresql charms)

@taurus-forever

taurus-forever commented Feb 16, 2024

Dear @nishant-dash, thank you for your ticket!
Let me shed some light here:

  • The Data team's concept is to auto-tune all the necessary parameters IF you have the necessary resources, and thereby avoid user mistakes. You can discuss it with @delgod and @7annaba3l.

  • As you can see, auto-tuning is available for the MySQL(-K8s) charms: https://charmhub.io/mysql/docs/r-profiles but is somehow missing for PostgreSQL: https://charmhub.io/postgresql/docs/r-profiles . We should definitely address this.

  • You are right: at the moment the modern postgresql charm mimics the legacy charm's default of 100 connections. That should be the value for profile=testing and auto-tuned to the maximum (for the available HW resources) in profile=production. @marceloneppel, please add this ticket to this/the next pulse.

  • The charm config max_connections was NOT removed; it was simply never implemented, as the modern charm is a complete re-implementation of the legacy charm from scratch. max_connections was also deprecated in the legacy charm (frankly speaking, replaced with extra_pg_conf, which is not yet implemented in the modern charm either).

Thank you for noticing and reporting this issue!
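
For illustration, if the PostgreSQL charm gains the same profile option the MySQL r-profiles page documents (an assumption here, not something the charm exposes yet), selecting a profile would presumably look like:

juju config postgresql profile=production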

@nishant-dash

OK, I see.
So what does that mean for configuring this value? Will it be via max_connections or a generic extra_pg_conf?

@taurus-forever

@7annaba3l, we need to revisit (your?) decision not to expose max_connections in the spec. Should we?

@nishant-dash

Any update on this? We need to roll out Landscape to multiple clouds, and this is unfortunately blocking it. Our existing deployments keep breaking until we cowboy-patch the charm to raise max connections.

@taurus-forever

taurus-forever commented Mar 13, 2024

@nishant-dash, copied here from MM (to share with the team).
... after initially discussing this story internally (the discussion is still ongoing):

  • A high number of idle connections to PostgreSQL will noticeably affect DB performance (internally, PG forks a process for each connection).
  • To address the issue above, the official PG recommendation is to use pgbouncer.
    BTW, the Data team provides a pgbouncer charm for both VM and K8s.
  • Consider installing pgbouncer by default for your use case? It will automatically avoid the max_connections=100 problem you are experiencing and will also give you better scalability and performance.

Feel free to check https://www.percona.com/blog/scaling-postgresql-with-pgbouncer-you-may-need-a-connection-pooler-sooner-than-you-expect/ for more details about PG connections and PGB.
We have updated our official documentation to reflect this recommendation.

I will keep the bug report open until you confirm pgbouncer is OK for your use case. Tnx!
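
A minimal sketch of the suggested topology (the endpoint names backend-database and database are assumptions; check the pgbouncer charm documentation for the exact ones):

juju deploy pgbouncer
juju integrate pgbouncer:backend-database postgresql:database
juju integrate <your-app> pgbouncer:database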

@nishant-dash

Hi @taurus-forever,
@err404r tried it and hit https://bugs.launchpad.net/landscape-charm/+bug/2057907
We have engaged with the Landscape team on this.

In the meantime, we will try redeploying Landscape with pgbouncer in between from the beginning.

@dragomirp

Hi, I'm not sure pgbouncer will work out of the box, since landscape uses multiple databases.

@Perfect5th

Perfect5th commented Mar 14, 2024

Hi, I'm not sure pgbouncer will work out of the box, since landscape uses multiple databases.

This is correct. In my testing, the pgbouncer charm does not work out of the box because of landscape-server's multiple (six) database requirement. Either we need support for multiple databases in pgbouncer, or direct support for configuring max_connections. What's the quickest route here?

@nobuto-m

nobuto-m commented Apr 1, 2024

Just wanted to mention that auto-tuning database configurations with multiple parameters is hard. The mysql(-k8s) charm logic is also in question:
canonical/mysql-operator#407
It's nearly impossible to meet business needs for a multiple-parameter problem with zero (or just one) config option.

@taurus-forever

A provisional response: the Data team is actively working to add pgbouncer compatibility for Landscape (multiple-databases support). It should solve the max_connections limitation.

With a proper amount of luck we will share a PoC this week, please stay tuned. Tnx!

CC: @Perfect5th @nobuto-m

@dragomirp

Hi @nishant-dash, @Perfect5th, there is a development version of our pgbouncer charm on 1/edge/autodb that should be able to deploy with landscape-server. I used {"max_db_connections": "20", "pool_mode": "transaction"} as the pgbouncer configuration. You should be able to deploy it with:

juju deploy pgbouncer --channel 1/edge/autodb --series jammy
juju integrate pgbouncer:db-admin landscape-server

Please give it a try.
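
The configuration mentioned above would presumably be applied along these lines (the option names are taken from the comment; the exact syntax is an assumption):

juju config pgbouncer max_db_connections=20 pool_mode=transaction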

@nishant-dash

Hey @dragomirp ,

Thanks for that! How do you propose we deploy this to an existing setup with an application consuming postgresql (since we'll effectively have to cut the relation between the app, landscape-server in this case, and postgres)?

@dragomirp

Hi @nishant-dash, this is a development branch for testing; I wouldn't recommend deploying it to a production environment before it hits at least edge.

Once we have confidence that pgbouncer can solve our issues, deployments that need it will have to be re-related with pgbouncer in between.

@nishant-dash

Hello @dragomirp , (and also @Perfect5th )

It's not about deploying it to production but about testing it. The way I see it, the most accurate set of steps to test this would be:

  1. Deploy landscape and postgres 14/stable
  2. Show that max connections are an issue
  3. Switch to a pgbouncer setup
  4. Show that max connections are not an issue

This is because, for a variety of IN-USE deployments, we need to perform exactly these steps. To that end I am asking about step 3: is there a safe way to do it? I can obviously redeploy the entire Landscape stack with pgbouncer from the beginning, but I would like to avoid that if possible.

@dragomirp

Hi @nishant-dash, the pgbouncer charm implements the same interfaces as the postgresql charm. Step 3 would involve removing the relation between postgresql and landscape, relating pgbouncer and postgresql, and relating pgbouncer and landscape on the db-admin interface. You may need to stop landscape manually to release existing connections. You can take a look at this test for an example; a rough command sketch is below.
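
A rough sketch of step 3, assuming the endpoints described above (exact endpoint names may differ per charm):

juju remove-relation landscape-server postgresql
juju integrate pgbouncer:backend-database postgresql:database
juju integrate pgbouncer:db-admin landscape-server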

@Perfect5th
I did some light manual testing, based primarily on the tests written here and the other steps proposed in this issue. Things looked pretty good to me, in that I can confirm I wasn't hitting the max-connections issues I'd seen without pgbouncer.

I didn't do any sort of testing under large load, but I did enough interactions to, I think, reasonably conclude that this is working well. I did need to pause and resume the landscape services manually to get it using pgbouncer.
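
For reference, the manual pause/resume was along these lines (a sketch; the pause/resume action names are assumed from the landscape-server charm, shown in Juju 2.9 syntax):

juju run-action landscape-server/0 pause --wait
juju run-action landscape-server/0 resume --wait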

@taurus-forever

taurus-forever commented May 4, 2024

@Perfect5th thank you for the good news! Can you please share more details about "I did need to pause and resume the landscape services manually to get it using pgbouncer"? My understanding is that the landscape charm should reload (pause/resume) its internal services on a peer data update event. In that case, postgresql and pgbouncer behave identically on our side.

On a separate topic: please include in the next roadmap a migration from the legacy interface pgsql to the modern postgresql_client. The legacy interface pgsql will be removed in the future 24.04-based PostgreSQL 16 charm!
You can find more details here:

@Perfect5th

Perfect5th commented May 5, 2024

@taurus-forever sorry I'm not exactly sure what happened, but I can try to reproduce later.

After changing the relation to use pgbouncer, the landscape-server application went into the Blocked status. Pausing and resuming it got it back into Active. It's possible it was working fine, already using pgbouncer, and went into Blocked for some other reason - I didn't check before running those actions.

I'll make sure Landscape has it in our roadmap to stop using the legacy interface.

@nishant-dash

On another note, @taurus-forever:
If I have only a standalone postgresql deployment for a customer to use, how would they integrate their external non-Juju workload with this new pgbouncer (in the context of max connections)?

@nobuto-m

Naming the config option experimental_max_connections is tricky, because it's destined to be renamed once it's no longer experimental.
#472

That means a working bundle or Terraform plan will be invalidated at some point. Could we rename it to something like max_connections_override, so that current testing of the experimental feature (in the charm, not the software) will keep working?

I left a similar comment on the MySQL one:

canonical/mysql-operator#429 (comment)

I'd like to challenge the config option name. I totally understand that it's experimental and could be removed at any point, so clarifying that in the config description is fine.

However, adding the experimental- prefix to the config option itself is a little awkward, because max-connections in MySQL itself is a well-known config and is not experimental in any way. The idea behind the naming is to indicate that "this option could disappear once the charm becomes capable of making the best decision", so I would prefer a different name, something like max-connections-override, to indicate "while the charm will try to make the best decision to configure max connections automatically, you can still override the value explicitly".

@taurus-forever

Dear @nobuto-m, re: max_connections_override, please discuss it directly with the PM team. CC: @7annaba3l

@taurus-forever

Dear @nobuto-m, please discuss the option naming with @7annaba3l directly.

The option experimental_max_connections will not be removed suddenly.
At the same time, we are still planning/hoping to significantly improve stale-connection usage by introducing connection auto-negotiation.
However, that is a much bigger story, which we will work on in https://warthogs.atlassian.net/browse/DPE-5130

I am resolving this bug report, as the config option has been created and released to 14/stable.
Feel free to reopen it anytime with any further questions. Tnx!
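
For anyone landing here later, a minimal usage sketch of the released option (the value 500 is illustrative only):

juju config postgresql experimental_max_connections=500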
