Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It's unclear what connectionOptions is for #267

Open
tomjaguarpaw opened this issue Sep 5, 2020 · 5 comments
Open

It's unclear what connectionOptions is for #267

tomjaguarpaw opened this issue Sep 5, 2020 · 5 comments

Comments

@tomjaguarpaw
Copy link
Collaborator

Config contains a field called connectionOptions but it's unclear it is for, or at least I can't see how to use it successfully. For example, I thought that I could set connectionOptions to choose a dbname:

> Right db <- startConfig defaultConfig { connectionOptions = mempty { dbname = pure "mydbname" } }
> toConnectionString db
"host=/tmp/tmp-postgres-socket-8892564f dbname=mydbname port=44901"

Although that appears to work no such database called mydbname exists:

$ psql "host=/tmp/tmp-postgres-socket-8892564f dbname=mydbname port=44901"
psql: FATAL:  database "mydbname" does not exist

It seems that the right way to choose a dbname is to use optionsToConfig.

> Right db <- startConfig $ optionsToConfig mempty { dbname = pure "mydbname" }
> toConnectionString db
"host=/tmp/tmp-postgres-socket-4d018f9b dbname=mydbname port=56889"

Then psql shows that the database called mydbname does indeed exist.

$ psql "host=/tmp/tmp-postgres-socket-4d018f9b dbname=mydbname port=56889"
psql (11.7 (Debian 11.7-0+deb10u1))
Type "help" for help.
mydbname=#

(optionsToDefaultConfig also works in place of optionsToConfig)

Is it supposed to be possible to set a dbname by modifying the connectionOptions of a Config? I spent about an hour trying to do so unsuccessfully. The fact that toConnectionString returned a string containing the correct dbname, but that dbname didn't actually exist, sent me down the wrong track. Does my inability to get this to work indicate a bug or have I failed to conjure up the right recipe? I tried to understand the intention of the API by looking at the implementation but I got bamboozled by the Last Monoid stuff.

(Tangentially, even with dbname set a database called postgres still gets created. I don't much care about that, but does it indicate that something has gone wrong somewhere internally?)

$ psql "host=/tmp/tmp-postgres-socket-4d018f9b dbname=mydbname port=56889"
psql (11.7 (Debian 11.7-0+deb10u1))
Type "help" for help.

mydbname=# \l
                              List of databases
   Name    | Owner | Encoding |   Collate   |    Ctype    | Access privileges
-----------+-------+----------+-------------+-------------+-------------------
 mydbname  | tom   | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |
 postgres  | tom   | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 |
 template0 | tom   | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 | =c/tom           +
           |       |          |             |             | tom=CTc/tom
 template1 | tom   | UTF8     | en_GB.UTF-8 | en_GB.UTF-8 | =c/tom           +
           |       |          |             |             | tom=CTc/tom
(4 rows)
@jfischoff
Copy link
Owner

It is not a bug but that does not mean it is well designed. It should be at a minimum better documented.

The tension is that there are connection parameters which are generated based on other configuration, such as the port and there is configuration that is "free" like the keepalivesIdle. The connection type I am using does not make this clear.

Database creation is a little more complex than one might at first imagine. initdb must run first to create the initial database files. It creates the postgres, template0 and template1 databases. I don't think you can prevent this and I don't think in general you would want to since extensions and other tools assume these databases exist.

If you want to create another database you have to use CREATE DATABASE or the createdb command. tmp-postgres exposes the createDbConfig config option to configure createdb. By default additional database creation does not occur because it is slow.

optionsToConfig handles creating a Config which makes a database and sets the configuration parameters appropriately. However one might want to extend the configuration to create the database in more complex ways.

I think it is possible that I could look to the dbname in the connection options and generate a new database if does not already exist.

If this was how tmp-postgres worked what you tried initially would have work. However I'm not sure how this would handle the case where one wanted to configure the database in more complex ways which is possible by setting the createDbConfig options.

As it stands with the current approach, I think connectionOptions should not be the standard postgres connection options but some similar that is missing options that are generated from other parts of the configuration.

Long story short, this is not a bug AFAICT but the types allow one to use the API incorrect ways so it is not an ideal situation either.

tomjaguarpaw added a commit that referenced this issue Sep 5, 2020
rather than changing the connectionOptions field of Config.  Partially
addresses

    #267
jfischoff pushed a commit that referenced this issue Sep 5, 2020
…268)

rather than changing the connectionOptions field of Config.  Partially
addresses

    #267

Co-authored-by: Tom Ellis <[email protected]>
@jfischoff
Copy link
Owner

Thanks for the PR to document this better.

For a longer term fix do you have any thoughts? I think I am move towards having a special client options type that is missing the port and dbname options.

@tomjaguarpaw
Copy link
Collaborator Author

Personally I'd like to see a Config type that

  1. makes it clear exactly where each parameter is taken from, and
  2. that has no defaults.

In summary I would describe it as having a "low-level configuration datatype" that makes it clear precisely how tmp-postgres sets up the database, and having the convenient default values/Last/Accum as a sugar layer on top.

Regarding "exactly where each parameter is taken from", consider the example of connectionOptions. The presence of connectionOptions in Config makes it ambiguous where things like the port number, username, database name, password etc. should be specified. Do they come from the connectionOptions? dbname doesn't, as I noticed above! I guess port does not either (although something weird happens if I try to set it in connectionOptions -- it's not completely ignored). But what about user and password? They can also be specified in PG* environment variables and it's not clear which take precedence. I would prefer there to simply be a field where user is set and a field where password is set. Whether tmp-postgres communicates these to Postgres via environment variables or some other means seems to be an implementation detail that shouldn't appear in the surface API.

Regarding "no defaults", consider temporaryDirectory :: Last FilePath.The documentation says that the default is to use /tmp in the case that the underlying Maybe is Nothing (although I also wonder if it prioritises TMP and TMPDIR over /tmp). I find the logic for this kind of thing really hard to follow. Personally I'd prefer that temporaryDirectory :: FilePath, the defaults be set by things like defaultConfig :: Config (or defaultConfigIO :: IO Config if you want to look up TMP and TMPDIR), and the Last/Accum monoid stuff be a layer on top of Config (maybe a separate type called ConfigMonoid) rather than the actually Config itself.

@jfischoff
Copy link
Owner

Personally I'd like to see a Config type that

makes it clear exactly where each parameter is taken from, and
that has no defaults.

This is what the Plan is for https://hackage.haskell.org/package/tmp-postgres-1.34.1.0/docs/Database-Postgres-Temp-Internal-Core.html#t:Plan

However isn't some of the value of tmp-postgres is that it has defaults? It picks a free port and uses a temporary directory for the data. That's definitely part of the value from my perspective. The other bits are about caching and various speed improvements. Otherwise why not just call initdb and postgres yourself?

Regarding "exactly where each parameter is taken from", consider the example of connectionOptions. The presence of connectionOptions in Config makes it ambiguous where things like the port number, username, database name, password etc. should be specified. Do they come from the connectionOptions? dbname doesn't, as I noticed above! I guess port does not either (although something weird happens if I try to set it in connectionOptions -- it's not completely ignored).

But what about user and password? They can also be specified in PG* environment variables and it's not clear which take precedence.

I can see how that is not clear.

I would prefer there to simply be a field where user is set and a field where password is set. Whether tmp-postgres communicates these to Postgres via environment variables or some other means seems to be an implementation detail that shouldn't appear in the surface API.

I would not say tmp-postgres is communicating with Postgres via these environment variables. postgres and initdb need some environment variables to work at all and they will pick up these environment variables as well (although the command line parameters take precedence I believe). I initially attempted a implementation that passed an empty environment to initdb and postgres but this failed. It might have failed merely because the PATH was missing but in either case the idea of filter a users environment variables and passing some to the executables and not others seemed more error prone and potentially confusing. However perhaps in hindsight it is not.

Regarding "no defaults", consider temporaryDirectory :: Last FilePath.The documentation says that the default is to use /tmp in the case that the underlying Maybe is Nothing (although I also wonder if it prioritises TMP and TMPDIR over /tmp). I find the logic for this kind of thing really hard to follow. Personally I'd prefer that temporaryDirectory :: FilePath, the defaults be set by things like defaultConfig :: Config (or defaultConfigIO :: IO Config if you want to look up TMP and TMPDIR), and the Last/Accum monoid stuff be a layer on top of Config (maybe a separate type called ConfigMonoid) rather than the actually Config itself.

It is pretty common for configuration to use a monoid for combination. I suppose another method could be used to combine default values. Without seeing the alternative method of combination it is hard for me to say if it is better or worse than what is there now.

@tomjaguarpaw
Copy link
Collaborator Author

This is what the Plan is for

Thanks, I will have a look.

However isn't some of the value of tmp-postgres is that it has defaults? It picks a free port and uses a temporary directory for the data.

It is pretty common for configuration to use a monoid for combination. I suppose another method could be used to combine default values.

Yes, absolutely, the defaults are very convenient and I think the monoid API makes a great sugary front end. I think there should also be a plain, no sugar intermediate API too though. I'll have to think more about how to describe what I am imagining.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants