
Docker compose postgres volume update #10160

Open
kuhlaid opened this issue Dec 1, 2023 · 5 comments
Labels
Component: Containers Anything related to cloudy Dataverse, shipped in containers. Type: Bug a defect

Comments

@kuhlaid
Contributor

kuhlaid commented Dec 1, 2023

Currently the docker-compose-dev.yml file defines the postgres volume as follows:

    volumes:
      - ./docker-dev-volumes/postgresql/data:/var/lib/postgresql/data

https://github.com/IQSS/dataverse/blob/b15b89f794415662ed860178c8f857446477d7b9/docker-compose-dev.yml#L79C5-L80C70

I was not able to get the postgres Docker container to run with this setting; it threw errors about setting permissions on /var/lib/postgresql/data. I think a better approach would be to mount an initialization script instead, e.g. - ./init.sql:/docker-entrypoint-initdb.d/init.sql or similar, since that should avoid the permission problems. /docker-entrypoint-initdb.d is the default initialization directory of the official postgres Docker image (https://hub.docker.com/_/postgres/). This would be useful when creating Docker compose instances for testing things such as an API call that updates an existing file, where you want a preconfigured database and data as a starting point.
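A minimal sketch of the proposed mount, for discussion only. The service name `postgres`, the image tag, the credentials and the local `./init.sql` file are hypothetical placeholders, not taken from docker-compose-dev.yml. One caveat worth knowing: the official postgres image only runs files from /docker-entrypoint-initdb.d when it initializes an empty data directory, so if the data directory is persisted, the script is skipped on later starts.

```yaml
services:
  postgres:                         # hypothetical service name
    image: postgres:13              # hypothetical image tag
    environment:
      POSTGRES_USER: dataverse      # hypothetical credentials
      POSTGRES_PASSWORD: secret
    volumes:
      # Mount a single init script read-only. The official image runs
      # *.sql files in this directory only when the data dir is empty.
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro
```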

Anyway, just making this a placeholder for discussion and a pull request.

@kuhlaid kuhlaid added the Type: Bug a defect label Dec 1, 2023
@pdurbin pdurbin added the Component: Containers Anything related to cloudy Dataverse, shipped in containers. label Dec 1, 2023
@pdurbin pdurbin moved this to Containerization (Oliver) in IQSS Dataverse Project Dec 1, 2023
@pdurbin
Member

pdurbin commented Dec 8, 2023

@kuhlaid thanks for opening this issue.

In a container meeting the other day we had stubbed out a draft issue called "postgres init problem on Windows". Should we delete that draft issue in favor of this one?

Below is a dump of notes that are in that draft issue:

Write docs or refactor "grant privs"?

Look at the chown we already do.

See https://docs.google.com/document/d/1V4T2JA8O36PZEDLvnioaR06o7j5B_68cSrS09bte51Q/edit?usp=sharing

See https://dataverse.zulipchat.com/#narrow/stream/375812-containers/topic/Current.20steps.20required.20for.20windows.20development.2E/near/404981034

https://stackoverflow.com/a/26599273/10027828 regarding using the /docker-entrypoint-initdb.d/ folder.

https://github.com/docker-library/postgres

@kuhlaid
Contributor Author

kuhlaid commented Dec 9, 2023

After some additional testing: the init file is not necessary, and neither is the postgres volumes definition, which should probably be removed from the compose script since the database is set up by the bootstrap or some other service. In short, the init.sql that I thought might be needed is not. Removing the following lines from the postgres service in the compose file should resolve any permission issues (it did for me):

    volumes:
      - ./docker-dev-volumes/postgresql/data:/var/lib/postgresql/data

As a side note, trying to mount SQL scripts into the postgres service from the compose script (as shown below), where localPostgresDirectoryOfSqlScripts is a local directory of SQL scripts or shell commands you want to run while the postgres service is being built, will NOT work for Dataverse the way the build process is currently set up. Again, the bootstrap or whatever service creates the database tables seems to clash with any scripts you mount into the postgres service.

    volumes:
      - ./localPostgresDirectoryOfSqlScripts:/docker-entrypoint-initdb.d

My next idea is to try and run some postgres SQL scripts after the Dataverse bootstrapping is complete to see if the database can be updated at that point, and what that process might look like.
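One possible shape for this, sketched under assumptions: a one-shot service that waits until a bootstrap service has exited successfully and then runs `psql` against the database. The service names `bootstrap`, `postgres` and `post_bootstrap_sql`, the database name, the credentials and the `./post-bootstrap-sql` directory are all hypothetical placeholders, not names from the Dataverse compose file; the fragment would have to be merged into that file and adjusted to its real service names.

```yaml
services:
  post_bootstrap_sql:                 # hypothetical one-shot service
    image: postgres:13                # used here only for its psql client
    depends_on:
      bootstrap:                      # hypothetical bootstrap service name
        condition: service_completed_successfully
    environment:
      PGHOST: postgres                # hypothetical database service name
      PGUSER: dataverse               # hypothetical credentials
      PGPASSWORD: secret
      PGDATABASE: dataverse
    volumes:
      - ./post-bootstrap-sql:/scripts:ro
    # Run each .sql file in glob (alphabetical) order and stop at the first
    # failing script. "$$" escapes "$" from Compose variable interpolation.
    entrypoint: ["sh", "-c", "for f in /scripts/*.sql; do psql -v ON_ERROR_STOP=1 -f \"$$f\" || exit 1; done"]
```

The `service_completed_successfully` condition only fits if the bootstrap container actually exits when it is done; if it keeps running, a health check on the Dataverse service would be the alternative to wait on.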

If someone who knows the database creation process could explain it, that would be helpful. Ideally, it would seem more appropriate to handle the database build within the postgres service itself rather than use a separate bootstrapping step to create the database.

@poikilotherm
Contributor

poikilotherm commented Dec 14, 2023

@kuhlaid please note that not having a volume backing the database store is a very bad idea.
This way you will write binary data into Docker's overlay filesystem, which at the very least costs performance.

Before we set up our compose file the way we have it now (bind mounts), I suggested using tmpfs to back these things and make the containers more ephemeral. There are valid concerns with that approach, so we went with @GPortas' suggestion and created these bind mounts.

We could think about using real volumes instead of bind mounts here. https://docs.docker.com/storage/volumes/ Let's put this on the agenda for a CT group meeting! CC @pdurbin
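A minimal sketch of what a named volume could look like in place of the bind mount, with the tmpfs idea shown as a commented-out alternative. The service name, image tag, password and the volume name `postgres-data` are hypothetical placeholders. Because a named volume lives in Docker's own storage area rather than in a host directory, it typically sidesteps the host-side permission problems described at the top of this issue, while still keeping the data out of the container's overlay filesystem.

```yaml
services:
  postgres:                         # hypothetical service name
    image: postgres:13              # hypothetical image tag
    environment:
      POSTGRES_PASSWORD: secret     # hypothetical credential
    volumes:
      # Named volume managed by Docker instead of a host bind mount.
      - postgres-data:/var/lib/postgresql/data
    # Ephemeral alternative: data is discarded when the container stops.
    # tmpfs:
    #   - /var/lib/postgresql/data

volumes:
  postgres-data: {}                 # hypothetical volume name
```

With a named volume, `docker compose down -v` removes the data, whereas a plain `docker compose down` keeps it for the next start.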


About initializing the database from scripts: Dataverse is a Jakarta EE application and uses JPA as its ORM. In our current setup, we use JPA's feature to initialize the database when it is not present. I have never been very happy about this, especially not since we introduced Flyway to keep track of schema migrations. We never followed the good practice of creating a baseline schema (not just an empty baseline migration) and switching off JPA table generation.

If you would be interested in making Dataverse pick up custom SQL scripts and execute them using Flyway, that would be much appreciated! We have things like createsequence.sql that would be perfect candidates to be executed by Dataverse itself rather than externally, either bundled with it and/or picked up by a discovery mechanism.

@pdurbin
Member

pdurbin commented Dec 14, 2023

> We could think about using real volumes instead of bind mounts here. https://docs.docker.com/storage/volumes/ Let's put this on the agenda for a CT group meeting! CC @pdurbin

Done! https://docs.google.com/document/d/1xU36giT0_85PlvIoGRWWaacJ2JQ5hyQYmifYmWfyGcQ/edit?usp=sharing (will copy to a future meeting if necessary).

> About initializing the database from scripts: Dataverse is a Jakarta EE application and uses JPA as its ORM. In our current setup, we use JPA's feature to initialize the database when it is not present. I have never been very happy about this, especially not since we introduced Flyway to keep track of schema migrations. We never followed the good practice of creating a baseline schema (not just an empty baseline migration) and switching off JPA table generation.

Related:

@kuhlaid
Contributor Author

kuhlaid commented Dec 14, 2023

Thank you @poikilotherm for describing the JPA and Flyway processes, which are new to me.

Projects
Status: No status
Development

No branches or pull requests

3 participants