
Recommended pattern for database creation #14

Closed
enragedginger opened this issue Dec 8, 2016 · 12 comments

@enragedginger

Hello,

Thanks for putting this project together. What's the recommended pattern for database creation for local development with Citus and Docker?

I have a SQL script that creates my database tables--some of which are Citus distributed tables. I'd like to run a command using docker compose to start up a master and two workers, and run the SQL create script. What's the recommended approach for doing this?

Right now, I've updated the docker compose script to look something like this:

version: '2'

services:
  master:
    container_name: 'citus_master'
    image: 'my_custom_image:latest'
    ports: ['5432:5432']
    labels: ['com.citusdata.role=Master']
    volumes: ['/var/run/postgresql']
    environment:
    - POSTGRES_USER=user
    - POSTGRES_PASSWORD=pass
    - POSTGRES_DB=db
    depends_on:
    - worker1
    - worker2
  worker1:
    image: 'citusdata/citus:6.0.1'
    labels: ['com.citusdata.role=Worker']
    environment:
    - POSTGRES_USER=user
    - POSTGRES_PASSWORD=pass
    - POSTGRES_DB=db
  worker2:
    image: 'citusdata/citus:6.0.1'
    labels: ['com.citusdata.role=Worker']
    environment:
    - POSTGRES_USER=user
    - POSTGRES_PASSWORD=pass
    - POSTGRES_DB=db
  config:
    container_name: 'citus_config'
    image: 'citusdata/workerlist-gen:2.0.0'
    volumes: ['/var/run/docker.sock:/tmp/docker.sock']
    volumes_from: ['master']

The Dockerfile for my custom Citus image looks something like this:

FROM citusdata/citus:6.0.1

MAINTAINER me

COPY create_schema.sql /docker-entrypoint-initdb.d/002_create_my_schema_please.sql

ENV POSTGRES_USER user
ENV POSTGRES_PASSWORD pass
ENV POSTGRES_DB db

During startup, the master is unable to connect to the workers while attempting to run the create_distributed_table command. The logs on the master read something like this:

WARNING:  connection failed to docker_worker1_1:5432
DETAIL:  fe_sendauth: no password supplied
WARNING:  connection failed to docker_worker1_1:5432
DETAIL:  fe_sendauth: no password supplied
WARNING:  could not create shard on "docker_worker1_1:5432"
WARNING:  could not create shard on "docker_worker1_1:5432"
WARNING:  connection failed to docker_worker2_1:5432
DETAIL:  fe_sendauth: no password supplied
WARNING:  could not create shard on "docker_worker2_1:5432"
ERROR:  could only create 0 of 2 of required shard replicas
STATEMENT:  SELECT create_distributed_table('my_table', 'my_distributing_id_column');
WARNING:  connection failed to docker_worker2_1:5432
DETAIL:  fe_sendauth: no password supplied
WARNING:  could not create shard on "docker_worker2_1:5432"
ERROR:  could only create 0 of 2 of required shard replicas

Am I missing something here or is my approach entirely wrong?
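One reading of those fe_sendauth errors (a guess, not a confirmed fix from this thread): libpq on the coordinator has no password to offer when it opens connections to the workers, because the POSTGRES_* variables configure the server, not outgoing client connections. A minimal sketch of a workaround is baking a ~/.pgpass file into the custom image so libpq can pick the password up; the credential values below simply reuse the ones from the compose file above and are an assumption:

```dockerfile
# Hypothetical addition to the custom coordinator Dockerfile.
# libpq reads ~/.pgpass (format: host:port:database:user:password) for the
# OS user running the backend; in these images that is the postgres user,
# whose home directory is /var/lib/postgresql.
RUN echo "*:5432:db:user:pass" > /var/lib/postgresql/.pgpass \
 && chown postgres:postgres /var/lib/postgresql/.pgpass \
 && chmod 0600 /var/lib/postgresql/.pgpass
```

The 0600 mode matters: libpq silently ignores a .pgpass file with looser permissions.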

@enragedginger
Author

Also, it might be worth noting that I'm running Docker (native), version 1.13.0-rc3-beta32 (14523), on Mac OS X.

@enragedginger
Author

For anyone who comes across this, I did the following to get by for now:

  1. Stopped using a custom POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB. The Citus containers don't appear to support such things when running in distributed mode. That is, when the master tries to connect to the workers, it doesn't seem to use these values.
  2. Stopped running my custom DB create / seed script when the container is built. Instead, I created a separate bash script for creating and seeding the DB after all containers are up and running.
  3. Even with these changes, the workers still don't always connect to the master. I'm not entirely sure why this happens.

tl;dr Only use these images for local development. Don't use them in production.
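The separate seeding script mentioned in step 2 might look like the minimal sketch below: wait until the coordinator accepts connections, then run the SQL file. The host, port, and file name (create_schema.sql) are assumptions, not values confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical seed script: wait for the coordinator, then run the SQL file.

wait_for_coordinator() {
  # pg_isready exits 0 once the server accepts connections.
  local host="$1" port="$2" tries=0
  until pg_isready -h "$host" -p "$port" -U postgres >/dev/null 2>&1; do
    tries=$((tries + 1))
    [ "$tries" -ge 60 ] && return 1   # give up after ~60 seconds
    sleep 1
  done
}

seed() {
  local host="$1" port="$2" file="$3"
  wait_for_coordinator "$host" "$port" || return 1
  # ON_ERROR_STOP makes psql exit non-zero if the schema script errors out.
  psql -h "$host" -p "$port" -U postgres -v ON_ERROR_STOP=1 -f "$file"
}

# Guarded so the functions can be sourced without side effects.
if [ "${RUN_SEED:-0}" = "1" ]; then
  seed localhost 5432 create_schema.sql
fi
```

Run it after `docker-compose up -d`, e.g. `RUN_SEED=1 ./seed.sh`.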

@enragedginger enragedginger changed the title Recommended pattern for database creation (Avoid using this setup in production) Recommended pattern for database creation Jan 23, 2017
@jasonmp85
Contributor

Hey, this ticket slipped through the cracks of our support in December… I'm going to give it a look later today. I may have some answers around POSTGRES_USER, etc. variables and how to get the master to use them when connecting to workers (our Cloud team probably has something to add about this).

As far as the seeding goes, I do think it'll be safer to populate during startup.

For "the workers don't always connect to the master"… is this what you meant? The primary connections go from master to worker. There are scenarios wherein the workers connect to master, but they're triggered less often. Can you provide repro steps?

@enragedginger
Author

enragedginger commented Jan 23, 2017

@jasonmp85 Thanks for looking into this.

I haven't been able to find deterministic steps to reproduce the problem so I'm thinking it's some kind of race condition. I have a docker compose file that's not entirely dissimilar to the example one. I have references to a couple of other containers I'm using (Zookeeper, Kafka).

I'm using docker-compose up -d && docker-compose scale worker=2 kafka=3 to start the containers and scale the Citus workers and Kafka nodes. The first worker always seems to start up and register just fine. However, the second worker doesn't always register: when I run SELECT master_get_active_worker_nodes();, the result only occasionally includes the second worker (but always includes the first).

In the situations where the second worker doesn't register, the logs on the Citus config node look something like this:

2017/01/23 17:33:55 Generated '/etc/citus/pg_worker_list.conf' from 5 containers
2017/01/23 17:33:55 Running 'psql -h/var/run/postgresql -Upostgres -c 'SELECT master_initialize_node_metadata();''
2017/01/23 17:33:55 Error running notify command: psql -h/var/run/postgresql -Upostgres -c 'SELECT master_initialize_node_metadata();', exit status 1
2017/01/23 17:33:55 Watching docker events
2017/01/23 17:33:56 Contents of /etc/citus/pg_worker_list.conf did not change. Skipping notification 'psql -h/var/run/postgresql -Upostgres -c 'SELECT master_initialize_node_metadata();''
2017/01/23 17:33:56 Received event start for container 05aa38dbbcac
2017/01/23 17:33:56 Received event die for container 7aeb4bed4a00
2017/01/23 17:33:57 Contents of /etc/citus/pg_worker_list.conf did not change. Skipping notification 'psql -h/var/run/postgresql -Upostgres -c 'SELECT master_initialize_node_metadata();''
2017/01/23 17:33:57 Contents of /etc/citus/pg_worker_list.conf did not change. Skipping notification 'psql -h/var/run/postgresql -Upostgres -c 'SELECT master_initialize_node_metadata();''
2017/01/23 17:33:58 Received event start for container 00fd92088946
2017/01/23 17:33:59 Generated '/etc/citus/pg_worker_list.conf' from 6 containers
2017/01/23 17:33:59 Running 'psql -h/var/run/postgresql -Upostgres -c 'SELECT master_initialize_node_metadata();''
2017/01/23 17:33:59 Error running notify command: psql -h/var/run/postgresql -Upostgres -c 'SELECT master_initialize_node_metadata();', exit status 2
2017/01/23 17:34:00 Received event start for container 266348e51a80
2017/01/23 17:34:00 Received event start for container ee2cdadc87a0
2017/01/23 17:34:00 Contents of /etc/citus/pg_worker_list.conf did not change. Skipping notification 'psql -h/var/run/postgresql -Upostgres -c 'SELECT master_initialize_node_metadata();''
2017/01/23 17:34:00 Contents of /etc/citus/pg_worker_list.conf did not change. Skipping notification 'psql -h/var/run/postgresql -Upostgres -c 'SELECT master_initialize_node_metadata();''

Manually running select master_initialize_node_metadata(); appears to resolve the issue.
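That manual step could be automated along these lines: a hypothetical retry loop (not from this thread; the host and worker count are assumptions) that re-runs master_initialize_node_metadata() until master_get_active_worker_nodes() reports the expected number of workers.

```shell
# Hypothetical retry helper for the race described above: keep re-running
# master_initialize_node_metadata() until the expected workers register.
wait_for_workers() {
  local host="$1" expected="$2" i count
  for i in $(seq 1 30); do
    count=$(psql -h "$host" -U postgres -tAc \
      "SELECT count(*) FROM master_get_active_worker_nodes();" \
      2>/dev/null || echo 0)
    [ "$count" -ge "$expected" ] && return 0
    psql -h "$host" -U postgres \
      -c "SELECT master_initialize_node_metadata();" >/dev/null 2>&1
    sleep 2
  done
  return 1   # workers never all registered
}

# Usage (with the two-worker setup from this thread):
#   wait_for_workers localhost 2
```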

@enragedginger enragedginger changed the title (Avoid using this setup in production) Recommended pattern for database creation Recommended pattern for database creation Jan 23, 2017
@jasonmp85
Contributor

It's been a while since I've looked over the issues in this repo. Were there any further questions? We're doing some work on our own docker-compose node membership implementation; any concerns you have would be nice to know and address in that work!

@enragedginger
Author

Not at this time. I think we can close this. Thank you!

@roynasser

Hi @enragedginger & @jasonmp85 ,

I seem to be running into problems with docker-compose and POSTGRES_USER/PASSWORD variables too...

Was this ever looked into and fixed or was it just closed due to lack of need/interest?

@enragedginger
Author

Hi @RVN-BR, it's been a while, but I believe I closed it due to lack of interest.

@marchelbling

As I've also struggled to make this work I've written an article and created a gist that might also be helpful to you. Let me know if it works for you or if you have any feedback.

@mubaidr

mubaidr commented Apr 8, 2020

@enragedginger

  1. Stopped using a custom POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB. The Citus containers don't appear to support such things when running in distributed mode. That is, when the master tries to connect to the workers, it doesn't seem to use these values.

See this: https://github.com/citusdata/docker/blob/master/Dockerfile#L35

2. Stopped running my custom DB create / seed script when the container is built. Instead, I created a separate bash script for creating and seeding the DB after all containers are up and running.

I am having the same issue. But manually running the script seems counterintuitive and, I believe, somewhat diminishes the value of tools like Docker. There should be a proper way, or an event, to signal that node initialization is complete, after which custom scripts should be run. @jasonmp85, comments please?

3. Even with these changes, the workers still don't always connect to the master. I'm not entirely sure why this happens.

See this: #124 (comment)

@hanefi
Member

hanefi commented Apr 13, 2020

There should be a proper way or event to notify that node initialization is complete.

@mubaidr I agree. However, it was not trivial to figure out readiness reporting and detection in the Docker service dependencies. I created 2 separate PRs that aim to address this problem. Feel free to comment on these:
#187
citusdata/membership-manager#9

@mubaidr

mubaidr commented Apr 13, 2020

Thanks and yes, I too ended up using pg_healthcheck
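For reference, a compose-level health check along these lines can be sketched with pg_isready; the service and image names echo the ones earlier in this thread, and the `condition` form of depends_on assumes a compose file format version that supports it (2.1+):

```yaml
services:
  worker1:
    image: 'citusdata/citus:6.0.1'
    healthcheck:
      # pg_isready exits 0 once the worker accepts connections
      test: ['CMD-SHELL', 'pg_isready -U postgres']
      interval: 5s
      timeout: 3s
      retries: 10
  master:
    image: 'citusdata/citus:6.0.1'
    depends_on:
      worker1:
        condition: service_healthy
```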
