Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group pods under aerogear-metrics application and use deploy instead of deploymentConfig #34

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

StevenTobin
Copy link
Contributor

@StevenTobin StevenTobin commented Mar 2, 2018

This PR groups all the metrics pods under the application label aerogear-metrics and switches to the use of k8s deploy resources instead of Openshift deploymentConfigs.

Related issues: #30 & #31

@StevenTobin StevenTobin changed the title group pods under aerogear-metrics application group pods under aerogear-metrics application and use deploy instead of deploymentCong Mar 5, 2018
@StevenTobin StevenTobin changed the title group pods under aerogear-metrics application and use deploy instead of deploymentCong group pods under aerogear-metrics application and use deploy instead of deploymentConfig Mar 5, 2018
@StevenTobin StevenTobin changed the title group pods under aerogear-metrics application and use deploy instead of deploymentConfig Group pods under aerogear-metrics application and use deploy instead of deploymentConfig Mar 5, 2018
@david-martin
Copy link
Contributor

@matzew would appreciated your review on this given you're also working on https://issues.jboss.org/browse/AEROGEAR-2012

@matzew
Copy link
Member

matzew commented Mar 5, 2018

👀 at it now

@matzew
Copy link
Member

matzew commented Mar 5, 2018

While I see that the deployment part works, I am unable to get the entire APB up and running.

Here is the change from dc to deployment, I could successfully verify that:

➜  metrics-apb git:(3d6ef5e) oc get dc
No resources found.
➜  metrics-apb git:(3d6ef5e) oc get deployment
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
aerogear-app-metrics   1         1         1            1           5m
grafana                1         1         1            1           6m
postgres               1         1         1            1           5m
prometheus             1         1         1            1           7m

However, the apb fails here

See the log:


FAILED - RETRYING: Wait for app-metrics health endpoint to report healthy (4 retries left).
--
  | FAILED - RETRYING: Wait for app-metrics health endpoint to report healthy (3 retries left).
  | FAILED - RETRYING: Wait for app-metrics health endpoint to report healthy (2 retries left).
  | FAILED - RETRYING: Wait for app-metrics health endpoint to report healthy (1 retries left).
  | fatal: [localhost]: FAILED! => {"attempts": 30, "changed": false, "content": "", "msg": "Status code was not [200]: Request failed: <urlopen error [Errno 113] No route to host>", "redirected": false, "status": -1, "url": "http://aerogear-app-metrics-foo.192.168.37.1.nip.io/healthz "}
  | to retry, use: --limit @/opt/apb/actions/provision.retry
  | PLAY RECAP *********************************************************************
  | localhost                  : ok=38   changed=35   unreachable=0    failed=1
  | + EXIT_CODE=2
  | + set +ex
  | + '[' -f /var/tmp/test-result ']'
  | + exit 2


This brings me to the question, why we are still using openshift_v1_route entries? We can use k8s_v1_endpoints instead, right ?

@StevenTobin
Copy link
Contributor Author

StevenTobin commented Mar 5, 2018

That check in the APB for the pods to be ready is heavily dependent on your system and how quick the pods come up. If it takes a while for your system to pull the image the check could time out or if it's under heavy load the check could timeout, but when you check the project in Openshift the pod could be fine seconds after the APB provision has 'failed'.

I've been investigating the CI failures on this APB and it usually seems to be these container checks timing out, I also can't find checks like these in any other APBs and my instinct is to remove them. Without them the APB won't wait for each container to become ready so the whole APB should provision faster, so this is what i'd like to do:

  • Remove the container ready checks for grafana and metrics-api server as the functionality can be replaced with readinessProbes.
  • The metrics-api server already has a readinessProbe that hits the healthz endpoint.
  • Grafana has an /api/health endpoint so add a readinessProbe that hits that endpoint.

I can't find a similar health endpoint for prometheus however.

@philbrookes @david-martin @matzew does this seem reasonable.

I'll create an issue as it's not really in the scope of this PR.

@david-martin
Copy link
Contributor

@StevenTobin Readiness probes for health endpoints sounds reasonable.
Checks at the end of the APB provision task to ensure every service is running OK (has at least 1 pod in a ready state) would ensure the APB only finishes if everything is running OK.

@matzew
Copy link
Member

matzew commented Mar 6, 2018

@StevenTobin I also see that the deprovision still speaks about deploymentconfig.

Perhaps we wanna use something like this:
https://github.com/ansibleplaybookbundle/mediawiki-apb/blob/master/roles/mediawiki/tasks/deprovision.yml#L22-L28

@matzew
Copy link
Member

matzew commented Mar 6, 2018

I'll create an issue as it's not really in the scope of this PR.

+1 on moving this larger discussion to a new issue/thread

Copy link
Contributor

@david-martin david-martin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All groups into single application.
image

All Deployments as expected (no DeploymentConfigs)

oc get deployment
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
aerogear-app-metrics   1         1         1            1           2m
grafana                1         1         1            1           3m
postgres               1         1         1            1           2m
prometheus             1         1         1            1           4m
$ oc get dc
No resources found.

All running.

Approved

@secondsun
Copy link

I don't know if this is related, but after deploying metrics with this PR I don't see it in my list of services after running mobile get clientconfig myapp-android --namespace=myproject -o json

@david-martin
Copy link
Contributor

@secondsun I think this will address it disappearing from the cli
https://issues.jboss.org/browse/AEROGEAR-2267 (cc @pb82)

@pb82
Copy link
Contributor

pb82 commented Mar 8, 2018

@secondsun did this work for you before? Because i think it shouldn't work work.

@grdryn
Copy link
Contributor

grdryn commented Dec 4, 2018

Bump!

I just noticed that DeploymentConfigs were being used, and was considering changing to Deployments. Thankfully I noticed this PR first!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants