Image update automation not committing the resolved version to git #159

Open
sanjvij opened this issue Apr 15, 2021 · 30 comments
Labels
area/docs Documentation related issues and PRs bug Something isn't working

Comments

@sanjvij

sanjvij commented Apr 15, 2021

Hi team,

Thanks for your help so far. I am stuck implementing a use case in which the image update automation is not committing its changes to Git.

When I run the command below, I can see that Flux detected a new version in the registry but never committed it to Git.
(base) sanj@Sanjs-Air app-cluster % flux get image policy staging
NAME     READY  MESSAGE                                                               LATEST IMAGE
staging  True   Latest image tag for 'sanjvij01/getting-started' resolved to: v3.0.2  sanjvij01/getting-started:v3.0.2

(base) sanj@Sanjs-Air app-cluster % kubectl get deployment/getting-started-image -n staging -oyaml | grep 'image'
name: getting-started-image
selfLink: /apis/apps/v1/namespaces/staging/deployments/getting-started-image
- image: sanjvij01/getting-started:v3.0.1
imagePullPolicy: IfNotPresent
message: ReplicaSet "getting-started-image-554964548d" has successfully progressed.

My image update automation file is shown below. I have a feeling I have done something wrong, in which case feel free to point it out.

apiVersion: image.toolkit.fluxcd.io/v1alpha1
kind: ImageUpdateAutomation
metadata:
  name: flux-system
  namespace: flux-system
spec:
  checkout:
    branch: master
    gitRepositoryRef:
      name: flux-staging
  commit:
    authorEmail: [email protected]
    authorName: sanjvij
    messageTemplate: '{{range .Updated.Images}}{{println .}}{{end}}'
  interval: 1m0s
  push:
    branch: master
  update:
    path: ./
    strategy: Setters

Let me know if you need me to provide any further info.

@rayterrill

rayterrill commented Apr 24, 2021

I'm running into a similar issue. Is there a way to debug the image-automation-controller to get more info on why we're getting "no updates made"?

I've got two clusters:

  • 1.19 in EKS, looks like it's running v0.11.0 (PROD)
  • 1.20 as a K3s cluster, running v0.12.3. (TEST)

I may try to downgrade my TEST cluster to v0.11.0 to see if that lets things work.

@stefanprodan stefanprodan transferred this issue from fluxcd/flux2 Apr 26, 2021
@rayterrill

Downgraded cluster to v0.11.0, still not able to get image-automation-controller working correctly, and not sure where to start looking to understand why it isn't making an update.

@rayterrill

Update - Dug into the manifests for the flux components - looks like we can put the image-automation-controller in debug mode:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: flux-system
    app.kubernetes.io/version: v0.11.0
    control-plane: controller
  name: image-automation-controller
  namespace: flux-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: image-automation-controller
  template:
    metadata:
      annotations:
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      labels:
        app: image-automation-controller
    spec:
      containers:
      - args:
        - --events-addr=http://notification-controller/
        - --watch-all-namespaces=true
        - --log-level=debug
        - --log-encoding=json
        - --enable-leader-election

@rayterrill

Getting:

no changes made in working directory; no commit

@rayterrill

Played around with the yaml for my definition - maybe something was goofed up with the imagepolicy comment? I eventually got it working but it would really be nice to have some additional mechanisms to figure out why this wasn't working.

@bobrossthepainter

bobrossthepainter commented May 4, 2021

Hi @rayterrill,
I just came across a similar issue today. Could this issue be connected to the name of the directory containing the YAML file with the imagePolicy reference?
E.g. for me the update works when the imagePolicy reference is located in:
deploy/overlays/int/trololo/patch.yaml
On the other hand, it does not work when it's located here:
deploy/overlays/int/user-ui-config-service/patch.yaml

BR
Robert

@bobrossthepainter

I just found out that the problem is indeed related to the directory where the file holding the imagePolicy ref is located.
The alphabetically last directory within a parent directory is ignored by the imagePolicy resolving algorithm.

My fix was adding another dir zzz with a .gitkeep file inside and it magically worked. :D
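
For anyone who wants to try the same workaround, it could be scripted roughly as below. This is only a sketch: the function name `add_sentinel_dir` is made up for illustration, `zzz` is simply the name used in this comment, and whether the workaround is needed at all may depend on the controller version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Flux): create an alphabetically last
# sibling directory so the real overlay directories are no longer the
# last entry under their parent.
add_sentinel_dir() {
  local parent="$1"
  local name="${2:-zzz}"
  mkdir -p "$parent/$name" || return 1
  # Git does not track empty directories, so add a placeholder file.
  touch "$parent/$name/.gitkeep"
}
```

Calling `add_sentinel_dir deploy/overlays/int` and committing the result would reproduce the layout described above.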

@rayterrill

Potentially, yes. I did end up moving my stuff around in directories - so maybe that helped resolve it.

Really wish there was a way for DEBUG mode to give enough detail to determine why it's not working so we can self-resolve these kinds of issues.

@squaremo
Member

squaremo commented Jun 2, 2021

@rayterrill What sort of debug output would have helped you?

@rayterrill

Ideally some way to understand why it didn't work - something like "nothing to update" or even better something like "image would be updated but setters didn't match anything" - something to indicate the reconciliation loop found work to do but something in the config prevented it from being written. I believe my problem was indeed that my folder structure and setters definition were not aligned (backed into this by moving things around until I discovered that was the problem).
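
Until the controller reports this itself, a quick local check is to search the directory configured as the update path for image policy markers. The sketch below assumes a local checkout of the Git repository; `find_policy_markers` is a hypothetical helper name, not a Flux command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every line carrying an image policy marker
# under the given update path. Returns 1 when no marker is found, which
# would be consistent with the controller logging
# "no changes made in working directory; no commit".
find_policy_markers() {
  local update_path="${1:-.}"
  if [ ! -d "$update_path" ]; then
    echo "update path not found: $update_path" >&2
    return 2
  fi
  # -r: recurse, -n: line numbers, -H: file names, -F: literal string match
  grep -rnHF '$imagepolicy' "$update_path"
}
```

Running `find_policy_markers ./` from the repository root with the same value as `spec.update.path` shows which files the Setters strategy could touch. An empty result suggests the path and the marker locations are not aligned.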

@melodiez14

Please correct me if I'm wrong.

The GitRepository, ImageRepository, ImagePolicy, and ImageUpdateAutomation must be in the same namespace. Then you need to add the image policy marker, e.g. {"$imagepolicy": "<policy-namespace>:<policy-name>"}. This means that when you have multiple namespaces, the options are:

  1. Create the GitRepository, ImageRepository, ImagePolicy, and ImageUpdateAutomation in the flux-system namespace. The image policy marker then points to flux-system:<policy_name>.
  2. Create a GitRepository in every namespace that needs an ImageUpdateAutomation.

I don't know whether this is by design. I have already tested the first approach and it works well. These are the facts I found:

  • An ImageUpdateAutomation can only refer to a GitRepository in the same namespace (1) (2)
  • An ImageUpdateAutomation only gets ImagePolicy objects in the same namespace (1)
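
If the same-namespace behaviour described above applies to your version, markers that point at a policy in another namespace are worth finding early. The following is a rough sketch that assumes a local checkout and the marker format quoted in this comment; `markers_outside_namespace` is a hypothetical helper name, and file paths containing spaces or colons are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print markers whose policy namespace differs from
# the namespace the ImageUpdateAutomation object lives in.
markers_outside_namespace() {
  local automation_ns="$1"
  local update_path="${2:-.}"
  # grep emits "<file>:<line>:<content>"; sed reduces each hit to
  # "<file>:<line> <policy-namespace> <policy-name>".
  grep -rnHF '$imagepolicy' "$update_path" \
    | sed -n 's/^\([^:]*:[0-9]*\):.*"\$imagepolicy"[[:space:]]*:[[:space:]]*"\([^":]*\):\([^":]*\)[^"]*".*$/\1 \2 \3/p' \
    | while read -r location ns name; do
        if [ "$ns" != "$automation_ns" ]; then
          echo "$location: marker references $ns:$name, automation is in $automation_ns"
        fi
      done
}
```

For example, `markers_outside_namespace flux-system ./` prints nothing when every marker points to a policy in flux-system.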

@squaremo
Member

Ideally some way to understand why it didn't work - something like "nothing to update" or even better something like "image would be updated but setters didn't match anything" - something to indicate the reconciliation loop found work to do but something in the config prevented it from being written.

The controller argument --log-level=debug now results in lots of tracing output: #190. (This will be moved to --log-level=trace at some point soon).

@kingdonb kingdonb self-assigned this Aug 19, 2021
@kingdonb kingdonb added bug Something isn't working area/docs Documentation related issues and PRs labels Aug 20, 2021
@kingdonb
Member

I started to discuss this in #180 - linking these threads so they are more discoverable and can perhaps be closed together, with a docs improvement.

@narenderramireddy

narenderramireddy commented Dec 2, 2021

Hi @rayterrill, I just came across a similar issue today. Could it be the case that this issue is connected to a directory name, where the yaml file containing the imagePolicy reference is located? E.g. for me the update works when the imagePolicy reference is located in: deploy/overlays/int/trololo/patch.yaml on the other hand it is not working if its located in here: deploy/overlays/int/user-ui-config-service/patch.yaml

BR Robert

That was the problem for me.. It worked moving the files into the right directory

@raress96

raress96 commented Dec 8, 2021

Hello, so what directory do you recommend to put these files in?
I have them like clusters/eks/apps/app_name/app_name-registry.yaml
It doesn't seem to work after a while.

However if I install the image automation controller again using:
flux install --components-extra=image-reflector-controller,image-automation-controller

It starts working right away, but again stops after a while, so I am not sure it is solely a directory issue.

@kingdonb
Member

kingdonb commented Jan 6, 2022

This most recent comment may be describing the same issue as

These may be duplicate issues, or you may be reporting the other issue... we are investigating it from there, if it's the same.

We've heard reports from a number of folks that image automation stops working after a while, and the curative action suggested that seems to be resolving it is a restart of image-automation-controller. That would likely be accomplished by reinstalling with flux install --components-extra... as you mentioned @raress96 – are you still experiencing this?

@raress96

raress96 commented Jan 7, 2022

@kingdonb Hey, it seems to work for now, but we didn't have many images built lately because of the holidays. Not sure if it's going to stop working after a while. Will also follow the other issue for updates.

@kingdonb
Member

kingdonb commented Jan 7, 2022

@raress96 #209 and #282 are probably better issues to follow. There are a lot of reports of this issue and it has been tricky from what I understand to reproduce reliably. It appears to happen when there is a connectivity or availability issue with GitHub (and then the issue persists until the controller restarts, from what I've heard based on the reports we got.)

@raress96

raress96 commented Jan 7, 2022

@kingdonb It reproduced again for me.

What I did was set up a new app with a new ImageRepository and a deployment whose image field was urlwhatever...:1 # {"$imagepolicy": "flux-system:new-app"}.
Version 1 of that image didn't actually exist, so I pushed this and no app was created. After an image was pushed, it had the tag/version 3, and the image in the deployment was successfully updated to urlwhatever...:3 # {"$imagepolicy": "flux-system:new-app"}.

But then version 4 of the app was pushed to the image repository, and the image was not updated, and I had to run flux install --components-extra=image-reflector-controller,image-automation-controller, after which the image was again updated.

Pretty weird. I will also follow those other issues and maybe my feedback helps you debug this.

@raress96

raress96 commented Jan 12, 2022

Hey, forgot to mention one thing that might be important: if I have version 3 in the k8s files, and I manually change the version to 4 in a Deployment, the image automation controller puts back version 3, so it seems to still be running but not fetching the correct version maybe.

@pratikbin

pratikbin commented May 6, 2022

Facing the same issue, using Flux stack 0.30.2 on EKS v1.21.5-eks-bc4871b.

Describing the imageupdateautomation object shows "no updates made", whereas the imagepolicy and imagerepository are working fine. [image at the end]

All image* objects are in the same namespace.

$ flux get images all --all-namespaces
NAMESPACE      	NAME                       	LAST SCAN                	SUSPENDED	READY	MESSAGE                       
dev-xxxx	imagerepository/xxxx	2022-05-07T03:20:26+05:30	False    	True 	successful scan, found 8 tags	

NAMESPACE      	NAME                       	LATEST IMAGE                                             	READY	MESSAGE                                                                                       
dev-xxxx	imagepolicy/xxxx-dev	docker.io/xxxx/xxxx:edge-561db2e-1651841847	True 	Latest image tag for 'docker.io/xxxx/xxxx' resolved to: edge-561db2e-1651841847	

NAMESPACE  	NAME                           	LAST RUN                 	SUSPENDED	READY	MESSAGE         
flux-system	imageupdateautomation/k8s-infra	2022-05-07T03:20:22+05:30	False    	True 	no updates made	

image

@kallaics

Same issue here:

FluxCD version: 0.30.2

Output of flux check

► checking prerequisites
✔ Kubernetes 1.20.12+vmware.1 >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.18.2
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.21.3
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.17.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.22.3
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.23.2
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.22.5
✔ all checks passed

Output of flux version

flux: v0.30.2
helm-controller: v0.18.2
image-automation-controller: v0.21.3
image-reflector-controller: v0.17.1
kustomize-controller: v0.22.3
notification-controller: v0.23.2
source-controller: v0.22.5

ImageUpdateAutomation is not working; the image-automation-controller's logs say:

{
   "level":"error",
   "ts":"2022-05-17T08:57:04.660Z",
   "logger":"controller.imageupdateautomation",
   "msg":"Reconciler error",
   "reconciler group":"image.toolkit.fluxcd.io",
   "reconciler kind":"ImageUpdateAutomation",
   "name":"redacted",
   "namespace":"flux-system",
   "error":"unable to clone 'ssh://git@redacted/redacted/redacted': SSH could not read data: Error waiting on socket"
}

It looks like it cannot write back the changes.
Thanks for help.
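
To pull entries like the one above out of a noisy log, a small filter can help. This sketch assumes the controller runs with --log-encoding=json and writes one JSON object per line (the entry above is pretty-printed for readability), and that the field values contain no escaped double quotes. `print_reconcile_errors` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read JSON log lines on stdin and print
# "<timestamp> <error message>" for every error-level entry.
print_reconcile_errors() {
  grep -F '"level":"error"' \
    | sed -n 's/.*"ts":"\([^"]*\)".*"error":"\([^"]*\)".*/\1 \2/p'
}
```

It could be fed with `kubectl -n flux-system logs deployment/image-automation-controller | print_reconcile_errors`.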

@pjbgf
Member

pjbgf commented May 17, 2022

@kallaics have you tried enabling managed transport yet? This is something we added recently that focuses on improving Git connections.

You just need to get your controller pod to have the environment variable EXPERIMENTAL_GIT_TRANSPORT=true, and that should suffice to enable it. From the next release this will no longer be required as it will be enabled by default.

More information:
https://fluxcd.io/docs/components/source/gitrepositories/#experimental-managed-transport-for-libgit2-git-implementation
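
One way to set that variable on a running cluster is `kubectl set env`, sketched below. The wrapper function and the `KUBECTL` and `FLUX_NAMESPACE` variables are hypothetical conveniences, not Flux settings. If the Flux manifests are managed from Git, a direct change like this may be reverted on the next reconciliation, in which case the variable would need to go into the manifests instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set EXPERIMENTAL_GIT_TRANSPORT=true on a controller
# deployment. Override KUBECTL (for example KUBECTL='echo kubectl') to
# preview the command without touching the cluster.
enable_managed_transport() {
  local ns="${FLUX_NAMESPACE:-flux-system}"
  local deploy="${1:-image-automation-controller}"
  ${KUBECTL:-kubectl} -n "$ns" set env "deployment/$deploy" EXPERIMENTAL_GIT_TRANSPORT=true
}
```

Changing the pod template this way triggers a rollout, so the controller pod restarts with the new environment.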

@kallaics

kallaics commented May 17, 2022

I have removed my question.

I will report back once new images arrive.

@kallaics

kallaics commented May 17, 2022

The first update caused an issue:

{ 
  "level":"error",
  "ts":"2022-05-17T12:28:19.441Z",
  "logger":"controller.imageupdateautomation",
  "msg":"Reconciler error",
  "reconciler group":"image.toolkit.fluxcd.io",
  "reconciler kind":"ImageUpdateAutomation",
  "name":"redacted",
  "namespace":"flux-system",
  "error":"unable to clone 'ssh://git@redacted/redacted/redacted': ssh: unexpected packet in response to channel open: <nil>"
}

Additional information: the URL doesn't end with .git, and the endpoint is GitLab.
@pjbgf Do you have any idea maybe? Thanks for your time and help.

@pjbgf
Member

pjbgf commented May 20, 2022

@kallaics I think the problem you are experiencing is slightly different than the one reported on this thread. So I created a new issue for it: #365

@nagarciah

nagarciah commented May 30, 2023

My failed deployment had exactly the same symptoms ("no changes made in working directory; no commit"; new image detected, but the PR to update the image tag was not created). The reason: I usually work on Linux boxes, but this time I was working on a $§"&%$ Windows machine (the %@§ "Company Policy"), and the so-called "text editor" had encoded the yml file as UTF-16. That made git diff identify the yml file as a binary file (which it obviously wasn't) and ignore changes to it. Maybe FluxCD uses git diff to detect changes (IDK) and was ignoring the changes to that yml.
Changing the encoding to UTF-8 did the trick. Just in case, also check that the file uses LF instead of CRLF.
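
A quick way to spot such files in a checkout is sketched below. It only detects UTF-16 files that start with a byte order mark, so BOM-less UTF-16 would slip through, and `check_yaml_encoding` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report YAML files that start with a UTF-16 byte
# order mark or contain CRLF line endings. Prints "<problem>: <file>".
check_yaml_encoding() {
  local root="${1:-.}"
  local cr file bom
  cr="$(printf '\r')"
  find "$root" -type f \( -name '*.yaml' -o -name '*.yml' \) | while IFS= read -r file; do
    # First two bytes as hex: "fffe" is the UTF-16LE BOM, "feff" the UTF-16BE BOM.
    bom="$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')"
    case "$bom" in
      fffe|feff) echo "utf-16: $file" ;;
    esac
    # A carriage return directly before the end of a line indicates CRLF.
    if grep -q "${cr}\$" "$file"; then
      echo "crlf: $file"
    fi
  done
}
```

A reported file can then be converted with something like `iconv -f UTF-16 -t UTF-8 patch.yaml > patch.utf8.yaml` (file names here are placeholders) before committing it again.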

@enricozammitlon

Just in case it helps anyone else, in my case it was because the ImageUpdateAutomation had the wrong update path - duh! Simple one but easy to gloss over since there are no errors per se

@ale900522

Please correct me if I'm wrong.

The GitRepository, ImageRepository, ImagePolicy, and ImageUpdateAutomation must be in the same namespace. Then you need to add the image policy marker. eg {"$imagepolicy": "<policy-namespace>:<policy-name>"}. Meaning when you have multiple namespace, the solutions are

  1. Create GitRepository, ImageRepository, ImagePolicy, ImageUpdateAutomation in flux-system namespace. Then the image policy marker points to flux-system:<policy_name>
  2. Create GitRepository in every namespace that want to have ImageUpdateAutomation.

I don't know if it is expected by the design or not. I already tested the first approach and it works well. This is the fact that i found.

  • ImageUpdateAutomation can only refer to GitRepository in same namespace (1) (2)
  • ImageUpdateAutomation only get ImagePolicy in same namespace (1)

@ale900522

Please check this issue I have a problem with image update automation
#621 (comment)
