Self-hosted runners are going offline suddenly and are not connected. #3501
Comments
@akhilp6 do you have a session ID for that runner?
@luketomlinson: I am having the same issue with all of our hosts, and coincidentally it started at approximately the same time as it did for @akhilp6. Any updates on this issue?
@luketomlinson, how do I find the session ID in order to get this issue moving forward? Our runners have been down for too long and I'd like to help resolve this issue ASAP.
Hey @ktaggart, typically it would be in the runner logs as a query parameter. We identified a bug on our side that was incorrectly deleting some runners. We've fixed this issue, but you'll need to re-register to get them back online. Let me know if this still appears to be happening.
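For anyone else hunting for the session ID mentioned above, here is a minimal Python sketch that scans the runner's `_diag` logs for it. It assumes the ID appears as a `sessionId=<GUID>` query parameter and that the log files are named `Runner_*.log`; neither detail is confirmed in this thread, and `find_session_ids` and `scan_diag_dir` are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumption: the session ID is logged as a `sessionId=<GUID>` query parameter.
# Adjust the parameter name if your logs use a different one.
SESSION_ID_RE = re.compile(
    r"[?&]sessionId=([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def find_session_ids(log_text: str) -> list[str]:
    """Return the distinct session IDs in log_text, lowercased, in first-seen order."""
    seen: dict[str, None] = {}
    for match in SESSION_ID_RE.finditer(log_text):
        seen.setdefault(match.group(1).lower(), None)
    return list(seen)


def scan_diag_dir(diag_dir: str) -> dict[str, list[str]]:
    """Map each Runner_*.log file in diag_dir to the session IDs it contains."""
    results: dict[str, list[str]] = {}
    # Assumption: listener logs are named Runner_*.log inside the _diag folder.
    for log_path in sorted(Path(diag_dir).glob("Runner_*.log")):
        ids = find_session_ids(log_path.read_text(errors="replace"))
        if ids:
            results[log_path.name] = ids
    return results
```

Calling `scan_diag_dir("_diag")` from the runner's install directory would list whatever IDs it finds per log file; an empty result most likely means the parameter name assumed here does not match your logs.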
@luketomlinson, thanks for the response. I am pretty new to this and inherited our cluster. Can you point me to the registration docs?
@luketomlinson Is there a new runner version which we need to use to re-register them?
I followed the instructions, created all the runners, and added our labels, but they now sit idle and do not accept new jobs. I compared the new .runner file to the old one and noted some rather stark differences:
new_runner
When I attempted to use the poolName "group" I received an error indicating that the poolName "group" was not found, and I could only use "Default." Also, the old_runner gitHubUrl is completely invalid. I have no idea how that could have worked previously, but it did. When I tried using that URL, I received a "404 not found" error, which made sense. What is the significance of the poolId and agentId? Any suggestions on how I can get my runner working and accepting jobs?
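A small sketch for comparing an old and a new `.runner` file field by field, as described in the comment above. It assumes the `.runner` file is a flat JSON object (the field names `poolName`, `poolId`, `agentId`, and `gitHubUrl` come from this thread); `load_runner_config` and `diff_runner_configs` are hypothetical helper names.

```python
import json


def load_runner_config(path: str) -> dict:
    """Load a .runner file, assuming it is JSON (a leading BOM is tolerated)."""
    with open(path, encoding="utf-8-sig") as fh:
        return json.load(fh)


def diff_runner_configs(old: dict, new: dict) -> dict:
    """Return {field: (old_value, new_value)} for every field that differs."""
    changed = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            # A field missing on one side shows up as None on that side.
            changed[key] = (old.get(key), new.get(key))
    return changed
```

For example, `diff_runner_configs(load_runner_config("old/.runner"), load_runner_config("new/.runner"))` (paths are placeholders) would show at a glance whether only the IDs changed or whether the group and URL differ too.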
@luketomlinson, some more info. After examining the runner log, I see the following:
I am not sure how the token that GitHub provided is bad, but the token shown above is not the same token shown in the .credentials file. Both the .credentials and .credentials_rsaparams files were created on the same day and at the same time, but are somehow out of sync. ETA:
All of them were created before the corresponding .credentials file, by anywhere from minutes to years. I do not understand what is going on here, but I would have assumed that old or invalid credentials would be cleared out when I created the new runners. Please comment.
Hi @ktaggart, where are you trying to register the runner? Runners can be registered at the repo, org, or enterprise level. That is determined by the During that registration process, you choose a Runner Group (aka Pool) so PoolName == Runner Group Name. Once runners are registered, you'll need to make sure that runner group has access to the repository in question if the runner is registered at the org or enterprise level. Since you are re-registering runners, I would recommend deleting all of those
@luketomlinson, thanks for the response. I followed the instructions you linked, and used the gitHubUrl that was provided via that process, e.g., In the above example, the runner is being registered at the repo level, in our internal corp GitHub repo. For poolName, I was only able to use 'Default': when I tried using our previous group name, I received an error indicating that the group name was not available and that I could only use 'Default'. I noted that in a previous message. Q: Can I change that to our old group name manually by editing the .runner files, or will that cause an issue? The runners do have access to the repo, as all the runners appear in the repo runners section and show as idle, and after the config was completed, I received a 'Connected to GitHub' message. As for deleting the .credentials files: before running config.sh, I had to delete the existing .runner file in each runner directory in order to proceed, which resulted in a 'Removed .credentials' message while creating each new runner. After that, a new .credentials file was in place, along with some old .credentials_rsaparams. Q: Are you suggesting that I delete the .runner, .credentials and .credentials_rsaparams files and reinstall each runner? Please clarify. Thx.
Hi @ktaggart, from your previous comment, it appears the old runners were registered at the enterprise level with a different group. To answer your question briefly, changing the .runner file group field will have no effect. It's just a representation of how the runner is registered. If you are still having issues, I'd recommend going through our support channels, rather than a public issue.
+1, same problem on macOS 14.7 with versions 2.319.1 and 2.320.0. Suddenly broken:
Describe the bug
We are using self-hosted macOS machines for our GitHub Actions. They were working fine until yesterday, but they are suddenly getting disconnected and going offline. When I checked the runner logs in _diag, I found this:
What's not working?
I have been seeing this issue for a couple of days, and it's happening on all our hosts.
I did see some online threads where, in some cases, the server's clock was off (which is not true in our case).
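To rule out the clock-skew cause mentioned above, one rough approach is to compare the local clock with the `Date` header of an HTTPS response. The sketch below is an illustration only: `skew_seconds` and `check_against` are hypothetical helpers, the default URL is an assumption, and since the header has one-second resolution and the request adds latency, this only catches gross skew (many seconds or more).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def skew_seconds(date_header: str, local_now: datetime) -> float:
    """Local time minus server time in seconds (positive: local clock is ahead)."""
    server_time = parsedate_to_datetime(date_header)
    return (local_now - server_time).total_seconds()


def check_against(url: str = "https://github.com") -> float:
    """Fetch the Date header from url and compare it with the local UTC clock."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        date_header = response.headers["Date"]
    return skew_seconds(date_header, datetime.now(timezone.utc))
```

A result within a few seconds of zero suggests the clock is not the problem; a result of minutes or more would be worth fixing before digging further.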
Runner Version and Platform
The runner version we are using is the latest, 2.319.1.
OS of the machine running the runner? OSX
Runner and Worker's Diagnostic Logs
Added the runner logs above.