-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log install preflight errors/warnings in both the CLI and kotsadm pod #4046
Conversation
@@ -47,6 +46,10 @@ import ( | |||
"k8s.io/client-go/kubernetes" | |||
) | |||
|
|||
var client = &http.Client{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, there was no timeout. Just being defensive here.
@@ -121,7 +126,7 @@ func execute(appID string, sequence int64, preflightSpec *troubleshootv1beta2.Pr | |||
KubernetesRestConfig: restConfig, | |||
} | |||
|
|||
logger.Debug("preflight collect phase") | |||
logger.Info("preflight collect phase") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this to info cause they show progress when performing an installation, which is critical functionality IMO to always log
pkg/preflight/preflight.go
Outdated
// Log the preflight results if there are any warnings or errors | ||
// The app may not get installed so we need to see this info for debugging | ||
if GetPreflightState(uploadPreflightResults) != "pass" { | ||
// TODO: Are there conditions when the application gets installed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so here since we should only deploy if the preflight state is "pass"
https://github.com/replicatedhq/kots/blob/main/pkg/preflight/preflight.go#L236-L245
cmd/kots/cli/install.go
Outdated
@@ -444,11 +448,22 @@ func InstallCmd() *cobra.Command { | |||
switch status { | |||
case storetypes.VersionPendingPreflight: | |||
log.ActionWithSpinner("Waiting for preflight checks to complete") | |||
log.FinishSpinner() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we leave the spinner going while we wait for the preflights and finish with success or error afterwards? Otherwise we end up with a state like this while preflights are running and a user may think things are hanging or stuck:
• Validating registry information ✓
• Deploying Admin Console
• Creating namespace ✓
• Waiting for datastore to be ready ✓
• Waiting for Admin Console to be ready ✓
• Waiting for installation to complete ✓
• Waiting for preflight checks to complete ✓
cmd/kots/cli/install.go
Outdated
if errors.As(err, &perr) { | ||
cmd.SilenceErrors = true | ||
var s strings.Builder | ||
s.WriteString("\nPreflight check results (state - title - message)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a pkg/print
where have logic for pretty-printing various things. I think pkg/print/config.go
could be a good reference since those print config validation errors, which might be similar to something like preflight warnings/errors. This logic could live somewhere in that package.
return errors.Wrap(err, "failed to validate preflight results") | ||
log.Errorf("Preflight checks contain warnings or errors. The application was not deployed") | ||
perr := preflightError{} | ||
if errors.As(err, &perr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any benefit to handling specific error types this way vs. how do it elsewhere in the codebase? For example: https://github.com/replicatedhq/kots/blob/main/pkg/handlers/snapshots.go#L202
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any benefit to handling specific error types this way vs. how do it elsewhere in the codebase? For example: https://github.com/replicatedhq/kots/blob/main/pkg/handlers/snapshots.go#L202
They both achieve the same result I think. errors.Cause()
is from github.com/pkg/errors
library while errors.As()
is part of go standard library
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if they achieve the same thing, let's just stick to errors.Cause()
for consistency?
pkg/preflight/preflight.go
Outdated
@@ -194,7 +208,7 @@ func Run(appID string, appSlug string, sequence int64, isAirgap bool, archiveDir | |||
// preflight reporting | |||
if isDeployed { | |||
if err := reporting.WaitAndReportPreflightChecks(appID, sequence, false, false); err != nil { | |||
logger.Debugf("failed to send preflights data to replicated app: %v", err) | |||
logger.Errorf("failed to send preflights data to replicated app: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure this message is always accurate anymore given the work to support airgap reporting. since we're exposing it as a error
log with this change, how about we change the message to something more generic like:
failed to report preflight check results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure this message is always accurate anymore given the work to support airgap reporting. since we're exposing it as a
error
log with this change, how about we change the message to something more generic like:failed to report preflight check results
I've seen a number of these debug logs all over the code. I think I'll revert this change for now and leave it as is. Later on we can align the logs.
@@ -426,6 +429,7 @@ func InstallCmd() *cobra.Command { | |||
case err := <-errChan: | |||
if err != nil { | |||
log.Error(err) | |||
// TODO: Why is this a negative exit code? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question... I think this end up resulting in a 255 exit code:
$ echo $?
255
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's fix this and set it to 255
cmd/kots/cli/install.go
Outdated
log.Errorf("Preflight checks contain warnings or errors. The application was not deployed") | ||
perr := preflightError{} | ||
if errors.As(err, &perr) { | ||
cmd.SilenceErrors = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to silence errors if we're always going to call os.Exit(1)
at the end of this code path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to silence errors if we're always going to call
os.Exit(1)
at the end of this code path?
Perhaps not. Removing it
…s into em/log-preflight-errors
} | ||
log.FinishSpinner() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should leave this as we need to stop the spinner in the case where preflights pass too. Otherwise we are left with an indefinite spinner:
• Deploying Admin Console
• Creating namespace ✓
• Waiting for datastore to be ready ✓
• Waiting for Admin Console to be ready ✓
• Waiting for installation to complete ✓
• Waiting for preflight checks to complete ⠇
• Press Ctrl+C to exit
• Go to http://localhost:8800 to access the Admin Console
• Waiting for preflight checks to complete ⠏
Co-authored-by: Craig O'Donnell <[email protected]>
pkg/preflight/preflight.go
Outdated
@@ -216,6 +229,21 @@ func Run(appID string, appSlug string, sequence int64, isAirgap bool, archiveDir | |||
return nil | |||
} | |||
|
|||
func preflightState(p troubleshootpreflight.UploadPreflightResult) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use the existing GetPreflightState
function`?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The state from GetPreflightState
is of the entire list of preflights i.e if there is an error in any, the state == error
. preflightState
gets the state of each preflight for logging purposes.
pkg/print/config.go
Outdated
log.Info(s.String()) | ||
} | ||
|
||
func preflightState(r *preflight.UploadPreflightResult) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll DRY this out in favour of preflights.GetPreflightCheckState
which I'll rename from the existing preflights.preflightState
pkg/print/config.go
Outdated
@@ -26,3 +27,26 @@ func ConfigValidationErrors(log *logger.CLILogger, groupValidationErrors []confi | |||
log.FinishSpinnerWithError() | |||
log.Errorf(sb.String()) | |||
} | |||
|
|||
func PreflightErrors(log *logger.CLILogger, results []*preflight.UploadPreflightResult) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not following the existing pattern of printing?
Lines 25 to 49 in bc8004b
func printRestoresTable(restores []velerov1.Restore) { | |
w := NewTabWriter() | |
defer w.Flush() | |
fmtColumns := "%s\t%s\t%s\t%s\t%s\t%s\t%s\n" | |
fmt.Fprintf(w, fmtColumns, "NAME", "BACKUP", "STATUS", "STARTED", "COMPLETED", "ERRORS", "WARNINGS") | |
for _, r := range restores { | |
var startedAt *time.Time | |
if r.Status.StartTimestamp != nil && !r.Status.StartTimestamp.Time.IsZero() { | |
startedAt = &r.Status.StartTimestamp.Time | |
} | |
var completedAt *time.Time | |
if r.Status.CompletionTimestamp != nil && !r.Status.CompletionTimestamp.Time.IsZero() { | |
completedAt = &r.Status.CompletionTimestamp.Time | |
} | |
phase := r.Status.Phase | |
if phase == "" { | |
phase = "New" | |
} | |
fmt.Fprintf(w, fmtColumns, r.ObjectMeta.Name, r.Spec.BackupName, phase, startedAt, completedAt, fmt.Sprintf("%d", r.Status.Errors), fmt.Sprintf("%d", r.Status.Warnings)) | |
} | |
} |
that would help align the column titles with the values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not following the existing pattern of printing?
No particular reason. I'll try to adopt the same approach
cmd/kots/cli/install.go
Outdated
perr := preflightError{} | ||
if errors.As(err, &perr) { | ||
print.PreflightErrors(log, perr.Results) | ||
cmd.SilenceErrors = true // Stop Cobra from printing the error, we print it ourselves |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of silencing the errors, can we just return/print a generic error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of silencing the errors, can we just return/print a generic error message?
That extra duplicated line after the formatted preflight check results at the end did not feel right. This is how it ends up looking like
evans@evans-vm3:~$ ./kots install thanos-reloaded -n default --shared-password password --license-file ~/thanos_license.yaml --kubeconfig /etc/rancher/k3s/k3s.yaml
• Deploying Admin Console
• Creating namespace ✓
• Waiting for datastore to be ready ✓
• Waiting for Admin Console to be ready ✓
• Waiting for installation to complete ✓
• Waiting for preflight checks to complete ✓
• There are preflight check warnings for the application. The app was not deployed.
STATE TITLE MESSAGE
PASS Required Kubernetes Version Your cluster meets the recommended and required versions of Kubernetes.
PASS Container Runtime containerd container runtime was found.
WARN Kubernetes Distribution Unable to determine the distribution of Kubernetes.
WARN Total CPU Cores The cluster should contain at least 4 cores.
Error: There are preflight check warnings for the application. The app was not deployed.
NOTE: Any other errors will get printed, we just silence the errors when we format the preflight checks table since we are printing our own error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Use tabwriter to log preflight check results * DRY code
What this PR does / why we need it:
When deployments fail due to preflights having warnings/errors, users do not get to see which of the preflights failed. This change logs all the executed preflights whenever they prevent a deployment. We print all preflights so as to also have users see other preflights that got run so they can perform additional review of their preflights.
Screenshot of the CLI logs after preflights failing/warning
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Steps to reproduce
Does this PR introduce a user-facing change?
Does this PR require documentation?