Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log install preflight errors/warnings in both the CLI and kotsadm pod #4046

Merged
merged 16 commits into from
Oct 9, 2023
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 58 additions & 13 deletions cmd/kots/cli/install.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@ import (
"encoding/json"
"fmt"
"io"
"io/ioutil"
"mime/multipart"
"net/http"
"net/url"
Expand Down Expand Up @@ -47,6 +46,10 @@ import (
"k8s.io/client-go/kubernetes"
)

var client = &http.Client{
Copy link
Member Author

@banjoh banjoh Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, there was no timeout. Just being defensive here.

Timeout: time.Second * 30,
}

func InstallCmd() *cobra.Command {
cmd := &cobra.Command{
Use: "install [upstream uri]",
Expand Down Expand Up @@ -321,7 +324,7 @@ func InstallCmd() *cobra.Command {

log.ActionWithoutSpinner("Extracting airgap bundle")

airgapRootDir, err := ioutil.TempDir("", "kotsadm-airgap")
airgapRootDir, err := os.MkdirTemp("", "kotsadm-airgap")
if err != nil {
return errors.Wrap(err, "failed to create temp dir")
}
Expand Down Expand Up @@ -426,6 +429,7 @@ func InstallCmd() *cobra.Command {
case err := <-errChan:
if err != nil {
log.Error(err)
// TODO: Why is this a negative exit code?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question... I think this end up resulting in a 255 exit code:

$ echo $?
255

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's fix this and set it to 255

os.Exit(-1)
}
case <-stopCh:
Expand All @@ -444,11 +448,22 @@ func InstallCmd() *cobra.Command {
switch status {
case storetypes.VersionPendingPreflight:
log.ActionWithSpinner("Waiting for preflight checks to complete")
log.FinishSpinner()
Copy link
Contributor

@cbodonnell cbodonnell Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leave the spinner going while we wait for the preflights and finish with success or error afterwards? Otherwise we end up with a state like this while preflights are running and a user may think things are hanging or stuck:

  • Validating registry information ✓  
  • Deploying Admin Console
    • Creating namespace ✓  
    • Waiting for datastore to be ready ✓  
  • Waiting for Admin Console to be ready ✓  
  • Waiting for installation to complete ✓  
  • Waiting for preflight checks to complete ✓

if err := ValidatePreflightStatus(deployOptions, authSlug, apiEndpoint); err != nil {
log.FinishSpinnerWithError()
return errors.Wrap(err, "failed to validate preflight results")
log.Errorf("Preflight checks contain warnings or errors. The application was not deployed")
perr := preflightError{}
if errors.As(err, &perr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any benefit to handling specific error types this way vs. how do it elsewhere in the codebase? For example: https://github.com/replicatedhq/kots/blob/main/pkg/handlers/snapshots.go#L202

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any benefit to handling specific error types this way vs. how do it elsewhere in the codebase? For example: https://github.com/replicatedhq/kots/blob/main/pkg/handlers/snapshots.go#L202

They both achieve the same result I think. errors.Cause() is from github.com/pkg/errors library while errors.As() is part of go standard library

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if they achieve the same thing, let's just stick to errors.Cause() for consistency?

cmd.SilenceErrors = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to silence errors if we're always going to call os.Exit(1) at the end of this code path?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to silence errors if we're always going to call os.Exit(1) at the end of this code path?

Perhaps not. Removing it

var s strings.Builder
s.WriteString("\nPreflight check results (state - title - message)")
Copy link
Contributor

@cbodonnell cbodonnell Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a pkg/print where have logic for pretty-printing various things. I think pkg/print/config.go could be a good reference since those print config validation errors, which might be similar to something like preflight warnings/errors. This logic could live somewhere in that package.

for _, result := range perr.Results {
s.WriteString(fmt.Sprintf("\n%s - %q - %q", preflightState(result), result.Title, result.Message))
}
log.Info(s.String())
os.Exit(1)
}
return err
}
log.FinishSpinner()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should leave this as we need to stop the spinner in the case where preflights pass too. Otherwise we are left with an indefinite spinner:

  • Deploying Admin Console
    • Creating namespace ✓  
    • Waiting for datastore to be ready ✓  
  • Waiting for Admin Console to be ready ✓  
  • Waiting for installation to complete ✓  
  • Waiting for preflight checks to complete ⠇
  • Press Ctrl+C to exit
  • Go to http://localhost:8800 to access the Admin Console

  • Waiting for preflight checks to complete ⠏

case storetypes.VersionPendingConfig:
log.ActionWithoutSpinnerWarning("Additional app configuration is required. Please login to the Admin Console to continue", nil)
}
Expand Down Expand Up @@ -692,7 +707,7 @@ func getIngressConfig(v *viper.Viper) (*kotsv1beta1.IngressConfig, error) {

ingressConfig := kotsv1beta1.IngressConfig{}
if ingressConfigPath != "" {
content, err := ioutil.ReadFile(ingressConfigPath)
content, err := os.ReadFile(ingressConfigPath)
if err != nil {
return nil, errors.Wrap(err, "failed to read ingress service config file")
}
Expand All @@ -719,7 +734,7 @@ func getIdentityConfig(v *viper.Viper) (*kotsv1beta1.IdentityConfig, error) {

identityConfig := kotsv1beta1.IdentityConfig{}
if identityConfigPath != "" {
content, err := ioutil.ReadFile(identityConfigPath)
content, err := os.ReadFile(identityConfigPath)
if err != nil {
return nil, errors.Wrap(err, "failed to read identity service config file")
}
Expand Down Expand Up @@ -910,7 +925,7 @@ func getAutomatedInstallStatus(url string, authSlug string) (*kotsstore.TaskStat
return nil, errors.Errorf("unexpected status code %d", resp.StatusCode)
}

b, err := ioutil.ReadAll(resp.Body)
b, err := io.ReadAll(resp.Body)
if err != nil {
return nil, errors.Wrap(err, "failed to read response body")
}
Expand Down Expand Up @@ -964,7 +979,7 @@ func getPreflightResponse(url string, authSlug string) (*handlers.GetPreflightRe
newReq.Header.Add("Content-Type", "application/json")
newReq.Header.Add("Authorization", authSlug)

resp, err := http.DefaultClient.Do(newReq)
resp, err := client.Do(newReq)
if err != nil {
return nil, errors.Wrap(err, "failed to execute request")
}
Expand All @@ -974,7 +989,7 @@ func getPreflightResponse(url string, authSlug string) (*handlers.GetPreflightRe
return nil, errors.Errorf("unexpected status code %d", resp.StatusCode)
}

b, err := ioutil.ReadAll(resp.Body)
b, err := io.ReadAll(resp.Body)
if err != nil {
return nil, errors.Wrap(err, "failed to read response body")
}
Expand Down Expand Up @@ -1005,6 +1020,16 @@ func checkPreflightsComplete(response *handlers.GetPreflightResultResponse) (boo
return true, nil
}

type preflightError struct {
Results []*preflight.UploadPreflightResult
}

func (e preflightError) Error() string {
return "preflight checks have warnings or errors"
}

func (e preflightError) Unwrap() error { return fmt.Errorf("preflight checks have warnings or errors") }

func checkPreflightResults(response *handlers.GetPreflightResultResponse) (bool, error) {
if response.PreflightResult.Result == "" {
return false, nil
Expand All @@ -1027,15 +1052,35 @@ func checkPreflightResults(response *handlers.GetPreflightResultResponse) (bool,
}

if isWarn && isFail {
return false, errors.New("There are preflight check failures and warnings for the application. The app was not deployed.")
return false, preflightError{
Results: results.Results,
}
}

if isWarn {
return false, errors.New("There are preflight check warnings for the application. The app was not deployed.")
return false, preflightError{
Results: results.Results,
}
}
if isFail {
return false, errors.New("There are preflight check failures for the application. The app was not deployed.")
return false, preflightError{
Results: results.Results,
}
}

return true, nil
}

func preflightState(r *preflight.UploadPreflightResult) string {
if r.IsFail {
return "FAIL"
}
if r.IsWarn {
return "WARN"
}
if r.IsPass {
return "PASS"
}
// We should never get here
return "UNKNOWN"
}
2 changes: 1 addition & 1 deletion pkg/handlers/preflight.go
Original file line number Diff line number Diff line change
Expand Up @@ -201,7 +201,7 @@ func (h *Handler) StartPreflightChecks(w http.ResponseWriter, r *http.Request) {
return
}

archiveDir, err := ioutil.TempDir("", "kotsadm")
archiveDir, err := os.MkdirTemp("", "kotsadm")
if err != nil {
logger.Error(errors.Wrap(err, "failed to create temp dir"))
w.WriteHeader(http.StatusInternalServerError)
Expand Down
3 changes: 1 addition & 2 deletions pkg/kotsutil/troubleshoot.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@ package kotsutil

import (
"context"
"io/ioutil"
"os"
"path/filepath"

Expand All @@ -22,7 +21,7 @@ func LoadTSKindsFromPath(dir string) (*troubleshootloader.TroubleshootKinds, err
return nil
}

contents, err := ioutil.ReadFile(path)
contents, err := os.ReadFile(path)
if err != nil {
return errors.Wrap(err, "failed to read file")
}
Expand Down
3 changes: 2 additions & 1 deletion pkg/kurl/kurl_nodes.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ import (
"io/ioutil"
"math"
"net/http"
"os"
"strconv"
"time"

Expand Down Expand Up @@ -140,7 +141,7 @@ func getNodeMetrics(nodeIP string) (*statsv1alpha1.Summary, error) {
port := 10255

// only use mutual TLS if client cert exists
_, err := ioutil.ReadFile("/etc/kubernetes/pki/kubelet/client.crt")
_, err := os.ReadFile("/etc/kubernetes/pki/kubelet/client.crt")
if err == nil {
cert, err := tls.LoadX509KeyPair("/etc/kubernetes/pki/kubelet/client.crt", "/etc/kubernetes/pki/kubelet/client.key")
if err != nil {
Expand Down
13 changes: 9 additions & 4 deletions pkg/preflight/execute.go
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ func setPreflightResult(appID string, sequence int64, preflightResults *types.Pr
// execute will execute the preflights using spec in preflightSpec.
// This spec should be rendered, no template functions remaining
func execute(appID string, sequence int64, preflightSpec *troubleshootv1beta2.Preflight, ignorePermissionErrors bool) (*types.PreflightResults, error) {
logger.Debug("executing preflight checks",
logger.Info("executing preflight checks",
zap.String("appID", appID),
zap.Int64("sequence", sequence))

Expand All @@ -65,7 +65,12 @@ func execute(appID string, sequence int64, preflightSpec *troubleshootv1beta2.Pr
if err, ok := msg.(error); ok {
logger.Errorf("error while running preflights: %v", err)
} else {
logger.Infof("preflight progress: %v", msg)
switch m := msg.(type) {
case preflight.CollectProgress:
logger.Infof("preflight progress: %s", m.String())
default:
logger.Infof("preflight progress: %+v", msg)
}
}

progress, ok := msg.(preflight.CollectProgress)
Expand Down Expand Up @@ -121,7 +126,7 @@ func execute(appID string, sequence int64, preflightSpec *troubleshootv1beta2.Pr
KubernetesRestConfig: restConfig,
}

logger.Debug("preflight collect phase")
logger.Info("preflight collect phase")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this to info cause they show progress when performing an installation, which is critical functionality IMO to always log

collectResults, err := troubleshootpreflight.Collect(collectOpts, preflightSpec)
if err != nil && !isPermissionsError(err) {
preflightRunError = err
Expand All @@ -147,7 +152,7 @@ func execute(appID string, sequence int64, preflightSpec *troubleshootv1beta2.Pr
}
uploadPreflightResults.Errors = rbacErrors
} else {
logger.Debug("preflight analyze phase")
logger.Info("preflight analyze phase")
analyzeResults := collectResults.Analyze()

// the typescript api added some flair to this result
Expand Down
35 changes: 32 additions & 3 deletions pkg/preflight/preflight.go
Original file line number Diff line number Diff line change
Expand Up @@ -160,13 +160,27 @@ func Run(appID string, appSlug string, sequence int64, isAirgap bool, archiveDir
preflight.Spec.Collectors = collectors

go func() {
logger.Debug("preflight checks beginning")
logger.Info("preflight checks beginning")
uploadPreflightResults, err := execute(appID, sequence, preflight, ignoreRBAC)
if err != nil {
logger.Error(errors.Wrap(err, "failed to run preflight checks"))
return
}
logger.Debug("preflight checks completed")

// Log the preflight results if there are any warnings or errors
// The app may not get installed so we need to see this info for debugging
if GetPreflightState(uploadPreflightResults) != "pass" {
// TODO: Are there conditions when the application gets installed?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so here since we should only deploy if the preflight state is "pass" https://github.com/replicatedhq/kots/blob/main/pkg/preflight/preflight.go#L236-L245

logger.Warnf("Preflight checks completed with warnings or errors. The application may not get installed")
for _, result := range uploadPreflightResults.Results {
if result == nil {
continue
}
logger.Infof("preflight state=%s title=%q message=%q", preflightState(*result), result.Title, result.Message)
}
} else {
logger.Info("preflight checks completed")
}

go func() {
err := reporting.GetReporter().SubmitAppInfo(appID) // send app and preflight info when preflights finish
Expand Down Expand Up @@ -194,7 +208,7 @@ func Run(appID string, appSlug string, sequence int64, isAirgap bool, archiveDir
// preflight reporting
if isDeployed {
if err := reporting.WaitAndReportPreflightChecks(appID, sequence, false, false); err != nil {
logger.Debugf("failed to send preflights data to replicated app: %v", err)
logger.Errorf("failed to send preflights data to replicated app: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure this message is always accurate anymore given the work to support airgap reporting. since we're exposing it as a error log with this change, how about we change the message to something more generic like:

failed to report preflight check results

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure this message is always accurate anymore given the work to support airgap reporting. since we're exposing it as a error log with this change, how about we change the message to something more generic like:

failed to report preflight check results

I've seen a number of these debug logs all over the code. I think I'll revert this change for now and leave it as is. Later on we can align the logs.

return
}
}
Expand All @@ -216,6 +230,21 @@ func Run(appID string, appSlug string, sequence int64, isAirgap bool, archiveDir
return nil
}

func preflightState(p troubleshootpreflight.UploadPreflightResult) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use the existing GetPreflightState function`?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state from GetPreflightState is of the entire list of preflights i.e if there is an error in any, the state == error. preflightState gets the state of each preflight for logging purposes.

if p.IsFail {
return "fail"
}

if p.IsWarn {
return "warn"
}

if p.IsPass {
return "pass"
}
return "unknown"
}

// maybeDeployFirstVersion will deploy the first version if preflight checks pass
func maybeDeployFirstVersion(appID string, sequence int64, preflightResults *types.PreflightResults) (bool, error) {
if sequence != 0 {
Expand Down
7 changes: 6 additions & 1 deletion pkg/reporting/distribution.go
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,12 @@ func distributionFromServerGroupAndResources(clientset kubernetes.Interface) Dis

func distributionFromProviderId(clientset kubernetes.Interface) Distribution {
nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
logger.Infof("Found %d nodes", len(nodes.Items))
nodeCount := len(nodes.Items)
if nodeCount > 1 {
logger.Infof("Found %d nodes", nodeCount)
} else {
logger.Infof("Found %d node", nodeCount)
}
if err != nil {
logger.Infof("got error listing node: %v", err.Error())
}
Expand Down
Loading