
test: Increase initial time to stabilize for FVTs #429

Merged: 4 commits merged into kserve:main on Sep 13, 2023


@ckadner (Member) commented Sep 12, 2023

Motivation

Address the most frequent of the recent FVT failures caused by exceeded timeouts:

  • Predictor test suite fails to initialize:
    https://github.com/kserve/modelmesh-serving/actions/runs/6017567397/job/16324018749
    [SynchronizedBeforeSuite] [FAILED] [39.405 seconds]
    [SynchronizedBeforeSuite] 
    /home/runner/work/modelmesh-serving/modelmesh-serving/fvt/predictor/predictor_suite_test.go:34
    
      Timeline >>
      2023-08-29T21:29:31Z	INFO	Initializing test suite
      2023-08-29T21:29:31Z	INFO	Using environment variables	{"NAMESPACE": "modelmesh-serving", "SERVICENAME": "modelmesh-serving", "CONTROLLERNAMESPACE": "modelmesh-serving", "NAMESPACESCOPEMODE": false}
      2023-08-29T21:29:31Z	INFO	FVTClientInstance created	{"client": {"Interface":{}}}
      2023-08-29T21:29:31Z	INFO	Delete all predictors ...
      2023-08-29T21:29:33Z	INFO	Delete all inference services ...
      2023-08-29T21:29:40Z	INFO	Secret 'fvt-tls-secret' created
      2023-08-29T21:29:40Z	INFO	Watcher got event with object	{"name": "modelmesh-serving-mlserver-1.x", "replicas": 1, "available": 0, "updated": 1}
      2023-08-29T21:29:40Z	INFO	Watcher got event with object	{"name": "modelmesh-serving-ovms-1.x", "replicas": 1, "available": 0, "updated": 1}
      2023-08-29T21:29:40Z	INFO	Watcher got event with object	{"name": "modelmesh-serving-triton-2.x", "replicas": 1, "available": 0, "updated": 1}
      2023-08-29T21:30:10Z	INFO	Timed out after 30s without events
      [FAILED] in [SynchronizedBeforeSuite] - /home/runner/work/modelmesh-serving/modelmesh-serving/fvt/helpers.go:439 @ 08/29/23 21:30:10.535
      << Timeline
    
      [FAILED] Timed out before deployments were ready: map[modelmesh-serving-mlserver-1.x:false modelmesh-serving-ovms-1.x:false modelmesh-serving-triton-2.x:false]
      Expected
          <bool>: false
      to be true
      In [SynchronizedBeforeSuite] at: /home/runner/work/modelmesh-serving/modelmesh-serving/fvt/helpers.go:439 @ 08/29/23 21:30:10.535
    ------------------------------
    
    Summarizing 2 Failures:
      [FAIL] [SynchronizedBeforeSuite] 
      /home/runner/work/modelmesh-serving/modelmesh-serving/fvt/predictor/predictor_suite_test.go:34
      [FAIL] [SynchronizedBeforeSuite] 
      /home/runner/work/modelmesh-serving/modelmesh-serving/fvt/helpers.go:439
    
    Ran 0 of 117 Specs in 39.861 seconds
    FAIL! - Interrupted by Other Ginkgo Process -- A BeforeSuite node failed so all tests were skipped.
    
    Ginkgo ran 4 suites in 3m22.65165007s
    
    There were failures detected in the following suites:
      predictor ./fvt/predictor
    
    Test Suite Failed
    
  • "AllowAnyPVC" storage test initialization timeout:
    https://github.com/kserve/modelmesh-serving/actions/runs/6029197178/job/16358155402
     [FAILED] Timeout before InferenceService 'isvc-pvc3-zpkq8' reached any of the activeModelStates [Standby Loaded]
        Expected
            <bool>: false
        to be true
        In [It] at: /home/runner/work/modelmesh-serving/modelmesh-serving/fvt/helpers.go:296 @ 08/22/23 22:28:40.617
      
      ------------------------------
      
      Summarizing 1 Failure:
        [FAIL] ISVCs with PVC not in storage-config [It] should load a model when allowAnyPVC
        /home/runner/work/modelmesh-serving/modelmesh-serving/fvt/helpers.go:296
      
      Ran 10 of 11 Specs in 364.638 seconds
      FAIL! - Interrupted by Other Ginkgo Process -- 9 Passed | 1 Failed | 0 Pending | 1 Skipped
      
      There were failures detected in the following suites:
        storage ./fvt/storage
    

Modifications

  • Increase the initial time to stabilize for predictor tests
  • Scale to zero before AllowAnyPVC tests

Result

The FVT suites ran 5 consecutive times without failures.

The average runtime remains just under 30 minutes, in line with the previous overall completion time for successful FVT runs:
https://github.com/kserve/modelmesh-serving/actions/workflows/fvt.yml?query=is%3Asuccess+

Scale to zero before AllowAnyPVC test

Signed-off-by: Christian Kadner <[email protected]>
@ckadner requested review from tjohnson31415 and removed request for joerunde September 12, 2023 02:12
@ckadner marked this pull request as ready for review September 12, 2023 18:00
Signed-off-by: Christian Kadner <[email protected]>
Signed-off-by: Christian Kadner <[email protected]>
@ckadner (Member Author) commented Sep 12, 2023

I kicked off a few more FVT runs:

All completed successfully.

With an average runtime of just under 30 minutes, the increased timeouts do not seem to add to the (previous) overall completion time for successful FVT tests:

https://github.com/kserve/modelmesh-serving/actions/workflows/fvt.yml?query=is%3Asuccess+

@ckadner requested a review from rafvasq September 12, 2023 19:00
@ckadner added the test (testing related bugs and fixes) label Sep 13, 2023
@rafvasq (Member) left a comment

/lgtm

@kserve-oss-bot (Collaborator)
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ckadner, rafvasq
To complete the pull request process, please assign njhill after the PR has been reviewed.
You can assign the PR to them by writing /assign @njhill in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ckadner merged commit d4fc59a into kserve:main Sep 13, 2023
3 checks passed
@ckadner (Member Author) commented Sep 13, 2023

Thanks @rafvasq

@ckadner removed the request for review from tjohnson31415 October 2, 2023 22:42
@ckadner assigned ckadner and unassigned rafvasq Oct 2, 2023
@ckadner added this to the v0.11.1 milestone Oct 2, 2023
Labels: lgtm, test (testing related bugs and fixes)