Skip to content
This repository has been archived by the owner on Sep 19, 2022. It is now read-only.

use the priority of kube-batch #209

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

YesterdayxD
Copy link

I add the priority of kube-batch in pytorch-operator. I changed the code of tf-operator and kubebatch in vender dir,because their code is not latest.

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign richardsliu
You can assign the PR to them by writing /assign @richardsliu in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link

Hi @YesterdayxD. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it!) and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@coveralls
Copy link

coveralls commented Aug 19, 2019

Coverage Status

Coverage remained the same at 85.281% when pulling 6c24d6e on YesterdayxD:master into 0b237bc on kubeflow:master.

@@ -69,6 +69,10 @@ type PyTorchJobSpec struct {
// "Worker": PyTorchReplicaSpec,
// }
PyTorchReplicaSpecs map[PyTorchReplicaType]*common.ReplicaSpec `json:"pytorchReplicaSpecs"`

//添加判断优先级的属性
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use English here.


//添加判断优先级的属性
//add PriorityClassName
PriorityClassName string `json:"priorityClassName,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PriorityClassName string `json:"priorityClassName,omitempty"`
PriorityClassName *string `json:"priorityClassName,omitempty"`

Since it is optional, we can define it as a pointer.

_, err := pc.SyncPodGroup(job, minAvailableReplicas)
priorityClassName:=getPriorityClassName(job)
//_, err := pc.SyncPodGroup(job, minAvailableReplicas)
_, err := pc.SyncPodGroupTest(job, minAvailableReplicas,priorityClassName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we use SyncPodGroupTest instead of SyncPodGroup

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add these codes in SyncPodGroupTest:
Spec: v1alpha1.PodGroupSpec{ MinMember: minAvailable.IntVal, PriorityClassName: priorityClassName, },

the name of this function is inappropriate,it is used to test my idea.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaocegege Do I detele my SyncPodGroupTest function and move the code my wrote into original SyncPodGroup function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it.

@@ -69,6 +69,7 @@ func (pc *PyTorchController) updateStatusSingle(job *pyv1.PyTorchJob, rtype pyv1

// Expect to have `replicas - succeeded` pods alive.
commonType := common.ReplicaType(rtype)
//expected是成功的判断标志,等于0时,成功的数量等于副本数,认为成功
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use English here

@YesterdayxD
Copy link
Author

@gaocegege Thank you.I will change the code you mentioned

@gaocegege
Copy link
Member

pkg/controller.v1/pytorch/controller.go:1::warning: file is not goimported (goimports)

pkg/controller.v1/pytorch/job.go:1::warning: file is not goimported (goimports)

pkg/controller.v1/pytorch/job.go:221:2:warning: should merge variable declaration with assignment on next line (S1021) (staticcheck)

Thanks

PS, please fix the linting issues.

/ok-to-test

@YesterdayxD
Copy link
Author

YesterdayxD commented Aug 20, 2019

I don't sure that what happen the goimported issue.
Is it that golang versions are different?
my golang is 1.12.6
Travis CI is go1.10
I don't know how to solve it.
can you tell me?
@gaocegege
Do I need to change my golang version?

@gaocegege
Copy link
Member

@YesterdayxD Then you can address the comments above, I can have a look at the issues.

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@YesterdayxD
Copy link
Author

@gaocegege I fixed the issue about goimports :)

@@ -216,3 +216,8 @@ func getTotalFailedReplicas(job *pyv1.PyTorchJob) int32 {
}
return totalFailedReplicas
}

func getPriorityClassName(job *pyv1.PyTorchJob) string {
priorityClassName := *(job.Spec.PriorityClassName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be an runtime error: invalid memory address or nil pointer dereference

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to remove the bracket

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right? @gaocegege

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I think you need to check if it is nil.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it.

@aws-kf-ci-bot
Copy link

@YesterdayxD: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
kubeflow-pytorch-operator-presubmit-e2e 6c24d6e link /test kubeflow-pytorch-operator-presubmit-e2e
kubeflow-pytorch-operator-presubmit 6c24d6e link /test kubeflow-pytorch-operator-presubmit

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants