
[Bug]: find no available rootcoord, check rootcoord state #38310

Open
1 task done
gaoyuan5251 opened this issue Dec 9, 2024 · 15 comments
Assignees
Labels
help wanted Extra attention is needed kind/bug Issues or changes related a bug

Comments

@gaoyuan5251

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.4.17
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory:
- GPU:
- Others: Object Storage: Aliyun OSS

Current Behavior

kubectl get po -nhll-milvus
NAME READY STATUS RESTARTS AGE
milvus-attu-645d447d7f-c6j2l 1/1 Running 0 9m58s
milvus-datanode-786857fd49-kqv5l 0/1 Running 2 (87s ago) 9m58s
milvus-indexnode-6d6cd775bd-xdbgp 1/1 Running 0 9m58s
milvus-mixcoord-675c9956f7-zj55m 0/1 CrashLoopBackOff 6 (3m56s ago) 9m58s
milvus-proxy-749766ccc9-k6v8j 0/1 Running 2 (87s ago) 9m58s
milvus-querynode-7c85dfd9f8-m45kn 0/1 CrashLoopBackOff 6 (3m50s ago) 9m58s

Expected Behavior

Successfully deployed

Steps To Reproduce

helm deploy: https://milvus.io/docs/v2.4.x/install_cluster-helm.md

Milvus Log

helm config file:
values.yaml.log

log:
logs-querynode.txt
logs-proxy.txt
logs-mixcoord.txt
logs-datanode.txt

Anything else?

No response

@gaoyuan5251 gaoyuan5251 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 9, 2024
@gaoyuan5251
Author

@yanliang567 Please take a look at this problem

@xiaofan-luan
Collaborator


[2024/12/09 09:07:55.270 +00:00] [INFO] [etcd/etcd_util.go:47] ["create etcd client"] [useEmbedEtcd=false] [useSSL=false] [endpoints="[etcd-0.etcd.hll-etcd.svc.cluster.local:2379,etcd-1.etcd.hll-etcd.svc.cluster.local:2379,etcd-2.etcd.hll-etcd.svc.cluster.local:2379]"] [minVersion=1.3]
panic: minio.port is empty

The config is wrong: minio.port is not set.
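
For reference, a minimal sketch of an externalS3 section in the Helm values.yaml pointing Milvus at Aliyun OSS. Field names follow recent versions of the milvus-helm chart (check against your chart's default values.yaml); the endpoint, keys, and bucket name are placeholders, not the reporter's actual values:

externalS3:
  enabled: true
  host: oss-cn-shenzhen-internal.aliyuncs.com   # placeholder OSS endpoint
  port: 443                                     # must be set; an empty minio.port makes Milvus panic at startup
  useSSL: true
  accessKey: <your-access-key-id>
  secretKey: <your-access-key-secret>
  bucketName: <your-bucket>
  rootPath: files
  cloudProvider: aliyun
  useVirtualHost: true                          # OSS typically needs virtual-host-style addressing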

@yanliang567
Contributor

/assign @gaoyuan5251
/unassign

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 9, 2024
@yanliang567 yanliang567 added help wanted Extra attention is needed and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Dec 9, 2024
@gaoyuan5251
Author

gaoyuan5251 commented Dec 10, 2024


The object storage I use is Alibaba Cloud OSS. I've added port: 443; give me a minute.

@gaoyuan5251
Author

gaoyuan5251 commented Dec 10, 2024


@xiaofan-luan @yanliang567
new log file: logs-mixcoord-2.txt
[2024/12/10 02:05:05.312 +00:00] [WARN] [storage/minio_object_storage.go:139] ["failed to check blob bucket exist"] [bucket=cn-sz-******-5348] [error="Access Denied."]

I am sure my OSS can be accessed. Please help me check whether the externalS3: section in my values.yaml is configured correctly.
(screenshots of the externalS3 configuration in values.yaml)

@xiaofan-luan
Collaborator


Maybe you need to set up the AK/SK?

In milvus.yaml:
accessKeyID: minioadmin
secretAccessKey: minioadmin
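
For context, those keys live under the minio: section of milvus.yaml (Milvus reuses the minio config block for any S3-compatible storage). A rough sketch for an Aliyun OSS backend, with placeholder values and key names that should be checked against the milvus.yaml shipped with your version:

minio:
  address: oss-cn-shenzhen.aliyuncs.com   # placeholder endpoint
  port: 443
  accessKeyID: <your-access-key-id>
  secretAccessKey: <your-access-key-secret>
  useSSL: true
  bucketName: <your-bucket>
  rootPath: files
  cloudProvider: aliyun
  useVirtualHost: true

When deploying with the Helm chart, this section is normally rendered from the externalS3 values rather than edited by hand.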

@xiaofan-luan
Collaborator

Did you check that the AK/SK is set correctly?

If you have double-checked and everything looks right, please try https://github.com/zhagnlu/milvus-store-tool and see if it can access your OSS from your K8s cluster.

@gaoyuan5251
Author

@xiaofan-luan The milvus-store-tool cannot be pulled because of our network restrictions.
I have the AK/SK set correctly.
(screenshot of the AK/SK configuration)

@haorenfsa
Contributor

@gaoyuan5251 May I see the RAM Policy bound to your AccessKey in Aliyun? Milvus needs the oss:ListBuckets permission on the OSS bucket; it seems your policy configuration is incorrect.

@gaoyuan5251
Author


@haorenfsa let me check

@haorenfsa
Contributor

You may also need to restrict the resources to

[
        "acs:oss:*:*:your-bucket",
        "acs:oss:*:*:your-bucket/*"
]

It may look like the below:

{
  "Statement": [
    {
      "Action": "oss:*",
      "Effect": "Allow",
      "Resource": [
        "acs:oss:*:*:your-bucket",
        "acs:oss:*:*:your-bucket/*"
      ]
    }
  ],
  "Version": "1"
}

@gaoyuan5251
Author

gaoyuan5251 commented Dec 10, 2024


@haorenfsa This may involve secrets.

@haorenfsa
Contributor

It seems we don't have minimum Policy permission examples. We may need one for each cloud.

@haorenfsa
Contributor

For others who encounter the same issue: make sure your S3 policy is set similarly to the below, or to the equivalent in other clouds.
(screenshot of an example S3 policy)

@gaoyuan5251
Author

It has been verified that the following OSS RAM policy configuration works. Replace <bucketname> with your own bucket name.

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "oss:ListObjects",
        "oss:GetObject",
        "oss:DeleteObject",
        "oss:PutObject",
        "oss:GetBucketLocation",
        "oss:ListBuckets",
        "oss:HeadBucket"
      ],
      "Resource": [
        "acs:oss:oss-*:*:<bucketname>",
        "acs:oss:oss-*:*:<bucketname>/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "oss:ListBuckets",
      "Resource": "*"
    }
  ]
}
