CassandraTask reconcile fails with json unmarshal error #587

Closed
vsoloviov opened this issue Oct 18, 2023 · 4 comments
Labels
bug Something isn't working done Issues in the state 'done'

Comments

@vsoloviov

What happened?

I created the following task:

apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: rebuild-dc2
spec:
  datacenter:
    name: dc2
    namespace: cassandra
  jobs:
    - name: rebuild-dc2
      command: rebuild
      args:
        source_datacenter: dc1

But I found that it only triggered a rebuild on a single pod.

What did you expect to happen?

I expected the task to be triggered on all 3 pods of my datacenter.

How can we reproduce it (as minimally and precisely as possible)?

Create a task to rebuild a datacenter, then check the operator's logs.

cass-operator version

v1.17.2

Kubernetes version

v1.27.4

Method of installation

Helm

Anything else we need to know?

Here are my findings:

When I create the task, a job is created on pod-0:

kubectl -n hsm-db get pod cluster1-dc2-default-sts-0 -o yaml | yq .metadata | grep job
  control.k8ssandra.io/job-ffedaf36-a81a-413f-9635-f4796e7e611c: '{"id":"b4d23d4f-8d9a-40d8-b0c8-d3b5595593ea","handler":"management-api"}'

Since .spec.concurrencyPolicy is Forbid, the task is executed on the first pod and waits for a completion status to be returned before proceeding with the next pod.

In the cass-operator pod I can see the following logs (I have omitted the pod JSON output):

2023-10-18T13:21:23.451Z	INFO	calling Management API features - GET /api/v0/metadata/versions/features	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "CassandraTask": {"name":"rebuild-dc2","namespace":"cassandra"}, "namespace": "cassandra", "name": "rebuild-dc2", "reconcileID": "9bf92e19-1fe9-4eaa-b828-60847e63dba6", "pod": "cluster1-dc2-default-sts-0"}
2023-10-18T13:21:23.451Z	INFO	client::callNodeMgmtEndpoint	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "CassandraTask": {"name":"rebuild-dc2","namespace":"cassandra"}, "namespace": "cassandra", "name": "rebuild-dc2", "reconcileID": "9bf92e19-1fe9-4eaa-b828-60847e63dba6"}
2023-10-18T13:21:23.469Z	INFO	calling Management API keyspace rebuild - POST /api/v1/ops/node/rebuild	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "CassandraTask": {"name":"rebuild-dc2","namespace":"cassandra"}, "namespace": "cassandra", "name": "rebuild-dc2", "reconcileID": "9bf92e19-1fe9-4eaa-b828-60847e63dba6", "pod": "cluster1-dc2-default-sts-0"}
2023-10-18T13:21:23.469Z	INFO	client::callNodeMgmtEndpoint	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "CassandraTask": {"name":"rebuild-dc2","namespace":"cassandra"}, "namespace": "cassandra", "name": "rebuild-dc2", "reconcileID": "9bf92e19-1fe9-4eaa-b828-60847e63dba6"}
2023-10-18T13:21:23.512Z	INFO	calling Management API features - GET /api/v0/metadata/versions/features	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "CassandraTask": {"name":"rebuild-dc2","namespace":"cassandra"}, "namespace": "cassandra", "name": "rebuild-dc2", "reconcileID": "742e0ae3-3511-4eb2-bf96-9257b12659da", "pod": "cluster1-dc2-default-sts-0"}
2023-10-18T13:21:23.512Z	INFO	client::callNodeMgmtEndpoint	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "CassandraTask": {"name":"rebuild-dc2","namespace":"cassandra"}, "namespace": "cassandra", "name": "rebuild-dc2", "reconcileID": "742e0ae3-3511-4eb2-bf96-9257b12659da"}
2023-10-18T13:21:23.546Z	INFO	calling Management API features - GET /api/v0/ops/executor/job	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "CassandraTask": {"name":"rebuild-dc2","namespace":"cassandra"}, "namespace": "cassandra", "name": "rebuild-dc2", "reconcileID": "742e0ae3-3511-4eb2-bf96-9257b12659da", "pod": "cluster1-dc2-default-sts-0"}
2023-10-18T13:21:23.546Z	INFO	client::callNodeMgmtEndpoint	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "CassandraTask": {"name":"rebuild-dc2","namespace":"cassandra"}, "namespace": "cassandra", "name": "rebuild-dc2", "reconcileID": "742e0ae3-3511-4eb2-bf96-9257b12659da"}
2023-10-18T13:21:23.849Z	ERROR	Could not get JobDetails for pod	{"controller": "cassandratask", "controllerGroup": "control.k8ssandra.io", "controllerKind": "CassandraTask", "CassandraTask": {"name":"rebuild-dc2","namespace":"cassandra"}, "namespace": "cassandra", "name": "rebuild-dc2", "reconcileID": "742e0ae3-3511-4eb2-bf96-9257b12659da", "Pod": {<omitted>}, "error": "json: cannot unmarshal number into Go struct field JobDetails.submit_time of type string"}
github.com/k8ssandra/cass-operator/internal/controllers/control.(*CassandraTaskReconciler).reconcileEveryPodTask
	/workspace/internal/controllers/control/cassandratask_controller.go:619
github.com/k8ssandra/cass-operator/internal/controllers/control.(*CassandraTaskReconciler).Reconcile
	/workspace/internal/controllers/control/cassandratask_controller.go:326
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:122
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:323
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235

So the reason seems to be here:

Struct:

type JobDetails struct {
	Id         string `json:"id"`
	Type       string `json:"type"`
	Status     string `json:"status"`
	SubmitTime string `json:"submit_time,omitempty"`
	EndTime    string `json:"end_time,omitempty"`
	Error      string `json:"error,omitempty"`
}

And it is unmarshalled here:

if err := json.Unmarshal(data, job); err != nil {

It gets the data from /api/v0/ops/executor/job?job_id=%s

If I curl the API, I get a number, not a string:

curl 10.39.3.14:8080/api/v0/ops/executor/job?job_id=b4d23d4f-8d9a-40d8-b0c8-d3b5595593ea
{"id":"b4d23d4f-8d9a-40d8-b0c8-d3b5595593ea","type":"rebuild","status":"COMPLETED","submit_time":1697635283476,"end_time":1697635285454,"error":null,"status_changes":[]}
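The mismatch can be reproduced outside the operator. Below is a minimal sketch, assuming only the JobDetails struct quoted above and the payload returned by curl; decodeJob and the two payload constants are hypothetical names introduced for illustration and are not part of cass-operator.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JobDetails mirrors the struct quoted in this issue: timestamps are strings.
type JobDetails struct {
	Id         string `json:"id"`
	Type       string `json:"type"`
	Status     string `json:"status"`
	SubmitTime string `json:"submit_time,omitempty"`
	EndTime    string `json:"end_time,omitempty"`
	Error      string `json:"error,omitempty"`
}

// numericPayload is the response from the curl call above (numeric timestamps).
const numericPayload = `{"id":"b4d23d4f-8d9a-40d8-b0c8-d3b5595593ea","type":"rebuild","status":"COMPLETED","submit_time":1697635283476,"end_time":1697635285454,"error":null,"status_changes":[]}`

// stringPayload is the same response with the timestamps quoted (assumed shape).
const stringPayload = `{"id":"b4d23d4f-8d9a-40d8-b0c8-d3b5595593ea","type":"rebuild","status":"COMPLETED","submit_time":"1697635283476","end_time":"1697635285454","error":null,"status_changes":[]}`

// decodeJob is a hypothetical helper that decodes a response body into JobDetails.
func decodeJob(data []byte) (*JobDetails, error) {
	job := &JobDetails{}
	if err := json.Unmarshal(data, job); err != nil {
		return nil, err
	}
	return job, nil
}

func main() {
	// Numeric timestamps cannot be stored in string fields, so this fails.
	if _, err := decodeJob([]byte(numericPayload)); err != nil {
		fmt.Println("numeric timestamps:", err)
	}

	// Quoted timestamps decode cleanly into the same struct.
	job, err := decodeJob([]byte(stringPayload))
	if err != nil {
		fmt.Println("string timestamps:", err)
		return
	}
	fmt.Println("string timestamps:", job.Status, job.SubmitTime)
}
```

The unknown status_changes field and the null error value are both ignored by encoding/json, so only the timestamp types decide whether decoding succeeds.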

And the API doc says submit_time is an integer, not a string: https://github.com/k8ssandra/management-api-for-apache-cassandra/blob/master/management-api-server/doc/openapi.json#L2153-L2160

        {
          "submit_time" : {
            "type" : "integer",
            "format" : "int64"
          },
          "type" : {
            "type" : "string"
          }
        }
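If a client ever had to cope with both encodings of submit_time, one option would be a field type that accepts either a JSON number or a JSON string. This is only a sketch under that assumption: flexString, tolerantJobDetails and decodeTolerant are hypothetical names and are not how cass-operator handles the field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flexString is a hypothetical type that stores a JSON string or a JSON
// number as its textual form, so both encodings decode into the same field.
type flexString string

// UnmarshalJSON accepts "123", 123 and null; anything else is an error.
func (f *flexString) UnmarshalJSON(data []byte) error {
	// Follow the encoding/json convention: null is a no-op.
	if string(data) == "null" {
		return nil
	}
	// A leading quote means the value is already a JSON string.
	if len(data) > 0 && data[0] == '"' {
		var s string
		if err := json.Unmarshal(data, &s); err != nil {
			return err
		}
		*f = flexString(s)
		return nil
	}
	// Otherwise keep the number literal verbatim, with no float64 round trip.
	var n json.Number
	if err := json.Unmarshal(data, &n); err != nil {
		return err
	}
	*f = flexString(n.String())
	return nil
}

// tolerantJobDetails is a trimmed, hypothetical variant of JobDetails.
type tolerantJobDetails struct {
	Id         string     `json:"id"`
	Status     string     `json:"status"`
	SubmitTime flexString `json:"submit_time,omitempty"`
	EndTime    flexString `json:"end_time,omitempty"`
}

// decodeTolerant decodes a response body regardless of timestamp encoding.
func decodeTolerant(data []byte) (*tolerantJobDetails, error) {
	job := &tolerantJobDetails{}
	if err := json.Unmarshal(data, job); err != nil {
		return nil, err
	}
	return job, nil
}

func main() {
	payloads := []string{
		`{"id":"a","status":"COMPLETED","submit_time":1697635283476}`,
		`{"id":"a","status":"COMPLETED","submit_time":"1697635283476"}`,
	}
	for _, p := range payloads {
		job, err := decodeTolerant([]byte(p))
		if err != nil {
			fmt.Println("decode failed:", err)
			continue
		}
		fmt.Println(job.SubmitTime)
	}
}
```

Using json.Number for the numeric branch keeps the digits exactly as they appear on the wire, which avoids any precision question for large int64 values.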

Am I missing something obvious, or is this a bug? I have doubts, as this code has not changed for quite a long time.

Thank you!

@vsoloviov vsoloviov added the bug Something isn't working label Oct 18, 2023
@vsoloviov
Author

I just found that a test is being added here: https://github.com/k8ssandra/cass-operator/pull/584/files#diff-8975636a650e4f693ea9fc9c9870311666765068357bdaabc06bc7573c41457b

If what I described is correct, then this test might be failing precisely because of the type mismatch, @burmanm.

@burmanm
Contributor

burmanm commented Oct 18, 2023

Hey, this was a bug in the management-api release 0.1.69. If you update your management-api image (the Cassandra image) it should solve this issue. Basically refetching the Pod's image should be enough.

@vsoloviov
Author

That's right, thank you @burmanm!

Now I can see it was updated to a string here: k8ssandra/management-api-for-apache-cassandra@c46d8f9

I wonder if it should be updated in the API doc as well: https://github.com/k8ssandra/management-api-for-apache-cassandra/blob/master/management-api-server/doc/openapi.json#L2153-L2156

I'll close the issue, thanks again.

@adejanovski adejanovski added the done Issues in the state 'done' label Oct 18, 2023
@burmanm
Contributor

burmanm commented Oct 18, 2023

No, the "int64" is correct there; that's the type of data that's in the field. But JSON has no int64 type of its own, and many parsers decode all numbers as float64, so certain large numbers would be incorrectly serialized/deserialized. Thus, they need to be serialized as strings to retain all the information.
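The precision concern can be illustrated with a short sketch. It assumes a generic decoder that, like Go's encoding/json when decoding into interface{}, turns every JSON number into float64; viaFloat64 and viaString are hypothetical helpers written for this example only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// viaFloat64 decodes a JSON number generically (as float64) and converts
// the result back to int64, as a schema-less client would.
func viaFloat64(raw string) (int64, error) {
	var v interface{}
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return 0, err
	}
	f, ok := v.(float64)
	if !ok {
		return 0, fmt.Errorf("not a JSON number: %s", raw)
	}
	return int64(f), nil
}

// viaString decodes a JSON string and parses its digits as int64.
func viaString(raw string) (int64, error) {
	var s string
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return 0, err
	}
	return strconv.ParseInt(s, 10, 64)
}

func main() {
	// 2^53 + 1 is the first positive integer float64 cannot represent.
	const big = int64(9007199254740993)
	digits := strconv.FormatInt(big, 10)

	asNumber, _ := viaFloat64(digits)
	asString, _ := viaString(strconv.Quote(digits))

	// The numeric route loses the last digit; the string route does not.
	fmt.Println(asNumber == big, asString == big)
}
```

A millisecond timestamp such as 1697635283476 is far below 2^53 and survives either route, so the string encoding matters for the general int64 range rather than for this particular value.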


3 participants