What happened?
While testing the DSE support, I ran into an issue where a backup of an already restored cluster does not complete. Steps I took:
1. Used the most recent Medusa (with the DSE snapshot recursion bug) (`DefaultMedusaVersion = "c8609c8-tmp"`).
2. Configured a bigger MinIO volume (30G).
3. Created a DSE cluster with `make single-up ; E2E_TEST="TestOperator/CreateSingleDseSearchDatacenterCluster" make e2e-test`, but killed the test before it created any backups.
4. Ran a backup with the 1-node cluster.
5. Scaled the cluster to 3 nodes.
6. Started up an Ubuntu pod, installed tlp-stress, and loaded some data.
7. Ran a few more backups.
8. Created an index and tested a query.
9. Did one more backup.
10. Did a restore.
11. Confirmed the data was back.
12. Rebuilt the search index and verified that search works again.
13. Did another backup, which failed: 1 node completed, 1 failed midway, and 1 never started.
On the failing node, there was this in the medusa log:
# a lot of
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: minio-service.minio.svc.cluster.local
# then
ERROR:root:Error occurred during backup: Connection was closed before we received a valid response from endpoint URL: "http://minio-service.minio.svc.cluster.local:9000/k8ssandra-medusa/test/test-dc1-default-sts-2/data/tlp_stress/sensor_data-41ab3cd0b60f11ee8f9665415b594bf9/bb-244-bti-Data.db?uploadId=NmM2ZDlmOGYtYWZlYy00MzlhLThmMWMtYzE5NGNkMzAwMDBmLjU3MWFiMDg4LWFjMGMtNGRkZC1hOGJjLTc4YmFlMzdmMWNiMQ&partNumber=194".
[2024-01-18 15:22:03,162] ERROR: Error occurred during backup: Connection was closed before we received a valid response from endpoint URL: "http://minio-service.minio.svc.cluster.local:9000/k8ssandra-medusa/test/test-dc1-default-sts-2/data/tlp_stress/sensor_data-41ab3cd0b60f11ee8f9665415b594bf9/bb-244-bti-Data.db?uploadId=NmM2ZDlmOGYtYWZlYy00MzlhLThmMWMtYzE5NGNkMzAwMDBmLjU3MWFiMDg4LWFjMGMtNGRkZC1hOGJjLTc4YmFlMzdmMWNiMQ&partNumber=194".
Traceback (most recent call last):
File "/home/cassandra/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "/home/cassandra/.local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 392, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/cassandra/.local/lib/python3.10/site-packages/botocore/awsrequest.py", line 96, in request
rval = super().request(method, url, body, headers, *args, **kwargs)
File "/usr/lib/python3.10/http/client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/cassandra/.local/lib/python3.10/site-packages/botocore/awsrequest.py", line 130, in _send_output
self._handle_expect_response(message_body)
File "/home/cassandra/.local/lib/python3.10/site-packages/botocore/awsrequest.py", line 176, in _handle_expect_response
self._send_message_body(message_body)
File "/home/cassandra/.local/lib/python3.10/site-packages/botocore/awsrequest.py", line 209, in _send_message_body
self.send(message_body)
File "/home/cassandra/.local/lib/python3.10/site-packages/botocore/awsrequest.py", line 223, in send
return super().send(str)
File "/usr/lib/python3.10/http/client.py", line 995, in send
self.sock.sendall(datablock)
ConnectionResetError: [Errno 104] Connection reset by peer
So it looks like the remote end closed the connection mid-upload, but it's unclear where, if anywhere, Medusa retries to handle this.
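For discussion, here is a minimal sketch of the kind of retry that could absorb a transient connection reset like the one in the traceback. It is not Medusa's code: `retry_on_connection_error` and its parameters are hypothetical names, and the sketch only catches Python's built-in `ConnectionError` family (which includes `ConnectionResetError`). The top-level error in the log is raised by botocore as its own exception type, so a real fix would likely need to catch that as well, or rely on the retry settings of the S3 client.

```python
import random
import time


def retry_on_connection_error(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation() and retry it when the connection is reset or closed.

    Hypothetical helper for illustration only; not part of Medusa.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            # ConnectionResetError ([Errno 104]) is a subclass of ConnectionError.
            if attempt == max_attempts:
                # Out of attempts: surface the original error to the caller.
                raise
            # Exponential backoff (base_delay, 2x, 4x, ...) plus a little jitter
            # so that several nodes do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))
```

An upload of a single part would then be wrapped as `retry_on_connection_error(lambda: upload_part(...))`, where `upload_part` stands in for whatever call actually sends the data.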
Doing another backup after this does not work. The operator reports a started backup job, but medusa-status does not recognise the backup.
Did you expect to see something different?
How to reproduce it (as minimally and precisely as possible):
Environment
* K8ssandra Operator version: Insert image tag or Git SHA here
* Kubernetes version information: `kubectl version`
* Kubernetes cluster kind: insert how you created your cluster: kops, bootkube, etc.
* Manifests: insert manifests relevant to the issue
* K8ssandra Operator Logs: insert K8ssandra Operator logs relevant to the issue here
**Anything else we need to know?**:
┆Issue is synchronized with this [Jira Story](https://datastax.jira.com/browse/K8OP-50) by [Unito](https://www.unito.io)
┆Issue Number: K8OP-50