Hello,

Describe the bug
When killQuery happens, a TCP/Kubernetes load balancer may direct the connection to a node that is not running the to-be-killed query. The KILL statement succeeds anyway, but has no effect.
As a fix on our side we will update our configuration to list the shards/nodes directly instead, trading some configuration complexity for correctness.
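For reference, the kill appears to be issued per query_id against whichever node the connection happens to reach (this is my reading of the scope.go debug lines further down, not a confirmed description of the implementation); roughly:

-- Assumed shape of the per-node kill; the query_id is taken from the logs below.
-- KILL QUERY filters system.processes on the node that receives the statement,
-- so if the load balancer routes it to a node that is not executing the query,
-- nothing matches and the statement still returns success.
KILL QUERY WHERE query_id = '17CF3EA10D4DDB62'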
To Reproduce
1. Declare a cluster with a load-balancer address instead of a replica list
2. Run & cancel a long-running query
3. KILL runs on a different node than the initial query, and is therefore ineffective (see the check sketched below)
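One way to confirm step 3, assuming the operator knows the native cluster name (my_cluster is a placeholder here): after the kill reports success, the query is still listed in system.processes on the other replicas.

-- Hypothetical verification query; my_cluster stands in for the real cluster name.
SELECT hostName(), query_id, elapsed
FROM clusterAllReplicas('my_cluster', system.processes)
WHERE query_id = '17CF3EA10D4DDB62'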
Expected behavior
Could you consider selecting the kill targets by initial_query_id instead? It would improve the chance of cutting off resource consumption early.
On the other hand, KILL QUERY ON CLUSTER {cluster} would require configuring/passing the "native" cluster name somewhere.
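As a sketch of what I have in mind (not current chproxy behaviour; my_cluster is again a placeholder for that native cluster name):

-- Fan the kill out to every replica and match on initial_query_id,
-- so distributed sub-queries spawned by the original request are cancelled too.
-- chproxy would need to learn the cluster name from its configuration.
KILL QUERY ON CLUSTER my_cluster WHERE initial_query_id = '17CF3EA10D4DDB62' ASYNC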
Environment information
chproxy v1.19.0, ClickHouse 22.8.
For our production clusters, we supply applications with a keepalived-balanced endpoint. In the chproxy config:
scheme: https
nodes:
  - lb-clickhouse.example:8443
Screenshots
DEBUG: 2024/05/16 18:14:56 proxy.go:84: [ Id: 17CF3EA10D4DDB62; User "u"(1) proxying as "p"(1) to "lb-clickhouse.example:8443"(1); RemoteAddr: "....
DEBUG: 2024/05/16 18:15:36 proxy.go:238: [ Id: 17CF3EA10D4DDB62; User "u"(1) proxying as "p"(1) to "lb-clickhouse.example:8443"(2); RemoteAddr: "..."; LocalAddr: "..."; Duration: 39435136 μs]: remote client closed the connection in 39.433581047s; query: "select ...
DEBUG: 2024/05/16 18:15:36 scope.go:256: killing the query with query_id=17CF3EA10D4DDB62
DEBUG: 2024/05/16 18:15:36 scope.go:296: killed the query with query_id=17CF3EA10D4DDB62; respBody: ""
DEBUG: 2024/05/16 18:15:36 proxy.go:156: [ Id: 17CF3EA10D4DDB62; User "u"(1) proxying as "p"(1) to "lb-clickhouse.example:8443"(2); RemoteAddr: "..."; LocalAddr: "..."; Duration: 39854873 μs]: request failure: non-200 status code 502; query: "select....FORMAT JSONCompact"; Method: POST; URL: "https://lb-clickhouse.example:8443/?max_execution_time=10800&max_memory_usage=42949672960&priority=4&query_id=17CF3EA10D4DDB62&result_overflow_mode=throw&session_timeout=60"
The KILL query ran at node ch3v, while the other nodes wasted time running the query to the end:
Thank you.
(Sorry, I originally created this issue from a code line; I have updated the description to follow the bug template.)