-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] V24.3.x rebase #142
base: v24.2.x
Are you sure you want to change the base?
[WIP] V24.3.x rebase #142
Conversation
There are several options what invoke_on_...() can try to invoke. So far only non-const and non-mutable callables are in use and it works. If calling it with mutable lambdas, it used to fail compilation, but previous patch fixed it, so here's the test. It also makes sense to support calling it on const sharded<> object (with the lambda accepting const service reference), but it's not that simple (see scylladb#2278) Signed-off-by: Pavel Emelyanov <[email protected]>
The method actualy shuts the socket down the hard way. There will appear a less disruptive call, so the existing one deserves some better name. Signed-off-by: Pavel Emelyanov <[email protected]>
When a socket is closed after noticing the EOF message in its shared buffer, it doesn't resolve the "shutdown" future. This is not very nice, because socket is indeed dead, and no other code would resolve it, so it's good to do it on close(). In fact, it is the peer who has to resolve this future, but it should happen only after the queue with messages is drained and it complicates the logic. Resolving it on close is good enough. In the non-EOF case the socket is aborted, which also resolve the aforementioned future. Signed-off-by: Pavel Emelyanov <[email protected]>
There's such a wrapper around loopback connection factory that fits it to http client needs. Several tests copy-and-pase it, new tests would appear soon, it's time to generalize it. Signed-off-by: Pavel Emelyanov <[email protected]>
It was added by 15942b9 (http/client: Count and limit the number of connections) but wasn't documented, so do it now. Signed-off-by: Pavel Emelyanov <[email protected]>
Currently consuming http response doesn't check if parser failed reading the response check. Add it. Also, when parser sees correct, but incomplete response it sets itself to .eof(), not .failed(). This case should be told from invalid, so the system error with ECONABORTED status is thrown instead. Signed-off-by: Pavel Emelyanov <[email protected]>
This needs splitting connection::get_connection() into get_ and make_connection. Both to facilitate next patching Signed-off-by: Pavel Emelyanov <[email protected]>
Currently request is passed by value all the way down to connection where it's moved to stable memory with do_with(std::move(req), ...) and then processing continues. Next patches are going to teach client restart requests over different connections, so the request should be moved into do_with-context at the client side. While at it -- keep the response handler in the stable context as well, next patching will appreciate this change. Signed-off-by: Pavel Emelyanov <[email protected]>
Signed-off-by: Pavel Emelyanov <[email protected]>
Currently if client::make_request() fails, the error is propagated back to the caller whatever the error is. Sometimes the request fails because of connection glitch, which can happen and in that case it's common to try the same request again after a while in a hope that it was indeed some teporary glitch. Signed-off-by: Pavel Emelyanov <[email protected]>
…ue' from Patryk Wróbel By default io_tester uses the size of a file as the allocation extent size hint. However, certain tests may want to specify some exact value for that parameter e.g. to match the extent size hint configured for a specific product. This change adds a new parameter to YAML job description that enables users to explicitly set some specific value for extent size hint. Closes scylladb#2280 * https://github.com/scylladb/seastar: io-tester.md: update available parameters for job description io_tester: expose extent_allocation_size_hint via job param
simpler this way. Signed-off-by: Kefu Chai <[email protected]> Closes scylladb#2281
c-ares marked some APIs deprecated in 1.28.1. in this change, we conditionally use the undeprecated APIs when they are available. please note, we don't specify the minimal supported c-ares version in our building system. in which, ares_fds() and ares_process() are not changed yet, because we need to change the way how to poll the events for name resolution. this would need more thoughts before moving forward. Refs scylladb#2197 Signed-off-by: Kefu Chai <[email protected]> Closes scylladb#2200
reduce image size reduce layers amount format combine apt-get update with apt-get install in the same RUN statement use arguments JSON notation for CMD: https://github.com/hadolint/hadolint/wiki/DL3025\#rationale
it reduces the layer's size. also we do not need this file in running container
it makes maintenance easier and helps to avoid duplication of packages and make the list much easier to update
There's an issue with maintaining keep-alive sockets by http client. Some prior art first. The HTTP client uses seastar connections to send requests over and receive responses back. Typically the connection is a TCP socket (or TLS connection which is a wrapper over TCP socket too). Once request-response cycle is over the connection used for that is kept by a client in a "pool". Making another http request may then pick up the existing connection from the pool thus avoiding the extra latency of establishing new connection. Pool may thus accumulate more than one connection if the client user sends several requests in parallel. ```mermaid flowchart TD A[ Start ] --> B{ pool is not empty } B -->| y | C[ pop connection from pool ] B -->| n | D[ establish new connection ] C --> E[ talk to server ] D --> E E --> F{ got error? } F --> | n | G[ push connection to pool ] F --> | y | H[ Done ] G --> H ``` HTTP servers may sometimes want to terminate the connections it keeps. This can happen in one of several ways. The "gentle" way is when server adds the "connection: close" header to its response. In that case client would handle the response and will not put the connection back to pool, but instead would just close it. So next request would either pick some other connection from pool or would need to establish a new one. Less gentle way a server may terminate a connection is by closing it, so the underlying TCP stack would communicate regular TCP FIN-s. On the client side there's connected_socket::wait_input_shutdown() method that returns a future that gets resolved when kernel terminates the connection by peers request. In case client's connection is kept in pool it will be closed and removed from the pool behind the scenes. Next request won't even notice that -- it will either pick some other connection from pool, or will establish a new one. Sometimes more unpleasant situation happens. It can be either a race or deliberate server's "abort". In the former case, server closes the connection and TCP starts communicating FIN-s in parallel with client picking up a connection for its new request. In that case, even if kernel resolves "input-shutdown" event described above, due to seastar event loop and batch flusher, client would see that the connection is closed _after_ it had picked it from pool and had chance to put some data into it. In the latter case server closes the connection in the middle of reading the request from the client or even in the middle of writing back the response. This is very unlikely, but still happens from time to time. Having said the above, the problem -- when user calls `client::make_request()` and client steps on the "unpleasant" server-side connection closure, the request making resolves with exception and user has to do something about it. There are two options here -- handle the exception somehow or ask client to make the same request again (restart the request). This PR suggest that the latter choice can be implemented in the HTTP client code. If user doesn't want to restart, it may ask client not to do it, the new API allows for that. First (a side node) -- to tell the "server closed its socket" from other errors the exception from scoket IO is checked to be the system error with EPIPE or ECONABORTED code in it. EPIPE is reported when it comes from writing the request, ECONABORTED is reported when it comes from reading the response. Next, there's only one way to replay the request -- client should get new connection somehow and repeat the request-response cycle. There are two options where to get new connection from -- from pool (if it's there) or establish a new one. While experimenting with HTTP client I noticed that picking up connection from pool often results in several "transport exception"-s in a row, as if server was closing connections in batches. So this PR makes a shortcut for restarts it always establishes new connection to server (which, after request-response cycle, is put back into pool normally). ```mermaid flowchart TD A[ Start ] --> B{ pool is not empty } B -->| y | C[ pop connection from pool ] B -->| n | D[ establish new connection ] C --> E[ talk to server ] D --> E E --> F{ got error? } F --> | n | G[ push connection to pool ] F --> | y | J{ should restart } J --> | n | H[ Done ] J --> | y | D G --> H ``` And the last, but not least, restart only happens once, because the chance of observing "transport exception" from newly established connection is considered to be low enough not to care. Respectively, if a request is made over new connection (not over a connection from pool) and "transport exception" pops up it's not restarted. Closes scylladb#1883 * github.com:scylladb/seastar: http/client: Retry request over fresh connection in case old one failed http/client: Fix indentation after previous patch http/client: Pass request and handle by reference http/client: Introduce make_new_connection() http/client: Fix parser result checking http/client: Document max_connections test/http: Generalize http connection factory loopback_socket: Shutdown socket on EOF close loopback_socket: Rename buffer's shutdown() to abort()
for better readability Signed-off-by: Kefu Chai <[email protected]>
`count` is never used after being assigned, so drop it. Signed-off-by: Kefu Chai <[email protected]>
lint dockerfile ([linter](https://hadolint.github.io/hadolint/)): - reduce image size - reduce layers amount - prevent caching issues related to apt-get commands usage - use arguments JSON notation for CMD: [rationale](https://github.com/hadolint/hadolint/wiki/DL3025#rationale) sort packages alphanumerically to make maintenance easier Closes scylladb#2283 * https://github.com/scylladb/seastar: scripts: sort packages alphanumerically docker: bind the file instead of copying during the build stage docker: lint dockerfile
…fu Chai in this series, we move all tests in `test_metrics.py` into `prometheus_test.py ` to drop the repeating code. Closes scylladb#2254 * https://github.com/scylladb/seastar: tests: test protobuf support in prometheus_test.py tests: enable prometheus_test.py to test metrics without aggregation
in 6de08b0, we introduced the dependency to pyyaml, but we failed to update `install-dependencies.sh` to install it. in this change, we add this dependency to `install-dependencies.sh`. Refs 6de08b0 Signed-off-by: Kefu Chai <[email protected]>
before this change, we do not verify that `stream` produces the consumed values. after this change, we check if the subscriber of `stream` gets the value sent by the writer side. Signed-off-by: Kefu Chai <[email protected]>
`write(output_stream<char>&, const jsonable&)` would call `jsonable::to_json` which would just serialize the whole object (and its children) to a string and then dump that onto the `output_stream`. Instead call `jsonable::write(output_stream<char>&)` which will recursively call `write` (for the `jsonbase` subclasses) and hence propagate the benefits of writing to the output stream directly. To make sure that the object is alive for the duration of the call we change the overload to copy its argument.
This is to avoid potential lifetime issues as these overloads might yield internally and hence the input needs to be kept alive for the duration of that.
Adds a basic json formatter test for jsonable objects and one for streaming them via `stream_range_as_array`. The latter is also a reproducer for the stack-use-after-return.
I was actually surprised to see that `then()` no longer does a preemption check (and this certainly leads to better code generation!). I checked the tutorial and notice that it refers a different mechanism entirely, a 256-continuation counter but I can't find any trace of this in the code, so I assume it preceeded the time-based preemption check which has subsequently been removed. So: - Rip out mention of the 256-counter - Talk about preemption which isn't really covered in the tutorial and add that then() is not a suspension point (but co_await is, by default)
… Travis Downs I was actually surprised to see that `then()` no longer does a preemption check (and this certainly leads to better code generation!). I checked the tutorial and notice that it refers a different mechanism entirely, a 256-continuation counter but I can't find any trace of this in the code, so I assume it preceeded the time-based preemption check which has subsequently been removed. So: - Rip out mention of the 256-counter - Talk about preemption which isn't really covered in the tutorial and add that then() is not a suspension point (but co_await is, by default) Closes scylladb#2292 * github.com:scylladb/seastar: tutorial.md: fix typos Update tutorial.md to reflect update preemption methods tutorial.md: remove trailing whitespace
GnuTLS permitted a user to provide a chain of certificates in the certificate file passed to set_x509_key_file rather than set_x509_trust_file. This commit adjusts how the OpenSSL implementation parses the provided certificate file so it can also accept a chain of certs in the certificate file parameter to set_x509_key_file. Signed-off-by: Michael Boquard <[email protected]>
Once the shutdown alert is sent, if anymore data is received on the SSL session, the SSL_shutdown method will return SSL_R_APPLICATION_DATA_AFTER_CLOSE_NOTIFY error code. This may be due to a race condition where data is received immediately after the close notify is sent. In this situation, we will simply ignore the error and continue shutdown. Added the drain_openssl_error_stack method that will purposefully empty out OpenSSL's error stack in situations where errors can be safely ignored. Signed-off-by: Michael Boquard <[email protected]>
The prometheus text protocol requires that the label value to be escaped. Note that this is only needed for the test protocol and not for the protobuf-based prometheus protocol. > label_value can be any sequence of UTF-8 characters, but the backslash > (\), double-quote ("), and line feed (\n) characters have to be > escaped as \\, \", and \n, respectively. From https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md
Avoid a string copy when writing the metric value and the metric type to the prometheus text-based response.
Adds a test to verify that the metrics returned with the prometheus text format are formatted as expected.
Fixes up an earlier commit which should have been a refactor commit but had an unintended change to the way `double`'s were formatted on the prometheus response. Note that now `fmt::print` is used instead of the originally used `std::to_string`. This is because `std::to_string` is going to change its formatting behaviour in C++26, so we prefer to use a more deterministic formatter here. Fixes up 864c209
Fixes scylladb#2154 TLS1.3 session tickets are a client-storage solution for session resume, bypassing some of the more expensive parts of a normal handshake. Implemented by adding an optional session data parameter to tls_options, options to certificates for server keys, and query functions for state/session data on the client side. Note that the feature requires both client and server to handle tls1.3. Server session keys are kept in the certificates object, and can be rotated by simply re-enabling the feature on the object. Reloadable certificates will rotate keys on cert reload. Simple test included. Redpanda note: Chery-picked from upstream c224fe0. Added enough code to make OpenSSL implementation build (test will fail). Signed-off-by: Michael Boquard <[email protected]>
`future::get0()` was deprecated. so let's use `future::get()` instead. this addresses following build failure: ``` FAILED: tests/unit/CMakeFiles/test_unit_tls.dir/tls_test.cc.o /usr/bin/g++-13 -DBOOST_ALL_DYN_LINK -DBOOST_NO_CXX98_FUNCTION_BASE -DFMT_SHARED -DSEASTAR_API_LEVEL=7 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_DEFERRED_ACTION_REQUIRE_NOEXCEPT -DSEASTAR_HAS_MEMBARRIER -DSEASTAR_HAVE_ASAN_FIBER_SUPPORT -DSEASTAR_HAVE_HWLOC -DSEASTAR_HAVE_NUMA -DSEASTAR_HAVE_SYSTEMTAP_SDT -DSEASTAR_HAVE_URING -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TESTING_MAIN -DSEASTAR_TESTING_WITH_NETWORKING=1 -DSEASTAR_TYPE_ERASE_MORE -I/home/circleci/project/tests/unit -I/home/circleci/project/src -I/home/circleci/project/include -I/home/circleci/project/build/debug/gen/include -I/home/circleci/project/build/debug/gen/src -g -std=gnu++20 -U_FORTIFY_SOURCE -Wno-maybe-uninitialized -Wno-error=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -UNDEBUG -Wall -Werror -Wimplicit-fallthrough -Wdeprecated -Wno-error=deprecated -Wno-error=stringop-overflow -Wno-error=array-bounds -gz -MD -MT tests/unit/CMakeFiles/test_unit_tls.dir/tls_test.cc.o -MF tests/unit/CMakeFiles/test_unit_tls.dir/tls_test.cc.o.d -o tests/unit/CMakeFiles/test_unit_tls.dir/tls_test.cc.o -c /home/circleci/project/tests/unit/tls_test.cc In file included from /usr/include/boost/test/test_tools.hpp:45, from /usr/include/boost/test/unit_test.hpp:18, from /home/circleci/project/include/seastar/testing/seastar_test.hh:27, from /home/circleci/project/include/seastar/testing/test_case.hh:31, from /home/circleci/project/tests/unit/tls_test.cc:40: /home/circleci/project/tests/unit/tls_test.cc: In member function 'void test_tls13_session_tickets::do_run_test_case() const': /home/circleci/project/tests/unit/tls_test.cc:1541:61: error: 'seastar::future<T>::get0_return_type seastar::future<T>::get0() [with T = bool; get0_return_type = bool]' is deprecated: Use get() instead [-Werror=deprecated-declarations] 1541 | BOOST_REQUIRE(!tls::check_session_is_resumed(c).get0()); // no resume data | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ In file included from /home/circleci/project/include/seastar/core/do_with.hh:25, from /home/circleci/project/tests/unit/tls_test.cc:25: /home/circleci/project/include/seastar/core/future.hh:1376:22: note: declared here 1376 | get0_return_type get0() { | ^~~~ /home/circleci/project/tests/unit/tls_test.cc:1544:57: error: 'seastar::future<T>::get0_return_type seastar::future<T>::get0() [with T = std::vector<unsigned char>; get0_return_type = std::vector<unsigned char>]' is deprecated: Use get() instead [-Werror=deprecated-declarations] 1544 | sess_data = tls::get_session_resume_data(c).get0(); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ /home/circleci/project/include/seastar/core/future.hh:1376:22: note: declared here 1376 | get0_return_type get0() { | ^~~~ /home/circleci/project/tests/unit/tls_test.cc:1584:29: error: 'seastar::future<T>::get0_return_type seastar::future<T>::get0() [with T = bool; get0_return_type = bool]' is deprecated: Use get() instead [-Werror=deprecated-declarations] 1584 | BOOST_REQUIRE(f.get0()); // Should work | ~~~~~~^~ /home/circleci/project/include/seastar/core/future.hh:1376:22: note: declared here 1376 | get0_return_type get0() { | ^~~~ cc1plus: all warnings being treated as errors ``` Refs c224fe0 Signed-off-by: Kefu Chai <[email protected]>
Signed-off-by: Michael Boquard <[email protected]>
The tls session ticket contains sensitive data which needs to be cleared before it is released with free. This is documented in the man page of `gnutls_session_ticket_enable_server`: ``` Request that the server should attempt session resumption using session tickets, i.e., by delegating storage to the client. key must be initialized using gnutls_session_ticket_key_generate(). To avoid leaking that key, use gnutls_memset() prior to releasing it. ``` (cherry picked from commit 5e9b43a)
@michael-redpanda @pgellert If you folks could take a close look at the final TLS code and see if it's correct I'd greatly appreciate it. Also can one of you take a look at the TLS tests and see if the following failure is expected?
|
I figured out the test issue. In #ifndef SEASTAR_WITH_TLS_OSSL
#include <gnutls/gnutls.h>
#endif around where the |
@michael-redpanda Nice! It looks like the include is missing in the |
It should be fine on our v24.2.x branch, this change should only effect the GnuTLS build not the OpenSSL build |
Note that adding the header resulted in a different TLS test failure. @michael-redpanda and I decided to not get gnutls working for this branch as we don't use it in RP. @michael-redpanda has a separate effort to get gnutls working in a PR he'll be upstreaming. After compiling seastar with openSSL the following unit test error occurred;
@michael-redpanda is looking into this. |
@michael-redpanda was able to figure out the issues and I've pushed a change with his recommended fixes. The last unit test failure I'll be looking into is the following;
|
The websocket failures were from issues with the openSSL SHA1 base64 encoder when compiling with GCC. Normally the sha
however, when compiling with openssl and gcc we get;
with openssl and llvm we get the expected encoding though. So since this isn't an issue with llvm, which is the only compiler we use for Redpanda, I'll be ignoring the test failures in GCC and continuing the rebase testing with LLVM only. |
Can you explain? This looks like a UB/memory-safety issue to me. GCC and clang shouldn't shouldn't compute this differently. Are we passing some uninitialized buffer somewhere (in the code or test)?
I am not sure we should ignore this given the severity of the issue. Also we want to upstream this anyway and upstream definitely cares about gcc. |
This effort is just applying our current changes onto master for our use. Separate effort underway to upstream. Plus we don't use websockets (yet?) |
Signed-off-by: Ben Pope <[email protected]>
Note that this PR is only open for the purpose of letting folks review the rebase before I push it to a dedicated
v24.3.x
branch. It will not be merged into thev24.2.x
branch.