Retry request if the generation cannot be aborted because of BLAS batch #53
Comments
The current BLAS batch cannot be aborted once it's started. It will abort once the current batch is done (e.g. 512/2048 will stop at 1024). Have you tried enabling |
Yes, I'm aware, and this is the exact reason why I posted this Issue – to be able to retry automatically when the impossible-to-abort batch is done.
Yes, but until then Lite shows an error rather than scheduling a resend!
Wait, hold on. Even inside the BLAS, I can abort and regenerate without any visible errors, and the generation continues automatically!
Whoa, I found a drawback: if I change something at the beginning of the history and send it, then abort and send something again, abort and send, abort and send, then koboldcpp finishes the batch and starts generating a response to a request that was already aborted! Then it prints I suspect you either:
I think you should enforce a socket check in two moments:
In those two scenarios you should check whether the user's socket is writable, and proceed only if the browser is still actually waiting for the result, not after you have spent time generating something just to throw it away. I suggest doing this only in multiuser mode, since I see a valid use case where somebody sends a large request and disconnects, expecting koboldcpp to honestly generate everything (at least to fill the history cache, and at most still printing the text to the console). Are there other drawbacks to enabling multiuser mode? If it retries automatically after an aborted batch (and if you can make it cancel generations that nobody is waiting for), it seems so good that I don't see a reason not to use it always!
There's no way to tell that the client has already abandoned the session without attempting to send data. In fact for synchronous calls there's no indication to the server when a client disconnects except an error when attempting to write to the connection. For SSE streaming this is easier to detect as the connection is actively used. But here, the issue is also with the Abort command - it's ignored for the users currently waiting in queue, as there is no queue to abort. Only the current active user has the ability to abort their connection. So here's a little illustration of the issue. UserA sends request 1. Request 1 started processing. In order for non-started requests to be aborted, a queue of "pending aborts" will need to be logged and tracked. This itself can present another series of problems. I think I can hack in a solution to keep track of the most recent "pending abort" that should work for a situation with 2 users. I really don't want to queue up aborts for more users than that. |
Added to my experimental branch, if you'd like to test. Note that it only works for up to 2 aborts - multiple aborts by queued users will only save the last attempt. |
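The single-slot "pending abort" idea described above can be sketched roughly as follows. This is only an illustration of the stated behaviour (one remembered abort, later aborts overwrite earlier ones) and is not taken from koboldcpp.py; the class name `PendingAbortTracker`, its methods, and the request keys are all hypothetical.

```python
import threading
from typing import Optional


class PendingAbortTracker:
    """Hypothetical helper: remembers only the most recent abort
    aimed at a request that has not started processing yet."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: Optional[str] = None

    def request_abort(self, request_key: str, active_key: Optional[str]) -> str:
        """Record an abort. Returns which kind of abort was registered."""
        with self._lock:
            if request_key == active_key:
                # The request is already running: abort it directly.
                return "abort_active"
            # Single slot: any earlier pending abort is overwritten.
            self._pending = request_key
            return "abort_queued"

    def should_skip(self, request_key: str) -> bool:
        """Call when a queued request is about to start processing."""
        with self._lock:
            if self._pending == request_key:
                self._pending = None
                return True
            return False
```

With a single slot, a second queued abort silently replaces the first, which matches the limitation mentioned in the comment: only the last attempt is saved.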
Are you sure? Even for HTTP, you can just send the first line of response headers, which is always the same. |
It's just the basic Python http.server: you pick a port and start a server that listens on it. You can see the current implementation in koboldcpp.py. It's not raw TCP sockets. Events such as |
Python's http.server exposes the client socket object inside do_POST as There would be no additional data, and the .recv will block unless we set a zero timeout on the socket. Recapping, you can start your handler as

    def do_POST(self):
        connection = self.connection

Then the variable

    def is_connection_alive(connection):
        connection.setblocking(False)
        try:
            # recv() returns b'' once the peer has closed the connection
            return len(connection.recv(64)) > 0
        except:
            # nothing to read yet (the call would block): assume still connected
            return True

I tried wrapping your line

    print('ALIVE:',is_connection_alive(connection))
    gen = asyncio.run(self.handle_request(genparams, api_format, sse_stream_flag))
    print('ALIVE:',is_connection_alive(connection))

And then in the console I can clearly see whether the browser had already disconnected or not! So I suppose you should pass the correct |
This is not an ideal solution.
In any case, the solution to the "Queued Abort" has already been added - so if that solves this issue there is no need to go messing with the TCP connection states. Unless there is a proper READ ONLY way of determining the connection status? |
I certainly know. There is no more data, always. You are serving HTTP/1.0 responses, which by protocol must close TCP connection after sending only one response to the single request.
Okay, this will suffice, as per: https://stackoverflow.com/a/34371270
Well, you can say "What if I would use HTTP/1.1 with Keep-Alive?" and you will be right…

    def is_connection_alive(connection):
        import select
        r,w,e = select.select([connection],[connection],[],0)
        return (len(r)==0) and (len(w)==1)

("select" is a core module: https://docs.python.org/3/library/select.html#select.select)

Note that Here, if the client sends anything after the POST body without waiting for a response, the connection would be wrongly identified as broken (since the socket becomes immediately readable). I think you are safe to use the above implementation (just move "import" to the top of your source code), I've tested it on Windows with all three modes of token streaming.
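To see how the select-based check behaves without a browser, here is a small self-contained sketch. It uses `socket.socketpair()` as a stand-in for the server/client connection, which is an assumption made purely for the demo: a real HTTP client socket may behave differently, for example if it sends extra bytes as noted above. The variable names are illustrative.

```python
import select
import socket


def is_connection_alive(connection: socket.socket) -> bool:
    # Zero timeout: poll the socket state without blocking.
    readable, writable, _ = select.select([connection], [connection], [], 0)
    # A closed peer makes the socket readable (EOF), so "nothing to read
    # but still writable" is treated as "the client is still waiting".
    return len(readable) == 0 and len(writable) == 1


# Stand-in for the server-side and browser-side ends of one connection.
server_side, client_side = socket.socketpair()

alive_while_open = is_connection_alive(server_side)

client_side.close()  # simulate the browser giving up
alive_after_close = is_connection_alive(server_side)

server_side.close()
print("before close:", alive_while_open, "after close:", alive_after_close)
```

The check only reads the readiness state and never calls `recv`, so it does not consume any data from the connection.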
Hmm, noted. I will KIV this for now. But in any case, is the original issue solved? |
I've just built and tested your current experimental branch (getting errors for 'tkinter' but drag-and-dropping .kcpp worked!) and at first I thought it is working as intended, aborting previous requests. But then I disabled SSE token streaming and smashed Generate More + [ABORT] several times during BLAS. The model did not stop its generation until "Amount to Gen." was reached, and then regenerated its response several times, sending me only the last one:
Wait, when you said "I will only track one previous user, not all of them", I read that as "only the previous request can be aborted", and since I will only ever use two (the one stalled in BLAS and the new one), this will work. It comes down to this: will it continue to generate when the client has already disconnected or not? Was that also the case with the original code when multiuser is in effect? If yes, then your queue changes didn't add much over the benefits of SSE. |
Yeah when I said 2 users, I meant 2 requests. |
At first I wanted to create a new Issue but then decided to just say it here: can you NOT discard the partially fetched text with streaming/SSE when the connection breaks? I'm fine with the error being shown, but losing already generated text feels frustrating… |
Fixed |
Use case:
"Submit"
"[ABORT]" link appears.
"[ABORT]" to cancel.
"Generate More"
"Generate More" (or, alternatively, I copy the last text back to the input box, hit "Back", then fix it and click "Submit" again)
"Error Encountered" telling something about "Server is busy; please try again later."
"OK"
"Send Abort Command? Attempt to abort existing request?"
"Yes" or "No", the BLAS step cannot be aborted.

This is happening to me so often that now I came up with this idea:
"Background generation in progress" with text "The koboldcpp server is busy generating its response to a previous request. You can try to abort it now." and three buttons:

Buttons "Abort" and "Cancel" are working as yours "Yes" and "No" currently.
But "Abort and retry" closes the popup and does this:
"[ABORT]" link under it.
"[ABORT]" while the real generation isn't started yet, the polling loop cancels, returning the UI to idle state just as this link is doing currently.

Note that for point 5 you need to define, what should happen if the server responds that it is busy again: should we abort once more and restart polling, or error out?
If this "generally should not be possible", then the error is better (it would indicate that somebody else sent a request just before us)
On the other hand, if the polling is unreliable you can forcibly send abort+generate each time unless the user cancels or the server goes offline.
I don't know anything about Horde, I never used it and thus I cannot tell how it would affect it; but I assume you are either allowing aborting requests there (so the auto-repeat won't make it worse), or not.
What do you think?
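A rough sketch of the proposed "Abort and retry" loop, written in Python only for illustration (the real client is Lite, running in the browser). It takes the second option above: abort and resend each time the server is busy, until the user cancels or a retry limit is hit. The function name, the `BUSY` marker, and the injected callables are all hypothetical and are not part of Lite or the koboldcpp API.

```python
import time
from typing import Callable, Optional

BUSY = "busy"  # hypothetical marker for a "server is busy" reply


def generate_with_abort_retry(
    send_generate: Callable[[], str],
    send_abort: Callable[[], None],
    is_cancelled: Callable[[], bool],
    max_attempts: int = 20,
    delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Resend the request until the server accepts it.

    Returns the generated text, or None if the user cancelled.
    Raises TimeoutError if the server stays busy for max_attempts tries.
    """
    for _ in range(max_attempts):
        if is_cancelled():
            # The user clicked [ABORT] while we were still waiting.
            return None
        result = send_generate()
        if result != BUSY:
            return result
        # Server is still finishing the old batch: ask it to abort,
        # wait a little, then try again.
        send_abort()
        sleep(delay_seconds)
    raise TimeoutError("server stayed busy; giving up")
```

Erroring out after a bounded number of attempts covers the case where somebody else's request keeps the server busy, so the loop cannot spin forever.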