The pooled-connection code in the mod_WebObjects adaptor contains a race condition that can cause requests to fail and applications to be incorrectly marked as unavailable. The bug may apply to other adaptors as well, but I don't have test environments available to confirm that.
The problem occurs when a client terminates a connection before retrieving the entire response, and is most evident when the response is very large. SendResponse in mod_WebObjects.c detects the closed connection by testing r->connection->aborted after sending each block of the response. If the flag is set, the loop terminates without reading the remaining data from the application socket, and control returns to WebObjects_handler, which then frees the request and returns the socket to the connection pool.
When the socket is later reused in tr_open, an attempt is made to drain it by calling its reset method. The reset function in nbsocket.c is flawed and does not reliably drain the socket: it uses non-blocking I/O to read from the socket until the local buffers are empty, but an empty local buffer does not guarantee that the remote end has no queued data it has not yet had a chance to send. By this point the receiving end has no way to know how much data might still be pending, because the response length was discarded when the response was freed. The result is a race condition in which success depends on the remote end sending data fast enough to keep the local socket buffer full until the entire transaction has been read.
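To make the race concrete, here is a minimal sketch (not the Wonder sources; the descriptor, buffer size, and use of MSG_DONTWAIT are illustrative assumptions) of why a non-blocking drain loop is unreliable: it stops as soon as the kernel's receive buffer happens to be empty, even though the application may still be writing the rest of the response.

#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Sketch only: discard whatever has already arrived on sock. */
static int drain_local_buffer(int sock) {
    char buf[4096];
    ssize_t n;
    for (;;) {
        n = recv(sock, buf, sizeof(buf), MSG_DONTWAIT);
        if (n > 0)
            continue;   /* discard data that has already arrived */
        if (n == 0)
            return 0;   /* peer closed: the socket really is empty */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;   /* empty right now, but more data may still be in flight */
        return -1;      /* genuine error */
    }
}

A loop like this can report "empty" while the application is mid-response; the unread bytes then surface at the start of the next transaction on the pooled socket.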
If a response is very large, attempting to flush an aborted transaction by reading and discarding the entire response is very inefficient. I suggest instead discarding the socket whenever the client connection is terminated before the response has been fully transmitted. This is easy to implement, along with a log statement, by changing the test in SendResponse in mod_WebObjects.c:
if (r->connection->aborted) {
    WOLog(WO_DBG, "Connection aborted before all data was sent: %s", r->uri);
    resp->keepConnection = 0;   /* toss the connection if we aren't able to finish reading it out */
    break;
}
Alternatively, since a length count is available, one could also finish reading the response before returning, but this isn't always the most efficient approach.
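For reference, here is a rough sketch of that alternative inside SendResponse. The helpers resp_remaining_length() and resp_read_block() are hypothetical stand-ins, not real adaptor functions; whatever the adaptor actually uses to track the content length and read blocks from the application socket would go in their place.

if (r->connection->aborted) {
    /* Hypothetical sketch: read and discard the rest of the response
       so the pooled socket is left clean. resp_remaining_length() and
       resp_read_block() are illustrative stand-ins. */
    char scratch[4096];
    long remaining = resp_remaining_length(resp);
    while (remaining > 0) {
        int n = resp_read_block(resp, scratch, sizeof(scratch));
        if (n <= 0)
            break;          /* application closed or errored; give up on the socket */
        remaining -= n;     /* data is discarded, but the socket stays in sync */
    }
    break;
}

For a multi-megabyte response this reads and throws away the whole remainder, which is why discarding the socket is usually the better trade.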
Also, since the reset function does not actually flush the connection reliably, I suggest changing it to report an error if it finds data on the connection. As noted above, it cannot absolutely confirm that the connection is empty, but it can act as an additional check for connections that are not in the expected state when they are reused. The comments in transport.c around line 144 suggest this was the intended behavior of reset at one time. I have made this change locally by adding a TR_BUFFER_NOT_EMPTY value to the TR_STATUS enum (a sketch of the enum change appears after the function) and modifying the reset function in nbsocket.c as follows:
static TR_STATUS reset(net_fd appfd) {
    int result;

    if (appfd->status != TR_OK) {
        return TR_ERROR;
    }

    /* Probe the socket for leftover data; anything found here means the
       previous transaction was not read out completely. */
    result = nonBlockingRecv(appfd, 0, appfd->buf, NETBUFSZ);
    if (result > 0) {
        WOLog(WO_ERR, "reset(): leftover data in socket buffer");
        appfd->status = TR_BUFFER_NOT_EMPTY;
    }

    /* The only allowable failure is a timeout on the read, which just
       means no data was waiting. */
    if (result == -1 && appfd->status == TR_TIMEOUT) {
        appfd->status = TR_OK;
    }

    /* Check our local buffer as well. */
    if (appfd->status == TR_OK) {
        if (appfd->count != 0) {
            appfd->status = TR_BUFFER_NOT_EMPTY;
            WOLog(WO_ERR, "reset(): leftover data in buffer");
        }
    }

    return appfd->status;
}
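For completeness, the corresponding enum change might look like the following. This is a sketch only; the actual TR_STATUS members and their order in the Wonder headers may differ.

typedef enum {
    TR_OK,
    TR_TIMEOUT,
    TR_ERROR,
    TR_BUFFER_NOT_EMPTY   /* new: leftover data found on a pooled connection */
} TR_STATUS;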
If these changes are acceptable, please feel free to incorporate them into the Wonder sources.