dulley edited this page Oct 5, 2011 · 12 revisions

Keep-Alive

Author: Lucas Peetz Dulley

State: Design

Overview

The goal of this feature is to instrument Collage LocalNodes so that they are aware of the status, round-trip time (?) and lastAliveTime of their connected remote nodes. Together with the EQ timeouts, this information allows unresponsive clients to be handled in a more informed manner.

Requirements

  • Use of the keep-alive feature is optional and defaults to off.

  • A keep-alive (ping) packet is sent when no reply packets or other data have been received on the node connection within an interval (half of the global timeout?).

  • Applications can access the keep-alive information of a client that is unresponsive or has timed out.

  • Example: EqPly can open large .ply files (e.g., lucy.ply) without failing due to frame rendering timeouts while the model is being distributed.

  • Whenever the receiver thread receives data from a remote node, the lastAliveTime for that node is updated.
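The ping rule in these requirements can be expressed as a small predicate. This is a minimal sketch under stated assumptions: `shouldSendPing` is a hypothetical helper (not part of the Collage API), times are in milliseconds, and the interval is half the global timeout, which the requirement itself still marks as an open question.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Collage API): decides whether the keep-alive
// thread should ping a node. Assumes millisecond timestamps and a ping
// interval of half the global timeout, as proposed above.
bool shouldSendPing( const bool keepAliveEnabled, const int64_t lastAliveTime,
                     const int64_t now, const int64_t globalTimeout )
{
    assert( globalTimeout > 0 );
    if( !keepAliveEnabled )
        return false; // the feature is optional and off by default

    // Any received data or reply refreshes lastAliveTime, so a ping is only
    // needed once the connection has been silent for the whole interval.
    const int64_t interval = globalTimeout / 2;
    return now - lastAliveTime >= interval;
}
```

With a 10 s global timeout, a node last heard from 3 s ago is not pinged, while a node that has been silent for 6 s is.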

API

co::Node:

/** The node is responding (is alive) */
int64_t _lastAliveTime; //!< last time packets were received

int64_t co::Node::getLastAliveTime() // might be useful

virtual bool co::Node::isAlive() = 0; // pure virtual

co::LocalNode:

virtual bool co::LocalNode::isNodeAlive( NodePtr node ); // is the given remote node alive?

/** called from (ping/keepalive) thread */
bool co::LocalNode::sendPing( NodePtr remoteNode ); // sends a ping packet to remote node

/** process ping request. called from receiver thread (not queued in command queue) */
// updates lastAliveTime in receiver thread for the node which sent the packet
// sends a ping reply packet back to the node which sent the ping
bool co::LocalNode::_cmdPing( Command& command );

/** process ping reply response. called from receiver thread (not queued in command queue) */
// updates lastAliveTime for the node which sent the reply
// remoteNode->_lastAliveTime = getTime64();
bool co::LocalNode::_cmdPingReply( Command& command ); 

/** variable indicating if keepalive signaling is on or off */
bool _keepAliveEnabled; // defaults to false
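Since `_lastAliveTime` is written by the receiver thread but read by the keep-alive thread and by the application, it needs thread-safe access. A minimal sketch, assuming an atomic millisecond timestamp; `KeepAliveNode` and `touch()` are hypothetical names, and unlike the proposed `isAlive()` this version takes the current time and timeout as parameters so that it can be exercised without a real clock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the proposed co::Node bookkeeping,
// not the real Collage class.
class KeepAliveNode
{
public:
    // Called by the receiver thread whenever any data, ping or ping reply
    // arrives from this node.
    void touch( const int64_t now )
        { _lastAliveTime.store( now, std::memory_order_relaxed ); }

    int64_t getLastAliveTime() const
        { return _lastAliveTime.load( std::memory_order_relaxed ); }

    // The node counts as alive while it has been heard from within the
    // given timeout (assumed to be the global timeout, in milliseconds).
    bool isAlive( const int64_t now, const int64_t timeout ) const
    {
        assert( timeout > 0 );
        return now - getLastAliveTime() <= timeout;
    }

private:
    // Atomic because it is written and read from different threads; a single
    // timestamp needs no ordering with other data, hence relaxed accesses.
    std::atomic< int64_t > _lastAliveTime{ 0 };
};
```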

Ping Packets:

/** NEW: node ping packet */
co::NodePingPacket
    uint64_t transmitTime; // sender's time when the ping was sent
/** NEW: node ping reply packet, echoes the request's transmitTime */
co::NodePingReplyPacket( const NodePingPacket* request ) : transmitTime( request->transmitTime ) {}
    uint64_t transmitTime;
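Because the reply echoes the request's `transmitTime`, the pinging node can compute the round-trip time using only its own clock, so the two nodes' clocks never need to be synchronized. A simplified sketch: the structs are stand-ins that omit whatever Collage packet header the real packets would carry, and `roundTripTime()` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-ins for the proposed packets (packet header omitted).
struct NodePingPacket
{
    uint64_t transmitTime; // sender's clock when the ping was sent
};

struct NodePingReplyPacket
{
    // Echo the request's transmit time unchanged.
    explicit NodePingReplyPacket( const NodePingPacket* request )
        : transmitTime( request->transmitTime ) {}

    uint64_t transmitTime;
};

// Round-trip time as computed by the pinging node when the reply arrives.
uint64_t roundTripTime( const NodePingReplyPacket& reply,
                        const uint64_t receptionTime )
{
    assert( receptionTime >= reply.transmitTime );
    return receptionTime - reply.transmitTime;
}
```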

Usage

File Format

Keep-alive [ON|OFF] EQ-Config option?

Restrictions

This Collage-based keep-alive signal does not take any action to decide whether a remote node is responsive or not. It only gathers and provides information about the actual states of the remote nodes from the local node's perspective.

Issues

If the round-trip time (reception time minus transmit time) is greater than Global::getTimeout(), might the remote node be dead?
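One possible reading of this open issue, as a hedged sketch rather than a decided design: `remoteMightBeDead()` is hypothetical, all times are assumed to be milliseconds from the local node's clock, and a ping that is still unanswered is treated as having a round-trip time of at least `now - transmitTime`, since that value can only grow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical heuristic: the remote node "might be dead" if its ping took
// longer than the global timeout to come back, or is still unanswered after
// that long.
bool remoteMightBeDead( const uint64_t transmitTime, const bool replyReceived,
                        const uint64_t receptionTime, const uint64_t now,
                        const uint64_t globalTimeout )
{
    assert( globalTimeout > 0 );
    // For an unanswered ping, the round-trip time is at least now - transmit.
    const uint64_t end = replyReceived ? receptionTime : now;
    assert( end >= transmitTime );
    return end - transmitTime > globalTimeout;
}
```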

How to handle input frame (and other) timeout exceptions

Dealing with the EQ timeout exceptions TIMEOUT_INPUTFRAME and TIMEOUT_FRAMESYNC in eq\client\compositor.cpp, eq\client\framedata.cpp and eq\client\node.cpp.

Throw an exception only if a node is considered unreachable. The application should be able to make that judgment.
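The intended policy (throw only once a node is considered unreachable) could look like the following sketch. `waitForInputFrame`, `tryWait` and `isNodeAlive` are hypothetical callbacks standing in for the Equalizer frame wait and the keep-alive query, and `std::runtime_error` stands in for the actual EQ timeout exception.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical wait loop: tryWait() performs one timed wait and returns true
// once the input frame is ready; isNodeAlive() queries keep-alive state.
void waitForInputFrame( const std::function< bool() >& tryWait,
                        const std::function< bool() >& isNodeAlive )
{
    assert( tryWait && isNodeAlive );
    while( !tryWait() )
    {
        // The wait timed out: give up only if keep-alive says the node is
        // gone, otherwise assume it is just slow and wait again.
        if( !isNodeAlive() )
            throw std::runtime_error( "input frame source is unreachable" );
    }
}
```

In the EqPly case, ongoing model distribution would keep the sending node's lastAliveTime fresh, so timed-out frame waits are retried instead of failing. Note that this loop waits indefinitely as long as the node stays alive; a real implementation might want an upper bound.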
