From 91b3c7e713ee32737effe0907edb049c50cadc7b Mon Sep 17 00:00:00 2001
From: Igor Gaponenko
Date: Wed, 18 Dec 2024 11:31:48 -0800
Subject: [PATCH] Documentation on the http-based Worker Replication service

---
 doc/dev/api/index.rst                     |  16 +
 doc/dev/api/introduction.rst              |  25 +
 doc/dev/api/repl-worker.rst               | 597 ++++++++++++++++++++++
 doc/dev/index.rst                         |   1 +
 doc/ingest/api/reference/rest/general.rst |   2 +
 5 files changed, 641 insertions(+)
 create mode 100644 doc/dev/api/index.rst
 create mode 100644 doc/dev/api/introduction.rst
 create mode 100644 doc/dev/api/repl-worker.rst

diff --git a/doc/dev/api/index.rst b/doc/dev/api/index.rst
new file mode 100644
index 000000000..aef592413
--- /dev/null
+++ b/doc/dev/api/index.rst
@@ -0,0 +1,16 @@
+.. note::
+
+    Information in this guide corresponds to the version **40** of the Qserv REST API. Keep in mind
+    that each implementation of the API has a specific version. The version number will change
+    if any changes that might affect users are made to the implementation or the API.
+    The current document will be kept updated to reflect the latest version of the API.
+
+##############################
+The internal REST API of Qserv
+##############################
+
+.. toctree::
+   :maxdepth: 2
+
+   introduction
+   repl-worker
diff --git a/doc/dev/api/introduction.rst b/doc/dev/api/introduction.rst
new file mode 100644
index 000000000..e1ed637e3
--- /dev/null
+++ b/doc/dev/api/introduction.rst
@@ -0,0 +1,25 @@
+.. _qserv-api-introduction:
+
+Introduction
+============
+
+This document presents a collection of the internal REST APIs that exist in the Qserv implementation.
+Qserv developers may use these APIs to interact with various components of Qserv.
+
+The Qserv REST API is a collection of RESTful web services that provide access to various components of the Qserv system.
+The API enforces a specific interaction model between the client and the server. The following highlights are worth mentioning:
+
+- All ``POST``, ``PUT`` and ``DELETE`` requests must be accompanied by a JSON payload.
+- Responses of all but a few select services are in JSON format. Exceptions are documented in the API documentation.
+- Schemas of the JSON requests and payloads are defined in the API documentation.
+- The API is versioned. The version number is included in the URL path of the ``GET`` requests, and it's
+  included in the JSON payload of the ``POST``, ``PUT`` and ``DELETE`` requests.
+- All API services are protected by an authentication mechanism. The client must provide a valid
+  authentication token in the JSON payload of the ``POST``, ``PUT`` and ``DELETE`` requests.
+  No authentication is required for the ``GET`` requests.
+
+The general information on the structure of the API can be found in the following document:
+
+- :ref:`ingest-general`
+
+The rest of the current document provides detailed information on the individual services that are available in the Qserv API.
diff --git a/doc/dev/api/repl-worker.rst b/doc/dev/api/repl-worker.rst
new file mode 100644
index 000000000..657cc567f
--- /dev/null
+++ b/doc/dev/api/repl-worker.rst
@@ -0,0 +1,597 @@
+.. _qserv-api-repl-worker:
+
+Replication Worker Services
+===========================
+
+Scope
+-----
+
+This document describes the implementation of the Controller - Replication worker protocol based on HTTP/JSON.
+
+
+TODO
+----
+
+
+Finish within the scope of the current ticket DM-42005 before the X-Mas break:
+
+- [x] Think about the locking mechanism of the method WorkerHttpRequest::toJson(). The method
+  acquires a lock on the mutex while the request may also hold a lock on the same mutex
+  while processing the request in WorkerHttpRequest::execute(). This may result in a deadlock.
+  Perhaps no locking is needed at all since the resulting data are not lock sensitive?
+- [x] Finish implementing a hierarchy of the HTTP-based worker requests
+- [x] Finish implementing the request processor for these requests
+- [x] Add the new service to the Configuration and Registry to allow the Controller to send requests
+  to the worker via HTTP
+- [x] Display connection parameters of the new service on the Web Dashboard
+- [ ] Document the REST services in the documentation tree.
+- [ ] Manually test the new implementation externally using ``curl`` or Python's ``requests`` module.
+  Think about the test cases to cover the new implementation.
+- [ ] Extend the integration tests to cover the new implementation.
+
+Finish within the scope of a separate ticket during/after the X-Mas break:
+
+- [ ] Implement the MessengerHttp on the Controller side of the protocol. The class will
+  be providing the multiplexing API for the Controller to send requests to the worker.
+  The initial implementation will be based on the simple http::AsyncReq.
+- [ ] Create a parallel hierarchy of the HTTP-based request & job classes on the Controller
+  side of the protocol.
+- [ ] Test the new classes.
+- [ ] Implement the MessengerHttp to reuse the socket connections for sending multiple requests
+  to the same worker.
+- [ ] Test the new implementation to ensure it works the same way as the old one.
+- [ ] Remove the old implementation of the Controller - Worker protocol.
+
+
+Request categories
+------------------
+
+- Echo request
+- Replica management
+- Database management (SQL)
+- Requests management
+- Worker service management
+
+
+General considerations
+----------------------
+
+Common attributes sent with all request types
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- ``instance_id``: // The unique identifier of a Qserv instance served by the Replication System.
+- ``version``: // The version of the protocol used by the Controller.
+
+Values of these attributes are sent in the request body for the ``POST`` and ``PUT`` requests.
+For ``GET`` requests the values are sent in the URL.
+
+
+Attributes in ``POST`` and ``PUT`` requests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- ``auth_key``: // The authentication key to access the service.
+
+A value of this attribute is sent in the request body for the ``POST`` and ``PUT`` requests.
+
+
+Schema of the request object for all requests sent as ``POST``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The request body is a JSON object with the following fields (``timeout`` and ``priority`` are optional):
+
+.. code-block::
+
+    {
+        "id": ,         // The unique ID of the request (generated by the Controller)
+
+        // Optional request expiration timeout (for queued requests only) for disposing
+        // requests regardless of their statuses. The timeout is expressed in
+        // seconds. The default value of '0' means no expiration for a request.
+        "timeout": ,    // [=0]
+
+        // The optional priority level (for queued requests only) may affect a position of
+        // a request relative to others in the input queue of the worker processor.
+        // Requests with higher priority levels are placed close to the head
+        // of the queue.
+        "priority": ,   // [=0]
+
+        "req": {
+            ...
+        }
+    }
+
+The first 3 attributes are captured in the C++ code into the following structure (defined in ``replica/proto/Protocol.h``):
+
+.. code-block:: c++
+
+    struct QueuedRequestHdr {
+        std::string id;
+        int priority;
+        unsigned int timeout;
+        QueuedRequestHdr(std::string const& id_, int priority_, unsigned int timeout_)
+                : id(id_), priority(priority_), timeout(timeout_) {}
+        nlohmann::json toJson() const {
+            return {{"id", id}, {"priority", priority}, {"timeout", timeout}};
+        };
+    };
+
+The schema of the last attribute ``req`` is request-specific and is documented in the relevant sections below.
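As an illustration of the request schema above, here is a minimal Python sketch of how a client might assemble the body of a queued ``POST`` request. It is not taken from the Qserv code base: the helper name ``make_queued_request``, the placement of ``instance_id``, ``version`` and ``auth_key`` at the top level of the body, the UUID-based ``id``, and all sample values are assumptions made for the example. The ``req`` payload uses the ``delay`` and ``data`` attributes of the echo request.

```python
import json
import uuid


def make_queued_request(req, auth_key, instance_id, version, timeout=0, priority=0):
    """Build the JSON body of a queued POST request (hypothetical helper)."""
    return {
        # Common attributes sent with all request types.
        # Assumption: they sit at the top level of the body.
        "instance_id": instance_id,
        "version": version,
        # Sent with POST and PUT requests only.
        "auth_key": auth_key,
        # The three header attributes mirrored by QueuedRequestHdr.
        # Assumption: a UUID string is an acceptable unique request ID.
        "id": str(uuid.uuid4()),
        "timeout": timeout,    # seconds; 0 means no expiration
        "priority": priority,  # higher values are queued closer to the head
        # The request-specific payload.
        "req": req,
    }


body = make_queued_request(
    req={"delay": 100, "data": "hello"},  # echo request attributes
    auth_key="SECRET",                    # placeholder value
    instance_id="qserv-dev",              # placeholder value
    version=40,
)
print(json.dumps(body, indent=2))
```

The optional ``timeout`` and ``priority`` default to ``0`` here, matching the defaults given in the schema comments.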
+
+Schema of the response object for all but the service management and dispose requests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The schema applies to the following request types:
+
+- Echo
+- Create replica
+- Delete replica
+- Find replica
+- Find all replicas
+- Director index requests
+- SQL requests
+- Track request
+- Get request status
+- Stop request
+
+All responses have the following schema:
+
+.. code-block::
+
+    {
+        // The usual attributes returned by all REST services. Normally the "success" attribute is set to 1
+        // once the input parameters of the request are parsed. Any request-specific errors are returned in
+        // the "response" attribute of the JSON object.
+        "success": ,
+        "error": ,
+        "error_ext": ,
+
+        "req": ,     // The original request object as it was received by the worker
+        "type": ,    // The type of the request
+
+        "status": ,      // The completion status of the operation. Values correspond to protocol::Status
+        "status_ext": ,  // Extended status of this operation. Values correspond to protocol::StatusExt [=NONE]
+
+        "expiration_timeout_sec": ,  // The effective expiration timeout of the request in seconds
+
+        "performance": {
+            "receive_time": ,  // When a request was received by a worker service
+            "start_time": ,    // When request execution started by a worker service
+            "finish_time":     // When request execution finished by a worker service
+        },
+        "result": {
+            ...
+        }
+    }
+
+The payload of the ``result`` object depends on the type of the request. Also note that:
+
+- The object is empty for all newly submitted requests that ended up in the processing queue.
+- The object is filled with the relevant data for the requests that have been processed.
+
+
+Response objects of the service management requests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+TBC...
+
+
+Echo request
+------------
+
+The Controller sends a POST request to the Replication worker to test the functionality of the worker processor
+and to simulate the request submission/processing path. The ``echo`` request has no side effects, such as changes
+to the worker databases. Parameters of the request will be evaluated by the service. If all looks
+okay then the request will be queued for processing. Otherwise, the service will return an error.
+
+.. code-block::
+
+    POST /worker/echo
+
+These are the request-specific attributes:
+
+.. code-block::
+
+    {
+        "req": {
+            "delay" : ,  // The delay in milliseconds before the response is sent back
+            "data" :     // The data to be echoed back
+        }
+    }
+
+Once the request is submitted and the worker service indicated that the request looked good, the state
+of the request can be further managed via:
+
+- TODO: link to the request tracking service
+- TODO: link to the replica status service
+- TODO: link to the replica cancel service
+
+The response object for the successfully completed request has the following attributes:
+
+.. code-block::
+
+    {
+        "result": {
+            "data" :  // The data that was echoed back
+        }
+    }
+
+
+Replica management/information requests
+---------------------------------------
+
+All requests of this category are queued and processed by a dedicated pool of the worker threads.
+Once the request is submitted and the worker service indicated that the request looked good, the state
+of the request can be further managed via:
+
+- TODO: link to the request tracking service
+- TODO: link to the replica status service
+- TODO: link to the replica cancel service
+
+Schemas of the response object for the successfully completed request vary depending on the type of the request.
+
+Schemas for the single replica requests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The object has the following attributes:
+
+.. code-block::
+
+    {
+        "result" : {
+            "replica_info": {
+                // enum ReplicaStatus {
+                //   NOT_FOUND = 0;
+                //   CORRUPT = 1;
+                //   INCOMPLETE = 2;
+                //   COMPLETE = 3;
+                // }
+                "status": ,    // The status of the replica. Values correspond to enums in class "ReplicaStatus"
+                "worker": ,    // The worker ID
+                "database": ,
+                "chunk": ,
+
+                // A collection of files
+                //
+                "file_info_many": [
+
+                    {
+                        "name": ,   // The name of a file
+                        "size": ,   // Size in bytes
+                        "cs": ,     // Control sum (if available)
+                        "mtime": ,  // The file content modification time in seconds (since UNIX Epoch)
+
+                        // The following parameters are set in the relevant contexts only.
+                        // Otherwise they'll be set to some default value.
+
+                        "begin_transfer_time": ,  // When the file migration started (where applies) [=0]
+                        "end_transfer_time": ,    // When the file migration finished (where applies) [=0]
+                        "in_size":                // The size of an input file (where applies) [=0]
+                    },
+                ],
+                "verify_time":  // When the replica status was verified by a worker
+            }
+        }
+    }
+
+Schemas for the multi-replica requests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. note::
+
+    Presently, the only multi-replica request in this category is the ``find-all`` request.
+
+The response object has the following attributes:
+
+.. code-block::
+
+    {
+        "result" : {
+            "replica_info_many": [
+                ...
+            ]
+        }
+    }
+
+Where each array entry is an object that has a single replica schema (``replica_info``) as described above for
+the single-replica requests.
+
+
+Create a new chunk replica
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Controller sends a POST request to the Replication worker to initiate the replica creation
+operation on the target worker. Parameters of the request will be evaluated by the service. If all looks
+okay then the request will be queued for processing. Otherwise, the service will return an error.
+
+.. code-block::
+
+    POST /worker/replica/create
+
+The request-specific attributes:
+
+.. code-block::
+
+    {
+        "req": {
+            "database": ,
+            "chunk": ,
+            "src_worker": ,       // The source worker ID from where to pull the replica
+            "src_worker_host": ,  // The source worker host (DNS or IP)
+            "src_worker_port":    // The source worker port
+        }
+    }
+
+Delete an existing chunk replica
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Controller sends a POST request to the Replication worker to initiate the replica deletion
+operation on the target worker. Parameters of the request will be evaluated by the service. If all looks
+okay then the request will be queued for processing. Otherwise, the service will return an error.
+
+.. code-block::
+
+    POST /worker/replica/delete
+
+The request-specific attributes:
+
+.. code-block::
+
+    {
+        "req": {
+            "database" : ,
+            "chunk" :
+        }
+    }
+
+Find info on an existing chunk replica
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Controller sends a POST request to the Replication worker to locate and report the status of a single chunk replica:
+
+.. code-block::
+
+    POST /worker/replica/find
+
+The request-specific attributes:
+
+.. code-block::
+
+    {
+        "req": {
+            "database" : ,
+            "chunk" : ,
+            "compute_cs" :  // Compute the control sum of the replica files if not 0
+        }
+    }
+
+Find info on all existing chunk replicas of a database
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Controller sends a POST request to the Replication worker to locate and report the status of all chunk replicas
+in a given database:
+
+.. code-block::
+
+    POST /worker/replica/find-all
+
+The request-specific attributes:
+
+.. code-block::
+
+    {
+        "req": {
+            "database" :
+        }
+    }
+
+Database management (SQL) Requests
+----------------------------------
+
+
+Management requests
+-------------------
+
+Tracking requests
+^^^^^^^^^^^^^^^^^
+
+The Controller sends a GET request to the Replication worker to track the status of the previously made
+request and to retrieve results of the request if it's finished. The request URL should contain the unique
+identifier ``id`` of the target request:
+
+.. code-block::
+
+    GET /worker/request/track/:id
+
+If the request has completed successfully, the response object will not be empty and it will contain
+the results of the request:
+
+.. code-block::
+
+    {
+        "result": {
+            ...
+        }
+    }
+
+Retrieving request status
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Controller sends a GET request to the Replication worker to get the status of the previously made
+request. The request URL should contain the unique
+identifier ``id`` of the target request:
+
+.. code-block::
+
+    GET /worker/request/status/:id
+
+Note that, unlike the ``track`` request, the ``status`` request does not return the results of the request.
+The result object will be present but it will be empty:
+
+.. code-block::
+
+    {
+        "result": {}
+    }
+
+Stopping/cancelling requests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Controller sends a PUT request to the Replication worker to stop the previously made request:
+
+.. code-block::
+
+    PUT /worker/request/stop/:id
+
+There are no request-specific attributes in the request object.
+
+Note that, unlike the ``track`` request, the ``stop`` request does not return the results of the request.
+The result object will be present but it will be empty:
+
+.. code-block::
+
+    {
+        "result": {}
+    }
+
+Disposing completed requests
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+There is a special request that's meant to be used by the Controller to dispose of the completed
+requests in the worker's internal storage. The request is sent as a POST request:
+
+.. code-block::
+
+    POST /worker/request/dispose
+
+Where the request object is required to provide a collection (array) of the request IDs to be disposed:
+
+.. code-block::
+
+    {
+        "req": {
+            "ids": [
+                ,
+                ...
+
+            ]
+        }
+    }
+
+The response object will have the completion status of the operation for each identifier mentioned in the request:
+
+.. code-block::
+
+    {
+        "result": {
+            "ids_disposed": {
+                : ,
+                ...
+                :
+            }
+        }
+    }
+
+Where the value of the integer is the completion status of the operation. The value of ``1`` means that the request
+was disposed successfully. The value of ``0`` means that the request was not found in a collection of the completed
+requests.
+
+Worker service management requests
+----------------------------------
+
+Requests in this category are meant to provide the Controller with the information on the worker service itself.
+There are the following requests in this category:
+
+- TODO: link to: Get the worker status
+- TODO: link to: Get info on requests at various stages of processing
+- TODO: link to: Suspend the worker service
+- TODO: link to: Resume the worker service
+- TODO: link to: Drain requests at the worker service
+- TODO: link to: Reconfigure the worker service
+
+The request-specific attributes are not required for these requests.
+
+Response objects of all service management requests have the following schema:
+
+.. code-block::
+
+    {
+        "status" : ,      // The completion status of the operation. Values correspond to protocol::Status
+        "status_ext" : ,  // Extended status of this operation. Values correspond to protocol::StatusExt [=NONE]
+
+        "service_state" : ,  // The state of the worker service as defined in protocol::ServiceState
+
+        "num_new_requests" : ,
+        "num_in_progress_requests" : ,
+        "num_finished_requests" : ,
+
+        "new_requests" : [
+            ...
+        ],
+        "in_progress_requests" : [
+            ...
+        ],
+        "finished_requests" : [
+            ...
+        ]
+    }
+
+.. note::
+
+    The ``new_requests``, ``in_progress_requests``, and ``finished_requests`` are arrays of the request objects
+    that are in the corresponding state. These collections are populated only for the following request types:
+
+    - Get info on requests at various stages of processing
+    - Drain requests at the worker service
+
+    The schema of the request descriptors is the same as the schema of the corresponding original request objects.
+
+Get the worker status
+^^^^^^^^^^^^^^^^^^^^^
+
+The Controller sends a GET request to the Replication worker to get the status of the worker service:
+
+.. code-block::
+
+    GET /worker/service/status
+
+Get info on requests at various stages of processing
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Controller sends a GET request to the Replication worker to get the information on the requests:
+
+.. code-block::
+
+    GET /worker/service/requests
+
+Suspend the worker service
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Controller sends a PUT request to the Replication worker to suspend the worker service:
+
+.. code-block::
+
+    PUT /worker/service/suspend
+
+Resume the worker service
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Drain requests at the worker service
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Controller sends a PUT request to the Replication worker to drain (stop) all requests in the worker service:
+
+.. code-block::
+
+    PUT /worker/service/drain
+
+The operation affects requests that are already in the processing queue or requests that are still
+in the input queue waiting to be processed. The finished requests are not affected by this operation.
+
+
+Reconfigure the worker service
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Controller sends a PUT request to the Replication worker to reconfigure the worker service:
+
+.. code-block::
+
+    PUT /worker/service/reconfig
diff --git a/doc/dev/index.rst b/doc/dev/index.rst
index 0fc6b0cc5..6db85a1a0 100644
--- a/doc/dev/index.rst
+++ b/doc/dev/index.rst
@@ -10,3 +10,4 @@ Developer's Guide
    quick-start-devel
    doc
    scisql
+   api/index
diff --git a/doc/ingest/api/reference/rest/general.rst b/doc/ingest/api/reference/rest/general.rst
index 85603dd92..193f869a0 100644
--- a/doc/ingest/api/reference/rest/general.rst
+++ b/doc/ingest/api/reference/rest/general.rst
@@ -1,3 +1,5 @@
+.. _ingest-general:
+
 General guidelines
 ==================