RFC #6: Refactoring Request Management System
Authors: K.Ciba, A.Tsaregorodtsev
Last Modified: 11.11.2012
The Request Management System (RMS) is designed to manage simple operations that are performed asynchronously on behalf of users, the owners of the tasks. The RMS is used for multiple purposes: failure recovery (the failover system), data management tasks and others. It should be designed as an open system, easily extensible with new types of tasks.
The core of the RMS is a request, which holds information about its creator (DN and group), its status, various timestamps (creation time, submission time, last update), the ID of the job the request belongs to, the DIRAC setup and the request's name. One request can be made of several sub-requests of various types (e.g. transfer, removal, registration, logupload, diset). Each sub-request holds the operation that has to be executed to process the request (e.g. replicateAndRegister, registerFile, removeReplica etc.), the source and destination storage elements to use, its status, various timestamps, error messages if any, and its order of execution. Depending on its type and operation, a sub-request can reference several files, each again holding all the required bits of information (e.g. the file LFN or PFN or both, its checksum and size, its GUID, status, error message etc.).
Current schema of the RequestDB.
All request information is kept in the RequestDB database, which can use two kinds of back-end: MySQL (RequestDBMySQL) or a local file system directory (RequestDBFile). Both are accessed through one common service (RequestManagerHandler), which talks directly to the RequestDB and allows selecting, inserting or updating a particular request. All these CRUD operations are performed through a specialised client interface (RequestClient).
Requests are executed by various specialised agents, one per request type: TransferAgent processes transfer sub-requests, RemovalAgent removal ones, RegistrationAgent register ones, DISETForwardingAgent diset ones, and so on. The common pattern in the agent code is to select the sub-requests available for execution, perform the data manipulation required by the defined operation, update the statuses in the RequestDB and notify the request's job when all sub-requests are done.
While on the database side a request is kept in three closely connected tables (RequestDB.Requests, RequestDB.SubRequests and RequestDB.Files), on the python client side there is only one class available: RequestContainer. This imbalance between the SQL and python worlds leads to an unclear, heavy, error-prone and hard-to-use API.
As the python API will change anyway, this is a good opportunity to refactor the database schema as well.
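A minimal sketch of this layering, assuming a simplified key/value view of a request. RequestBackend, DictBackend, DirectoryBackend and RequestService are hypothetical names, not the real RequestDB classes; the in-memory dictionary merely stands in for the MySQL back-end, and the directory back-end stores one JSON file per request.

```python
import json
import os
from abc import ABC, abstractmethod


class RequestBackend(ABC):
    """Common back-end interface (hypothetical, not the real RequestDB classes)."""

    @abstractmethod
    def put(self, name, data):
        """Store the request called `name`."""

    @abstractmethod
    def get(self, name):
        """Return a copy of the stored request, or None if unknown."""


class DictBackend(RequestBackend):
    """In-memory stand-in for a SQL back-end such as RequestDBMySQL."""

    def __init__(self):
        self._rows = {}

    def put(self, name, data):
        self._rows[name] = dict(data)

    def get(self, name):
        row = self._rows.get(name)
        return None if row is None else dict(row)


class DirectoryBackend(RequestBackend):
    """One JSON file per request, loosely in the spirit of RequestDBFile."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def _file(self, name):
        return os.path.join(self.path, name + ".json")

    def put(self, name, data):
        with open(self._file(name), "w") as fh:
            json.dump(data, fh)

    def get(self, name):
        try:
            with open(self._file(name)) as fh:
                return json.load(fh)
        except FileNotFoundError:
            return None


class RequestService:
    """Plays the role of RequestManagerHandler: callers never see the back-end."""

    def __init__(self, backend):
        self.backend = backend

    def insert(self, name, data):
        self.backend.put(name, data)

    def select(self, name):
        return self.backend.get(name)

    def update(self, name, **changes):
        data = self.backend.get(name)
        if data is None:
            raise KeyError(name)
        data.update(changes)
        self.backend.put(name, data)
```

Because the service only talks to the abstract interface, swapping back-ends leaves the client-facing select/insert/update calls unchanged.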
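The common agent pattern can be sketched as follows. SubRequest, RequestRecord, run_agent_cycle and the execute/notify_job callbacks are hypothetical; leaving a failed sub-request 'Waiting' so that a later cycle retries it is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class SubRequest:
    """Hypothetical stand-in for one SubRequests row."""
    sub_type: str        # e.g. "transfer", "removal"
    operation: str       # e.g. "replicateAndRegister"
    status: str = "Waiting"


@dataclass
class RequestRecord:
    """Hypothetical stand-in for one Requests row plus its sub-requests."""
    name: str
    job_id: int
    sub_requests: list = field(default_factory=list)


def run_agent_cycle(requests, sub_type, execute, notify_job):
    """One cycle of a type-specific agent (sketch, not DIRAC code)."""
    for request in requests:
        # select the sub-requests of this agent's type that wait for execution
        todo = [s for s in request.sub_requests
                if s.sub_type == sub_type and s.status == "Waiting"]
        for sub in todo:
            # perform the data manipulation; execute() returns True on success
            if execute(sub):
                sub.status = "Done"  # would be written back to the RequestDB
            # on failure the sub-request stays 'Waiting' for a later cycle
        # notify the job once every sub-request of the request is done
        if todo and all(s.status == "Done" for s in request.sub_requests):
            notify_job(request.job_id)
```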
Proposed schema of the RequestDB.
The most important changes in the new schema are:
- all columns holding statuses are ENUMs with a well-defined set of states
- the SubRequests table is renamed to Operations, which better describes its contents
- there is no need to keep both the RequestType and Operation columns, as a single bit of information (Operation.Type in the new schema) properly indicates the required action
- the Files.Md5 and Files.Adler columns are dropped; instead a new column, Files.ChecksumType, holds the checksum type, while the checksum itself is kept in the Files.Checksum column
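A sketch of the proposed tables, using Python's sqlite3 module as a stand-in for MySQL. SQLite has no ENUM type, so CHECK constraints emulate it. Only a few columns are shown, the status sets contain just the states named in this RFC (the full sets are those in the state-machine diagrams), and the ChecksumType labels 'ADLER32' and 'MD5' are assumptions.

```python
import sqlite3

# Illustrative status sets: only the states named in this RFC are listed.
REQUEST_STATES = ("Waiting", "Done")
OPERATION_STATES = ("Queued", "Waiting", "Done")
FILE_STATES = ("Waiting", "Done")
CHECKSUM_TYPES = ("ADLER32", "MD5")  # assumed labels for the old Adler/Md5 columns


def enum_column(name, values, nullable=False):
    """SQLite has no ENUM type, so emulate one with a CHECK constraint."""
    allowed = ", ".join("'%s'" % v for v in values)
    null = "" if nullable else " NOT NULL"
    return "%s TEXT%s CHECK (%s IN (%s))" % (name, null, name, allowed)


SCHEMA = """
CREATE TABLE Requests (
    RequestID   INTEGER PRIMARY KEY,
    RequestName TEXT NOT NULL,
    {request_status}
);
CREATE TABLE Operations (
    OperationID INTEGER PRIMARY KEY,
    RequestID   INTEGER NOT NULL REFERENCES Requests(RequestID),
    Type        TEXT NOT NULL,   -- single column replacing RequestType + Operation
    "Order"     INTEGER NOT NULL,
    {operation_status}
);
CREATE TABLE Files (
    FileID      INTEGER PRIMARY KEY,
    OperationID INTEGER NOT NULL REFERENCES Operations(OperationID),
    LFN         TEXT,
    Checksum    TEXT,            -- replaces the Md5 and Adler columns
    {checksum_type},
    {file_status}
);
""".format(
    request_status=enum_column("Status", REQUEST_STATES),
    operation_status=enum_column("Status", OPERATION_STATES),
    checksum_type=enum_column("ChecksumType", CHECKSUM_TYPES, nullable=True),
    file_status=enum_column("Status", FILE_STATES),
)
```

With the constraint in place, a row carrying an unknown status is rejected by the database itself rather than by client code.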
Inheritance diagram for Request zoo.
The basic ideas within the new API are:
- one python class per one SQL table
- the API should be lightweight
- all class members are named after the DB column names
- class members are defined as python properties (see http://docs.python.org/2/library/functions.html#property)
- all classes are only smart bags holding properties, with no extra functionality except serialisation to SQL and/or XML and helpers for manipulating and looping over lower-level classes (e.g. adding a File instance to an Operation, adding an Operation instance to a Request, looping over the Operations in a Request etc.)
- Operation execution order == index in Request::operations list
- status checking is very easy, as you don't have to guess which SubRequest types are present
- a mechanism for status 'calculation' from aggregated classes (e.g. if at least one File is 'Waiting', Operation.Status is forced to be 'Waiting' too; this could also be propagated higher, to the Request object)
- the same for request finalisation: it should happen automatically when all Operations are 'Done'
- a request should always be read as a whole (Request + all Operations + all Files); partial reads should be forbidden
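A minimal sketch of these ideas in plain Python, assuming only the 'Waiting' and 'Done' states: each class wraps one table row, members are properties named after columns, the Operation execution order is its list index, and statuses are calculated from children. Method names such as addFile, addOperation and executionOrder are hypothetical.

```python
class File:
    """Bag of properties mirroring one row of the Files table (sketch)."""

    def __init__(self, LFN, Status="Waiting"):
        self._LFN = LFN
        self._Status = Status

    @property
    def LFN(self):
        return self._LFN

    @property
    def Status(self):
        return self._Status

    @Status.setter
    def Status(self, value):
        self._Status = value


class Operation:
    """One row of Operations plus helpers for its Files."""

    def __init__(self, Type, Status="Waiting"):
        self._Type = Type
        self._Status = Status
        self._files = []

    @property
    def Type(self):
        return self._Type

    @property
    def Status(self):
        if not self._files:
            return self._Status  # file-less Operations keep their stored status
        # status 'calculation': 'Done' only when every child File is 'Done'
        if all(f.Status == "Done" for f in self._files):
            return "Done"
        return "Waiting"

    def addFile(self, file_):
        self._files.append(file_)

    def __iter__(self):
        return iter(self._files)


class Request:
    """One row of Requests plus the ordered list of its Operations."""

    def __init__(self, RequestName):
        self._RequestName = RequestName
        self._operations = []

    @property
    def RequestName(self):
        return self._RequestName

    @property
    def Status(self):
        # automatic finalisation: 'Done' only when every Operation is 'Done'
        if self._operations and all(op.Status == "Done" for op in self._operations):
            return "Done"
        return "Waiting"

    def addOperation(self, operation):
        self._operations.append(operation)

    def executionOrder(self, operation):
        # execution order == index in the operations list
        return self._operations.index(operation)

    def __iter__(self):
        return iter(self._operations)
```

Identifiers such as LFN are read-only properties; only File.Status has a setter, in line with treating statuses separately from the other members.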
Statuses are somewhat special and should be treated separately from all the other properties. In some cases the user shouldn't be allowed to modify them: e.g. if an Operation has at least one 'Waiting' File, its status cannot be set to 'Done'; the same holds for a Request with at least one 'Waiting' or 'Queued' Operation.
Status propagation should be semi-automatic: on every change to File.Status, its parent (Operation) should be notified and, if possible, update its own status (e.g. after checking whether all its child Files are 'Done').
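A sketch of the status guard and the semi-automatic propagation, assuming the parent link is set when a File is added to an Operation. The _parent attribute and the _childChanged hook are hypothetical names; propagation one level higher, to the Request, would follow the same pattern.

```python
class File:
    """File whose Status setter notifies its parent Operation (sketch)."""

    def __init__(self, LFN):
        self.LFN = LFN
        self._Status = "Waiting"
        self._parent = None  # set by Operation.addFile

    @property
    def Status(self):
        return self._Status

    @Status.setter
    def Status(self, value):
        self._Status = value
        if self._parent is not None:
            self._parent._childChanged()  # semi-automatic propagation


class Operation:
    def __init__(self, Type):
        self.Type = Type
        self._Status = "Waiting"
        self._files = []

    def addFile(self, file_):
        file_._parent = self
        self._files.append(file_)
        self._childChanged()

    @property
    def Status(self):
        return self._Status

    @Status.setter
    def Status(self, value):
        # guard: an Operation with a 'Waiting' File cannot be forced to 'Done'
        if value == "Done" and any(f.Status == "Waiting" for f in self._files):
            raise ValueError("Operation still has 'Waiting' Files")
        self._Status = value

    def _childChanged(self):
        # re-derive own status whenever a child File changes
        if self._files and all(f.Status == "Done" for f in self._files):
            self._Status = "Done"
        elif any(f.Status == "Waiting" for f in self._files):
            self._Status = "Waiting"
```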
State machines:
- for Request (unchanged from the previous implementation)
State machine for Request object.
- for Operation: a new state, 'Queued'. Only one Operation in the Request is marked 'Waiting' at a time; all the others are 'Done' or 'Queued'. This saves system resources, as the only 'selectable' Operations (and hence Requests) are those really waiting to be executed (their execution order == the Request's current execution order)
State machine for Operation object.
- for File
State machine for File object.
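A sketch of the Operation state machine with the new 'Queued' state and the rule that only one Operation is 'Waiting' at a time. The transition table contains only the transitions implied by the text (Queued to Waiting, Waiting to Done); the diagram may allow more, e.g. failure states, which are omitted here. Class and method names are hypothetical.

```python
# Minimal transition table implied by the text; the full diagram may allow more.
OPERATION_TRANSITIONS = {
    "Queued": {"Waiting"},
    "Waiting": {"Done"},
    "Done": set(),
}


class Operation:
    def __init__(self, Type):
        self.Type = Type
        self.Status = "Queued"

    def transition(self, new):
        if new not in OPERATION_TRANSITIONS[self.Status]:
            raise ValueError("%s -> %s not allowed" % (self.Status, new))
        self.Status = new


class Request:
    def __init__(self, operations):
        self.operations = list(operations)  # list index == execution order
        self._activateNext()

    def _activateNext(self):
        # promote the first 'Queued' Operation, but only if nothing is 'Waiting'
        if any(op.Status == "Waiting" for op in self.operations):
            return
        for op in self.operations:
            if op.Status == "Queued":
                op.transition("Waiting")
                return

    def operationDone(self, op):
        op.transition("Done")
        self._activateNext()

    @property
    def current(self):
        """The single selectable Operation, or None once the Request is finished."""
        return next((op for op in self.operations if op.Status == "Waiting"), None)
```

Agents would only ever look at Request.current, so 'Queued' Operations never need to be scanned.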
Request execution should be implemented in a new class based on the Executors component, with a built-in state machine (possibly using the Observer pattern) and pluggable modules acting as "operation executors".
Separation: moving the TransferDB to the DataManagement System; one-to-one correspondence between requests and FTS jobs.
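One possible shape for the pluggable executors, assuming a decorator-based registry keyed by Operation.Type and observers modelled as plain callables. It does not use the real DIRAC Executors API; HANDLERS, handler, RequestExecutor and the plain-dict requests are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registry of pluggable "operation executors", keyed by Operation.Type.
HANDLERS: Dict[str, Callable[[dict], bool]] = {}


def handler(op_type: str):
    """Decorator that registers a function as the executor for one Operation.Type."""
    def register(func: Callable[[dict], bool]) -> Callable[[dict], bool]:
        HANDLERS[op_type] = func
        return func
    return register


class RequestExecutor:
    """Runs a request's Operations in order and notifies observers of status changes."""

    def __init__(self) -> None:
        self.observers: List[Callable[[dict, str], None]] = []

    def subscribe(self, observer: Callable[[dict, str], None]) -> None:
        self.observers.append(observer)

    def _set_status(self, obj: dict, status: str) -> None:
        obj["Status"] = status
        for observer in self.observers:  # Observer pattern: push each change
            observer(obj, status)

    def execute(self, request: dict) -> bool:
        for operation in request["Operations"]:  # list order == execution order
            if operation["Status"] == "Done":
                continue
            plugin = HANDLERS.get(operation["Type"])
            if plugin is None or not plugin(operation):
                return False  # stop; later Operations are left untouched
            self._set_status(operation, "Done")
        self._set_status(request, "Done")  # automatic finalisation
        return True


@handler("removeReplica")
def remove_replica(operation: dict) -> bool:
    # a real plugin would call the data management layer here
    return True
```

New operation types would then only need a new decorated function, without touching the executor itself.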