-
Notifications
You must be signed in to change notification settings - Fork 5
Yarn Reservation System
The Reservation System of YARN provides the user the ability to reserve resources over (and ahead of) time, to ensure that important production jobs will be run very predictably. The ReservationSystem performs careful admission control and provides guarantees over absolute amounts of resources (instead of % of cluster size). Reservation can be both malleable or have gang semantics, and can have time-varying resource requirements. The ReservationSystem is a component of the YARN ResourceManager. (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ReservationSystem.html)
A typical reservation proceeds as follows:
Step 0 The user (or an automated tool on its behalf) submits a reservation creation request, and receives a response containing the ReservationId.
Step 1 The user (or an automated tool on its behalf) submits a reservation request specified by the Reservation Definition Language (RDL) and ReservationId retrieved from the previous step. This describes the user need for resources over-time (e.g., a skyline of resources) and temporal constraints (e.g., deadline). This can be done both programmatically through the usual Client-to-RM protocols or via the REST api of the RM. If a reservation is submitted with the same ReservationId, and the RDL is the same, a new reservation will not be created and the request will be successful. If the RDL is different, the reservation will be rejected, and the request will be unsuccessful.
Step 2 The ReservationSystem leverages a ReservationAgent (GREE in the figure) to find a plausible allocation for the reservation in the Plan, a data structure tracking all reservation currently accepted and the available resources in the system.
Step 3 The SharingPolicy provides a way to enforce invariants on the reservation being accepted, potentially rejecting reservations. For example, the CapacityOvertimePolicy allows enforcement of both instantaneous max-capacity a user can request across all of his/her reservations and a limit on the integral of resources over a period of time, e.g., the user can reserve up to 50% of the cluster capacity instantanesouly, but in any 24h period of time he/she cannot exceed 10% average.
Step 4 Upon a successful validation the ReservationSystem returns to the user a ReservationId (think of it as an airline ticket).
Step 5 When the time comes, a new component called the PlanFollower publishes the state of the plan to the scheduler, by dynamically creating/tweaking/destroying queues.
Step 6 The user can then submit one (or more) jobs to the reservable queue, by simply including the ReservationId as part of the ApplicationSubmissionContext.
Step 7 The Scheduler will then provide containers from a special queue created to ensure resources reservation is respected. Within the limits of the reservation, the user has guaranteed access to the resources, above that resource sharing proceed with standard Capacity/Fairness sharing.
Step 8 The system includes mechanisms to adapt to drop in cluster capacity. This consists in replanning by “moving” the reservation if possible, or rejecting the smallest amount of previously accepted reservation (to ensure that other reservation will receive their full amount).
The Yarn Reservation system can be configured both with the CapacityScheduler and FairScheduler. You can mark any leaf queue in the capacity-scheduler.xml or fair-scheduler.xml as available for “reservations” (see CapacityScheduler and the FairScheduler for details). Then the capacity/fair share within that queue can be used for making reservations. Jobs can still be submitted to the reservable queue without a reservation, in which case they will be run in best-effort mode in whatever capacity is left over by the jobs running within active reservations.
Yarn allows us to create reservations and list them using an API. A brief set of examples of the APIs is listed here:
Use the New Reservation API, to obtain a reservation-id which can then be used as part of the Cluster Reservation API Submit to submit reservations. (This feature is currently in the alpha stage and may change in the future.)
URI http://rm-http-address:port/ws/v1/cluster/reservation/new-reservation
Response Body:
{
"reservation-id":"reservation_1404198295326_0003"
}
The Cluster Reservation API can be used to list reservations. When listing reservations the user must specify the constraints in terms of a queue, reservation-id, start time or end time. The user must also specify whether or not to include the full resource allocations of the reservations being listed. The resulting page returns a response containing information related to the reservation such as the acceptance time, the user, the resource allocations, the reservation-id, as well as the reservation definition.
URI http://rm-http-address:port/ws/v1/cluster/reservation/list
Request Parameters
queue - the queue name containing the reservations to be listed. if not set, this value will default to “default”.
reservation-id - the reservation-id of the reservation which will be listed. If this parameter is present, start-time and end-time will be ignored.
start-time - reservations that end after this start-time will be listed. If unspecified or invalid, this will default to 0.
end-time - reservations that start after this end-time will be listed. If unspecified or invalid, this will default to Long.MaxValue.
include-resource-allocations - true or false. If true, the resource allocations of the reservation will be included in the response. If false, no resource allocations will be included in the response. This will default to false.
Response
reservations array of ReservationInfo(JSON) / zero or more ReservationInfo objects(XML)
The reservations that are listed with the given query
The Cluster Reservation API can be used to submit reservations. When submitting a reservation the user specifies the constraints in terms of resources, and time that is required. The resulting response is successful if the reservation can be made. If a reservation-id is used to submit a reservation multiple times, the request will succeed if the reservation definition is the same, but only one reservation will be created. If the reservation definition is different, the server will respond with an error response. When the reservation is made, the user can use the reservation-id used to submit the reservation to get access to the resources by specifying it as part of Cluster Submit Applications API.
URI http://rm-http-address:port/ws/v1/cluster/reservation/submit
Request Parameters
queue string The (reservable) queue you are submitting to
reservation-definition object A set of constraints representing the need for resources over time of a user.
reservation-id string The reservation id to use to submit the reservation.
Response Body:
Content-Type: application/json
{
"queue" : "dedicated",
"reservation-id":"reservation_1404198295326_0003"
"reservation-definition" : {
"arrival" : 1765541532000,
"deadline" : 1765542252000,
"reservation-name" : "res_1",
"reservation-requests" : {
"reservation-request-interpreter" : 0,
"reservation-request" : [
{
"duration" : 60000,
"num-containers" : 220,
"min-concurrency" : 220,
"capability" : {
"memory" : 1024,
"vCores" : 1
}
},
{
"duration" : 120000,
"num-containers" : 110,
"min-concurrency" : 1,
"capability" : {
"memory" : 1024,
"vCores" : 1
}
}
]
}
}
}
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ReservationSystem.html
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html