Core Services Failover Campaign 2014
- Leader: Bouchra Rahim (CNRST, MAGRID)
- Start: 15/12/2014
- End: 23/12/2014
The Regional Operations Centre operates certain so-called "Core Services", which are described in the Resource Infrastructure Provider MoU signed between CSIR Meraka and EGI.eu. These need to be as close to 100% available as possible (the actual thresholds are defined in the MoU on a per-service basis), so a failover capability is needed.
Currently, we run a next version of the services to provide continuous integration and rolling updates. However, these instances are at the same site as the production services, so when that site suffers a network or power outage, we lose everything.
The main impact is that availability/reliability (A/R) for sites is degraded, which is not the sites' fault but that of the ROC.
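The benefit of a second instance at a different site can be estimated with a small calculation. The sketch below assumes that the instances fail independently of each other and that clients switch to whichever instance is up; instances at the same site share network and power, so this assumption does not hold for them, which is the problem described above. The availability figures in the comment are illustrative only and are not taken from the MoU.

```python
def combined_availability(*instances: float) -> float:
    """Availability of a service that is up when at least one instance is up.

    Each argument is the availability of one instance as a fraction (0.0-1.0).
    Assumes independent failures, which only holds for instances at
    different sites (separate network and power).
    """
    unavailable = 1.0
    for availability in instances:
        if not 0.0 <= availability <= 1.0:
            raise ValueError(f"availability must be in [0, 1], got {availability}")
        # The service is down only if every instance is down at the same time.
        unavailable *= 1.0 - availability
    return 1.0 - unavailable


# Illustrative call: two independent instances, each 99% available.
# combined_availability(0.99, 0.99)
```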
- ROC : Definition of services to be replicated and DevOps code for deployment
- MA-01-CNRST : Provision VMs with relevant IP and config on which to deploy the services.
We would like the machines to be available in both "regions" of the ROC, north and south. For this reason, we plan to put the failover services in Morocco. Further backup instances can be considered later.
Requirements (BDII):
- VM Resources :
- 2 cores, > 4 GB RAM
- 50 GB disk
- Network :
- public IP
- BDII ports open
- ssh port to Ansible control machine open
Procedure: see https://wiki.egi.eu/wiki/MAN05_top-BDII_and_site-BDII_High_Availability
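Once a VM has been provisioned, the network requirements above can be checked from the Ansible control machine with a plain TCP connection test. This is a minimal sketch, not part of the campaign's DevOps code: the port numbers are assumptions (2170 is the port conventionally used by the BDII LDAP service, 22 is the default ssh port) and should be confirmed against the actual configuration, and the host name in the usage comment is hypothetical.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host: str, ports: dict, timeout: float = 3.0) -> dict:
    """Map each named port to True/False depending on whether it is reachable."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}


# Assumed port numbers; confirm against the deployed configuration.
BDII_PORTS = {"bdii-ldap": 2170, "ssh": 22}

# Usage (hypothetical host name):
# check_ports("bdii-failover.example.org", BDII_PORTS)
```

A successful connection only shows that the port is reachable from the machine running the check; it does not verify that the service behind it is answering queries correctly.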
Requirements (SAM-Nagios):
- VM Resources :
- 4 cores, > 6 GB RAM
- 50 GB disk
- Network :
- public IP
- SAM-NAGIOS ports open
- ssh port to Ansible control machine open
Procedure ... todo ...
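The two sets of VM requirements listed above can be written down as data and checked against an offered VM before deployment. The sketch below is illustrative only: the service keys `bdii` and `sam-nagios` and the `VmSpec` type are names chosen here, and the first requirements block is taken to be the BDII one because it lists the BDII ports. RAM is treated as a strict lower bound, following the "> 4 GB" and "> 6 GB" wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSpec:
    cores: int
    ram_gb: float
    disk_gb: float


# Minimums copied from the requirements lists; keys are illustrative names.
REQUIREMENTS = {
    "bdii": VmSpec(cores=2, ram_gb=4, disk_gb=50),
    "sam-nagios": VmSpec(cores=4, ram_gb=6, disk_gb=50),
}


def unmet_requirements(service: str, vm: VmSpec) -> list:
    """Return a list of human-readable problems; empty if the VM is adequate."""
    required = REQUIREMENTS[service]
    problems = []
    if vm.cores < required.cores:
        problems.append(f"cores: {vm.cores} < {required.cores}")
    if vm.ram_gb <= required.ram_gb:  # requirement is strictly greater than
        problems.append(f"RAM: {vm.ram_gb} GB is not > {required.ram_gb} GB")
    if vm.disk_gb < required.disk_gb:
        problems.append(f"disk: {vm.disk_gb} GB < {required.disk_gb} GB")
    return problems
```

For example, `unmet_requirements("sam-nagios", VmSpec(cores=2, ram_gb=6, disk_gb=50))` reports both the core count and the RAM, since 6 GB does not exceed the 6 GB threshold.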
For more information on what's going on, see the ROC webpage.