From f74ed5a38576c63df7b62e7ce92c16f6278c42c5 Mon Sep 17 00:00:00 2001
From: <>
Date: Fri, 23 Feb 2024 16:46:07 +0000
Subject: [PATCH] Deployed 89026870 with MkDocs version: 1.1.2
---
index.html | 1 +
search/search_index.json | 2 +-
sitemap.xml | 58 +++++++++++++++++++--------------------
sitemap.xml.gz | Bin 498 -> 497 bytes
4 files changed, 31 insertions(+), 30 deletions(-)
diff --git a/index.html b/index.html
index 2e8bb9ed..53191c84 100644
--- a/index.html
+++ b/index.html
@@ -832,6 +832,7 @@
Weekly Operations Meetings
Meeting ID: 183 382 852 (password required; available on request)
Meeting Minutes
+- February 23, 2024
- February 16, 2024
- February 9, 2024
- February 2, 2024
diff --git a/search/search_index.json b/search/search_index.json
index 6fa1c435..07ba384e 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"OSG Operations Welcome to the home page of the OSG Operations Team documentation area! Mission The mission of OSG Operations is to maintain and improve distributed high throughput computing services to support research communities. This is accomplished by: Operating and maintaining our services in a user-oriented, robust, and reliable manner. Developing a professional and skilled staff dedicated to a service philosophy. Managing resources responsibly, efficiently, and with accountability. Evaluating and continually improving the actions, methods and processes that allow the OSG to operate. Contact Us Open a Ticket Slack channel - if you can't create an account, send an e-mail to help@opensciencegrid.org Email: help@opensciencegrid.org Registration (Contact, Resource, VO, or Project) Register with OSG Weekly Operations Meetings When: Fridays 12:30 pm Central URL: https://unl.zoom.us/j/183382852 Phone: +1 669 900 6833 or +1 408 638 0968 or +1 646 876 9923 Meeting ID: 183 382 852 (password required; available on request) Meeting Minutes February 16, 2024 February 9, 2024 February 2, 2024 January 26, 2024 January 19, 2024 January 12, 2024 January 5, 2024 December 29, 2023 (canceled) December 22, 2023 (canceled) December 15, 2023 December 8, 2023 December 1, 2023 November 24, 2023 (canceled) November 17, 2023 November 10, 2023 November 3, 2023 October 27, 2023 October 20, 2023 October 13, 2023 October 6, 2023 September 29, 2023 September 22, 2023 September 15, 2023 September 8, 2023 September 1, 2023 August 25, 2023 August 18, 2023 August 11, 2023 August 4, 2023 July 28, 2023 July 21, 2023 January 14, 2023 (canceled due to Throughput Computing 23) July 7, 2023 June 30, 2023 June 23, 2023 June 16, 2023 June 9, 2023 June 2, 2023 May 26, 2023 May 19, 2023 May 12, 2023 May 5, 2023 April 28, 2023 April 21, 2023 April 14, 2023 April 7, 2023 March 31, 2023 March 24, 2023 March 17, 2023 March 10, 2023 March 3, 2023 February 24, 2023 February 17, 2023 February 10, 2023 February 3, 2023 January 27, 2023 January 20, 2023 January 13, 2023 January 6, 2023 (canceled) December 30, 2022 (canceled) December 23, 2022 (canceled) December 16, 2022 December 9, 2022 December 2, 2022 November 25, 2022 (canceled) November 18, 2022 November 11, 2022 November 4, 2022 October 28, 2022 October 21, 2022 October 14, 2022 October 7, 2022 September 30, 2022 September 23, 2022 (canceled) September 16, 2022 (canceled) September 9, 2022 September 2, 2022 August 26, 2022 August 19, 2022 August 12, 2022 August 5, 2022 (canceled) July 29, 2022 July 22, 2022 (canceled) July 15, 2022 July 8, 2022 July 1, 2022 June 24, 2022 June 17, 2022 June 10, 2022 June 3, 2022 May 27, 2022 May 20, 2022 May 13, 2022 May 6, 2022 (canceled) April 29, 2022 April 22, 2022 April 15, 2022 April 8, 2022 April 1, 2022 March 25, 2022 March 18, 2022 (canceled) March 11, 2022 March 4, 2022 February 25, 2022 February 18, 2022 February 11, 2022 February 4, 2022 January 28, 2022 January 21, 2022 January 14, 2022 January 7, 2022 December 31, 2021 (canceled) December 24, 2021 (canceled) December 17, 2021 December 10, 2021 December 3, 2021 November 26, 2021 (canceled) November 19, 2021 November 12, 2021 November 5, 2021 October 29, 2021 October 22, 2021 October 15, 2021 (canceled) October 8, 2021 October 1, 2021 September 24, 2021 September 17, 2021 September 10, 2021 September 3, 2021 August 27, 2021 August 20, 2021 August 13, 2021 August 
6, 2021 July 30, 2021 July 23, 2021 July 16, 2021 July 9, 2021 July 2, 2021 June 25, 2021 June 18, 2021 June 11, 2021 June 4, 2021 May 28, 2021 May 21, 2021 May 14, 2021 May 7, 2021 April 30, 2021 April 23, 2021 April 16, 2021 April 9, 2021 April 2, 2021 March 26, 2021 March 19, 2021 March 12, 2021 March 5, 2021 (canceled) February 26, 2021 February 19, 2021 February 12, 2021 February 5, 2021 January 29, 2021 January 22, 2021 January 15, 2021 January 8, 2021 January 1, 2021 (canceled) December 25, 2020 (canceled) December 18, 2020 December 11, 2020 December 4, 2020 November 20, 2020 November 13, 2020 November 6, 2020 October 30, 2020 October 23, 2020 October 16, 2020 October 9, 2020 October 2, 2020 September 25, 2020 September 18, 2020 September 11, 2020 September 4, 2020 (canceled) August 28, 2020 August 21, 2020 August 14, 2020 August 7, 2020 July 31, 2020 July 24, 2020 July 17, 2020 July 10, 2020 July 3, 2020 (canceled) June 26, 2020 June 19, 2020 June 12, 2020 June 5, 2020 May 29, 2020 (canceled) May 22, 2020 May 15, 2020 May 8, 2020 May 1, 2020 April 24, 2020 April 17, 2020 April 10, 2020 April 3, 2020 March 27, 2020 March 20, 2020 March 13, 2020 March 6, 2020 February 28, 2020 February 21, 2020 February 14, 2020 February 7, 2020 January 31, 2020 January 24, 2020 January 17, 2020 January 10, 2020 January 3, 2020 December 27, 2019 December 20, 2019 December 13, 2019 December 6, 2019 November 29, 2019 (canceled) November 22, 2019 November 15, 2019 November 8, 2019 November 1, 2019 October 25, 2019 October 18, 2019 October 11, 2019 October 4, 2019 September 27, 2019 September 20, 2019 September 13, 2019 September 6, 2019 August 30, 2019 August 23, 2019 August 16, 2019 August 9, 2019 August 2, 2019 July 26, 2019 July 19, 2019 July 12, 2019 July 8, 2019 July 1, 2019 June 24, 2019 June 17, 2019 June 10, 2019 June 3, 2019 May 28, 2019 May 20, 2019 May 13, 2019 May 6, 2019 April 29, 2019 April 22, 2019 April 15, 2019 April 8, 2019 April 1, 2019 March 25, 2019 March 18, 2019 (canceled due to HOW 2019) March 11, 2019 March 4, 2019 February 25, 2019 February 19, 2019 February 11, 2019 February 4, 2019 January 28, 2019 (canceled due to F2F meeting) January 22, 2019 January 14, 2019 January 7, 2019 December 31, 2018 (canceled) December 24, 2018 (canceled) December 17, 2018 December 10, 2018 December 3, 2018 November 26, 2018 November 19, 2018 November 13, 2018 November 5, 2018 (canceled) October 29, 2018 (canceled) October 22, 2018 (canceled) October 15, 2018 October 8, 2018 October 1, 2018 September 24, 2018 September 17, 2018 September 10, 2018 September 4, 2018 August 27, 2018 August 20, 2018 August 13, 2018 August 6, 2018 Archived Meeting Minutes For archived meeting minutes, see the GitHub repository","title":"Home"},{"location":"#osg-operations","text":"Welcome to the home page of the OSG Operations Team documentation area!","title":"OSG Operations"},{"location":"#mission","text":"The mission of OSG Operations is to maintain and improve distributed high throughput computing services to support research communities. This is accomplished by: Operating and maintaining our services in a user-oriented, robust, and reliable manner. Developing a professional and skilled staff dedicated to a service philosophy. Managing resources responsibly, efficiently, and with accountability. 
Evaluating and continually improving the actions, methods and processes that allow the OSG to operate.","title":"Mission"},{"location":"#contact-us","text":"Open a Ticket Slack channel - if you can't create an account, send an e-mail to help@opensciencegrid.org Email: help@opensciencegrid.org","title":"Contact Us"},{"location":"#registration-contact-resource-vo-or-project","text":"Register with OSG","title":"Registration (Contact, Resource, VO, or Project)"},{"location":"#weekly-operations-meetings","text":"When: Fridays 12:30 pm Central URL: https://unl.zoom.us/j/183382852 Phone: +1 669 900 6833 or +1 408 638 0968 or +1 646 876 9923 Meeting ID: 183 382 852 (password required; available on request)","title":"Weekly Operations Meetings"},{"location":"#meeting-minutes","text":"February 16, 2024 February 9, 2024 February 2, 2024 January 26, 2024 January 19, 2024 January 12, 2024 January 5, 2024 December 29, 2023 (canceled) December 22, 2023 (canceled) December 15, 2023 December 8, 2023 December 1, 2023 November 24, 2023 (canceled) November 17, 2023 November 10, 2023 November 3, 2023 October 27, 2023 October 20, 2023 October 13, 2023 October 6, 2023 September 29, 2023 September 22, 2023 September 15, 2023 September 8, 2023 September 1, 2023 August 25, 2023 August 18, 2023 August 11, 2023 August 4, 2023 July 28, 2023 July 21, 2023 January 14, 2023 (canceled due to Throughput Computing 23) July 7, 2023 June 30, 2023 June 23, 2023 June 16, 2023 June 9, 2023 June 2, 2023 May 26, 2023 May 19, 2023 May 12, 2023 May 5, 2023 April 28, 2023 April 21, 2023 April 14, 2023 April 7, 2023 March 31, 2023 March 24, 2023 March 17, 2023 March 10, 2023 March 3, 2023 February 24, 2023 February 17, 2023 February 10, 2023 February 3, 2023 January 27, 2023 January 20, 2023 January 13, 2023 January 6, 2023 (canceled) December 30, 2022 (canceled) December 23, 2022 (canceled) December 16, 2022 December 9, 2022 December 2, 2022 November 25, 2022 (canceled) November 18, 2022 November 11, 2022 November 4, 2022 October 28, 2022 October 21, 2022 October 14, 2022 October 7, 2022 September 30, 2022 September 23, 2022 (canceled) September 16, 2022 (canceled) September 9, 2022 September 2, 2022 August 26, 2022 August 19, 2022 August 12, 2022 August 5, 2022 (canceled) July 29, 2022 July 22, 2022 (canceled) July 15, 2022 July 8, 2022 July 1, 2022 June 24, 2022 June 17, 2022 June 10, 2022 June 3, 2022 May 27, 2022 May 20, 2022 May 13, 2022 May 6, 2022 (canceled) April 29, 2022 April 22, 2022 April 15, 2022 April 8, 2022 April 1, 2022 March 25, 2022 March 18, 2022 (canceled) March 11, 2022 March 4, 2022 February 25, 2022 February 18, 2022 February 11, 2022 February 4, 2022 January 28, 2022 January 21, 2022 January 14, 2022 January 7, 2022 December 31, 2021 (canceled) December 24, 2021 (canceled) December 17, 2021 December 10, 2021 December 3, 2021 November 26, 2021 (canceled) November 19, 2021 November 12, 2021 November 5, 2021 October 29, 2021 October 22, 2021 October 15, 2021 (canceled) October 8, 2021 October 1, 2021 September 24, 2021 September 17, 2021 September 10, 2021 September 3, 2021 August 27, 2021 August 20, 2021 August 13, 2021 August 6, 2021 July 30, 2021 July 23, 2021 July 16, 2021 July 9, 2021 July 2, 2021 June 25, 2021 June 18, 2021 June 11, 2021 June 4, 2021 May 28, 2021 May 21, 2021 May 14, 2021 May 7, 2021 April 30, 2021 April 23, 2021 April 16, 2021 April 9, 2021 April 2, 2021 March 26, 2021 March 19, 2021 March 12, 2021 March 5, 2021 (canceled) February 26, 2021 February 19, 2021 February 12, 2021 February 5, 
2021 January 29, 2021 January 22, 2021 January 15, 2021 January 8, 2021 January 1, 2021 (canceled) December 25, 2020 (canceled) December 18, 2020 December 11, 2020 December 4, 2020 November 20, 2020 November 13, 2020 November 6, 2020 October 30, 2020 October 23, 2020 October 16, 2020 October 9, 2020 October 2, 2020 September 25, 2020 September 18, 2020 September 11, 2020 September 4, 2020 (canceled) August 28, 2020 August 21, 2020 August 14, 2020 August 7, 2020 July 31, 2020 July 24, 2020 July 17, 2020 July 10, 2020 July 3, 2020 (canceled) June 26, 2020 June 19, 2020 June 12, 2020 June 5, 2020 May 29, 2020 (canceled) May 22, 2020 May 15, 2020 May 8, 2020 May 1, 2020 April 24, 2020 April 17, 2020 April 10, 2020 April 3, 2020 March 27, 2020 March 20, 2020 March 13, 2020 March 6, 2020 February 28, 2020 February 21, 2020 February 14, 2020 February 7, 2020 January 31, 2020 January 24, 2020 January 17, 2020 January 10, 2020 January 3, 2020 December 27, 2019 December 20, 2019 December 13, 2019 December 6, 2019 November 29, 2019 (canceled) November 22, 2019 November 15, 2019 November 8, 2019 November 1, 2019 October 25, 2019 October 18, 2019 October 11, 2019 October 4, 2019 September 27, 2019 September 20, 2019 September 13, 2019 September 6, 2019 August 30, 2019 August 23, 2019 August 16, 2019 August 9, 2019 August 2, 2019 July 26, 2019 July 19, 2019 July 12, 2019 July 8, 2019 July 1, 2019 June 24, 2019 June 17, 2019 June 10, 2019 June 3, 2019 May 28, 2019 May 20, 2019 May 13, 2019 May 6, 2019 April 29, 2019 April 22, 2019 April 15, 2019 April 8, 2019 April 1, 2019 March 25, 2019 March 18, 2019 (canceled due to HOW 2019) March 11, 2019 March 4, 2019 February 25, 2019 February 19, 2019 February 11, 2019 February 4, 2019 January 28, 2019 (canceled due to F2F meeting) January 22, 2019 January 14, 2019 January 7, 2019 December 31, 2018 (canceled) December 24, 2018 (canceled) December 17, 2018 December 10, 2018 December 3, 2018 November 26, 2018 November 19, 2018 November 13, 2018 November 5, 2018 (canceled) October 29, 2018 (canceled) October 22, 2018 (canceled) October 15, 2018 October 8, 2018 October 1, 2018 September 24, 2018 September 17, 2018 September 10, 2018 September 4, 2018 August 27, 2018 August 20, 2018 August 13, 2018 August 6, 2018","title":"Meeting Minutes"},{"location":"#archived-meeting-minutes","text":"For archived meeting minutes, see the GitHub repository","title":"Archived Meeting Minutes"},{"location":"external-oasis-repos/","text":"External OASIS Repositories We offer hosting of non-OSG CVMFS repositories on OASIS. This means that requests to create, rename, remove, or blanking OASIS repositories will come in as GOC tickets. This document contains instructions for handling those tickets. Also see Policy for OSG Mirroring of External CVMFS repositories External OASIS repository Requests to Host a Repository on OASIS Ensure that the repository administrator is valid for the VO. This can be done by (a) OSG already having a relationship with the person or (b) the contacting the VO manager to find out. Also, the person should be listed in the OSG topology contacts list . Review provided URL and verify that it is appropriate for the VO and no other project uses it already. In order to make sure the name in URL is appropriate, check that the name is derived from the VO name or one of its projects. Then, add the repository URL to the topology for given VO under the OASISRepoURLs . 
This should cause the repository's configuration to be added to the OSG Stratum-0 within 15 minutes after URL is added into the topology. For example, if new URL is for the VO DUNE http://hcc-cvmfs-repo.unl.edu:8000/cvmfs/dune.osgstorage.org edit the following under the OASIS section and create PR: git clone git://github.com/opensciencegrid/topology.git vim topology/virtual-organizations/DUNE.yaml ... OASIS: OASISRepoURLs: - http://hcc-cvmfs-repo.unl.edu:8000/cvmfs/dune.osgstorage.org/ ... When the PR is approved, check on the oasis.opensciencegrid.org host whether the new repository was successfuly signed. There should be message about it in the log file /var/log/oasis/generate_whitelists.log : Tue Sep 25 17:34:02 2018 Running add_osg_repository http://hcc-cvmfs-repo.unl.edu:8000/cvmfs/dune.osgstorage.org dune.osgstorage.org: Signing 7 day whitelist with masterkeycard... done If the respository ends in a new domain name that has not been distributed before, a new domain key will be needed on oasis-replica which should get automatically downloaded from the etc/cvmfs/keys directory in the master branch of the config-repo github repository . There should be a message about downloading it in the log file /var/log/cvmfs/generate_replicas.log . After the key is downloaded the repository should also be automatically added, with messages in the same log file. After the repository is successfully on oasis-replica, in addition you need to update the OSG configuration repository. Make changes in a workspace cloned from the config-repo github repository and use the osg branch (or a branch made from it) in a personal account on oasis-itb . Add a domain configuration in etc/cvmfs/domain.d that's a lot like one of the other imported domains, for example egi.eu.conf . The server urls might be slightly different; use the URLs of the stratum 1s where it is already hosted if there are any, and you can add at least the FNAL and BNL stratum 1s. Copy key(s) for the domain into etc/cvmfs/keys from the master branch, either a single .pub file or a directory, whichever the master branch has. Test all these changes out on the config-osg.opensciencegrid.org repository on oasis-itb using the copy_config_osg command, and configure a test client to read from oasis-itb.opensciencegrid.org instead of oasis.opensciencegrid.org . Then commit those changes into a new branch you made from the osg branch, and make a pull request. Once that PR is approved and merged, log in to the oasis machine and run copy_config_osg as root there to copy from github to the production configuration repository on the oasis machine. If the repository name does not match *.opensciencegrid.org or *.osgstorage.org , skip this step and go on to your next step. If it does match one of those two patterns, then respond to the ticket to tell the administrator to continue with their next step (their step 4). We don't want them to continue before 15 minutes has elapsed after step 2 above, so either wait that much time or tell them the time they may proceed (15 minutes after you updated topology). Then wait until the admin has updated the ticket to indicate that they have completed their step before moving on. Ask the administrator of the BNL stratum 1 (John De Stefano) to also add the new repository. The BNL Stratum-1 administrator should set the service to read from http://oasis-replica.opensciencegrid.org:8002/cvmfs/ . 
When the BNL Stratum-1 administrator has reported back that the replication is ready, respond to the requester that the repository is fully replicated on the OSG and close the ticket. Requests to Change the URL of an External Repository If there is a request to change the URL of an external repository, update the registered value in OASISRepoURLs for the respective VO in the topology. Tell the requester that it is ready 15 minutes after topology is updated. Requests to Remove an External Repository After validating that the ticket submitter is authorized by the VO's OASIS manager, delete the registered value for in topology for the VO in OASIS Repo URLs. Verify that it is removed by running the following on any oasis machine to make sure it is missing from the list: print_osg_repos|grep Check if the repository has been replicated to RAL by looking in their repositories.json . The user documentation requests the user to make a GGUS ticket to do this, so either ask them to do it or do it yourself. Add the BNL Stratum-1 operator (John De Stefano) to the ticket and ask him to remove the repository. Wait for him to finish before proceeding. Add the FNAL Stratum-1 operators (Merina Albert, Hyun Woo Kim) to the ticket and ask them when they can be ready to delete the repository. They can't remove it before it is removed from oasis-replica because their Stratum-1 automatically adds all repositories oasis-replica has. However, it has to be done within 8 hours of removal on oasis-replica or an alarm will start going off. Run the following command on oasis , oasis-itb , oasis-replica and oasis-replica-itb : remove_osg_repository -f Tell the FNAL Stratum-1 operators to go ahead and remove the repository. Response to Security Incident on an External Repository If there is a security incident on the publishing machine of an external repository and a publishing key is compromised, the fingerprint of that key should be added to /cvmfs/config-osg.opensciencegrid.org/etc/cvmfs/blacklist . In addition, another line should be added in the form . When the BNL Stratum-1 administrator has reported back that the replication is ready, respond to the requester that the repository is fully replicated on the OSG and close the ticket.","title":"Requests to Host a Repository on OASIS"},{"location":"external-oasis-repos/#requests-to-change-the-url-of-an-external-repository","text":"If there is a request to change the URL of an external repository, update the registered value in OASISRepoURLs for the respective VO in the topology. Tell the requester that it is ready 15 minutes after topology is updated.","title":"Requests to Change the URL of an External Repository"},{"location":"external-oasis-repos/#requests-to-remove-an-external-repository","text":"After validating that the ticket submitter is authorized by the VO's OASIS manager, delete the registered value for in topology for the VO in OASIS Repo URLs. Verify that it is removed by running the following on any oasis machine to make sure it is missing from the list: print_osg_repos|grep Check if the repository has been replicated to RAL by looking in their repositories.json . The user documentation requests the user to make a GGUS ticket to do this, so either ask them to do it or do it yourself. Add the BNL Stratum-1 operator (John De Stefano) to the ticket and ask him to remove the repository. Wait for him to finish before proceeding. 
Add the FNAL Stratum-1 operators (Merina Albert, Hyun Woo Kim) to the ticket and ask them when they can be ready to delete the repository. They can't remove it before it is removed from oasis-replica because their Stratum-1 automatically adds all repositories oasis-replica has. However, it has to be done within 8 hours of removal on oasis-replica or an alarm will start going off. Run the following command on oasis , oasis-itb , oasis-replica and oasis-replica-itb : remove_osg_repository -f Tell the FNAL Stratum-1 operators to go ahead and remove the repository.","title":"Requests to Remove an External Repository"},{"location":"external-oasis-repos/#response-to-security-incident-on-an-external-repository","text":"If there is a security incident on the publishing machine of an external repository and a publishing key is compromised, the fingerprint of that key should be added to /cvmfs/config-osg.opensciencegrid.org/etc/cvmfs/blacklist . In addition, another line should be added in the form ReportableVOName: Corrected VOName: CSV File A CSV file can be specified in order to specify multiple corrections in a single batch update. The CSV file must be of a certain format. No Header Row The number of columns must be at least the number of matching attributes and the corrected attribute. For example, a CSV file for VO corrections would be of format: ,,,.... The CSV file can be specified on the command line with the option --csv , for example: ./gracc-correct vo add --csv ","title":"GRACC Corrections"},{"location":"services/gracc-corrections/#installing-gracc-corrections","text":"GRACC Corrections are used to modify records during the summarization process. RAW records are not modified in the correction process. The correction is applied after summarization and aggregation, but before the record is enriched with data from Topology . The correction is step 3 in the GRACC summary record workflow: Raw record is received. The raw record is never modified Summarizer aggregates the raw records Corrections are applied Summarized records are enriched by Topology Summarized and enriched records are uploaded to GRACC We can currently correct: VO Names Project Names OIM_Site (using the Host_description field)","title":"Installing GRACC Corrections"},{"location":"services/gracc-corrections/#limitations","text":"Additional corrections can be written, but some attributes are used to detect duplicate records, and are therefore protected from corrections. Protected records for summarization are: EndTime, RawVOName, RawProjectName, DN, Processors, ResourceType, CommonName, Host_description, Resource_ExitCode, Grid, ReportableVOName, ProbeName For example, we could not write a correction for the Host_description . If we had a correction that changed Host_description , then the duplicate detection would not detect the same record during resummarization and it would have duplicate summarized records.","title":"Limitations"},{"location":"services/gracc-corrections/#command-line","text":"The gracc-correct tool is used to create, update, and delete corrections. The tool must be run from a host that can write to GRACC, which is very restricted. It is recommended to run the gracc-correct tool directly from the gracc.opensciencegrid.org host. 
The gracc-correct tool is able to parse new corrections either individually from user input or many at once from a CSV file.","title":"Command Line"},{"location":"services/gracc-corrections/#user-input","text":"Each correction attempts to match one or more attributes of the summarized record in order to set another attribute. For example, for the VO correction: $ gracc-correct vo add Field ( s ) to correct: VOName: ReportableVOName: Corrected VOName: ","title":"User Input"},{"location":"services/gracc-corrections/#csv-file","text":"A CSV file can be specified in order to specify multiple corrections in a single batch update. The CSV file must be of a certain format. No Header Row The number of columns must be at least the number of matching attributes and the corrected attribute. For example, a CSV file for VO corrections would be of format: ,,,.... The CSV file can be specified on the command line with the option --csv , for example: ./gracc-correct vo add --csv ","title":"CSV File"},{"location":"services/hosted-ce-definitions/","text":"OSG Hosted CE Definitions The OSG provides a Hosted CE service. In general, this document lists what an instance of that service can and cannot do. Hosted CEs in General Benefits The site continues to operate its own batch system according to local considerations; OSG operates the interface between OSG and the site, aka the Hosted CE; To the site, OSG simply looks like a set of user accounts; and OSG uses the accounts to provision site resources for various science user communities, and hence the site has complete control over resource allocation via local policies on the accounts. Prerequisites In general, the site must operate a working batch system that is accessible via at least one head node; OSG works with HTCondor, Slurm, PBS Pro/Torque, LSF, and Grid Engine. Site operations include hardware and software maintenance, defining and implementing usage policies, monitoring, troubleshooting, etc. These are the same activities to support local users. In addition, the site: Must communicate with OSG their intent to share resources \u2014 in most cases, a meeting between site and OSG staff should be sufficient to discuss goals, plans, etc.; Must meet the technical requirements on the OSG website , summarized below: The site is willing to add OSG user accounts with inbound SSH access and submit privileges, A mechanism exists for transferring files between the head nodes and worker nodes, and Worker nodes must have outbound Internet access and temporary storage space for jobs. Is strongly encouraged to tell OSG about preferred constraints on resource requests (e.g., per-job limits on CPUs, memory, and storage; overall limits on number of running and idle jobs; submission rates), so that OSG can tailor such requests to better fit the site. Standard Hosted CE A Standard Hosted CE is the default case in which the interaction between OSG and the site is relatively simple and easy to maintain. Most sites fall into this category. Benefits Configuration is limited to basics, so there is less upfront and ongoing work for OSG and the site; OSG maintains and shares mappings from user groups to OSG user accounts on the site, so that the site can \u2014 if desired \u2014 limit resource allocations to certain groups; and OSG maintains the required OSG configuration on the site\u2019s head node and worker nodes (if the site provides a distribution mechanism to worker nodes, such as a shared file system). 
Site Responsibilities In addition to the general prerequisites above, the following apply to a Standard Hosted CE: The site must create and maintain 20 OSG user accounts on a single head node; note that: OSG will access their accounts via SSH using one RSA key for all 20 accounts; and All 20 OSG accounts must be able to submit to the local batch system. The site may control the resources allocated to different OSG user groups by writing and maintaining policies on the OSG user accounts within the batch system. The site provides privilege separation among the OSG user groups via the OSG user accounts and standard Unix privilege separation.","title":"Hosted CE Definitions"},{"location":"services/hosted-ce-definitions/#osg-hosted-ce-definitions","text":"The OSG provides a Hosted CE service. In general, this document lists what an instance of that service can and cannot do.","title":"OSG Hosted CE Definitions"},{"location":"services/hosted-ce-definitions/#hosted-ces-in-general","text":"","title":"Hosted CEs in General"},{"location":"services/hosted-ce-definitions/#benefits","text":"The site continues to operate its own batch system according to local considerations; OSG operates the interface between OSG and the site, aka the Hosted CE; To the site, OSG simply looks like a set of user accounts; and OSG uses the accounts to provision site resources for various science user communities, and hence the site has complete control over resource allocation via local policies on the accounts.","title":"Benefits"},{"location":"services/hosted-ce-definitions/#prerequisites","text":"In general, the site must operate a working batch system that is accessible via at least one head node; OSG works with HTCondor, Slurm, PBS Pro/Torque, LSF, and Grid Engine. Site operations include hardware and software maintenance, defining and implementing usage policies, monitoring, troubleshooting, etc. These are the same activities to support local users. In addition, the site: Must communicate with OSG their intent to share resources \u2014 in most cases, a meeting between site and OSG staff should be sufficient to discuss goals, plans, etc.; Must meet the technical requirements on the OSG website , summarized below: The site is willing to add OSG user accounts with inbound SSH access and submit privileges, A mechanism exists for transferring files between the head nodes and worker nodes, and Worker nodes must have outbound Internet access and temporary storage space for jobs. Is strongly encouraged to tell OSG about preferred constraints on resource requests (e.g., per-job limits on CPUs, memory, and storage; overall limits on number of running and idle jobs; submission rates), so that OSG can tailor such requests to better fit the site.","title":"Prerequisites"},{"location":"services/hosted-ce-definitions/#standard-hosted-ce","text":"A Standard Hosted CE is the default case in which the interaction between OSG and the site is relatively simple and easy to maintain. 
Most sites fall into this category.","title":"Standard Hosted CE"},{"location":"services/hosted-ce-definitions/#benefits_1","text":"Configuration is limited to basics, so there is less upfront and ongoing work for OSG and the site; OSG maintains and shares mappings from user groups to OSG user accounts on the site, so that the site can \u2014 if desired \u2014 limit resource allocations to certain groups; and OSG maintains the required OSG configuration on the site\u2019s head node and worker nodes (if the site provides a distribution mechanism to worker nodes, such as a shared file system).","title":"Benefits"},{"location":"services/hosted-ce-definitions/#site-responsibilities","text":"In addition to the general prerequisites above, the following apply to a Standard Hosted CE: The site must create and maintain 20 OSG user accounts on a single head node; note that: OSG will access their accounts via SSH using one RSA key for all 20 accounts; and All 20 OSG accounts must be able to submit to the local batch system. The site may control the resources allocated to different OSG user groups by writing and maintaining policies on the OSG user accounts within the batch system. The site provides privilege separation among the OSG user groups via the OSG user accounts and standard Unix privilege separation.","title":"Site Responsibilities"},{"location":"services/install-gwms-factory/","text":"GlideinWMS Factory Installation This document describes how to install a Glidein Workflow Managment System (GlideinWMS) Factory instance. This document assumes expertise with HTCondor and familiarity with the GlideinWMS software. It does not cover anything but the simplest possible install. Please consult the GlideinWMS reference documentation for advanced topics, including non-root, non-RPM-based installation. In this document the terms glidein and pilot (job) will be used interchangeably. This parts covers these primary components of the GlideinWMS system: WMS Collector / Schedd : A set of condor_collector and condor_schedd processes that allow the submission of pilots to Grid entries. GlideinWMS Factory : The process submitting the pilots when needed Warning We really recommend you to use the OSG provided Factory and not to install your own . A VO Frontend is sufficient to submit your jobs and to decide scheduling policies. And this will avoid for you the complexity to deal directly with grid/cloud sites. If you really need you own Factory be aware that it is a complex component and may require a non trivial maintenance effort. Before Starting Before starting the installation process, consider the following points (consulting the Reference section below as needed): Requirements Host and OS A host to install the GlideinWMS Factory (pristine node). Currently most of our testing has been done on Scientific Linux 6 and 7. Root access The GlideinWMS Factory has the following requirements: CPU : 4-8 cores for a large installation (1 should suffice on a small install) RAM : 4-8GB on a large installation (1GB should suffice for small installs) Disk : 10GB will be plenty sufficient for all the binaries, config and log files related to GlideinWMS. If you are a large site with need to keep significant history and logs, you may want to allocate 100GB+ to store long histories. Users The GlideinWMS Factory installation will create the following users unless they are already created . User Default uid Comment condor none HTCondor user (installed via dependencies). gfactory none This user runs the GlideinWMS VO factory. 
To verify that the user gfactory has gfactory as primary group check the output of root@host # getent passwd gfactory | cut -d: -f4 | xargs getent group It should be the gfactory group. Certificates Certificate User that owns certificate Path to certificate Host certificate root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem Here are instructions to request a host certificate. The host certificate/key is used for authorization, however, authorization between the Factory and the GlideinWMS collector is done by file system authentication. Networking Firewalls It must be on the public internet, with at least one port open to the world; all worker nodes will load data from this node trough HTTP. Note that worker nodes will also need outbound access in order to access this HTTP port. Installation Procedure As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories Install CA certificates Installing HTCondor Most required software is installed from the Factory RPM installation. HTCondor is the only exception since there are many different ways to install it , using the RPM system or not. You need to have HTCondor installed before installing the GlideinWMS Factory. If yum cannot find a HTCondor RPM, it will install the dummy empty-condor RPM, assuming that you installed HTCondor using a tarball distribution. If you don't have HTCondor already installed, you can install the HTCondor RPM from the OSG repository: root@host # yum install condor.x86_64 Installing HTCondor-BOSCO If you plan to send jobs using direct batch submission (aka BOSCO), then you need also the condor-bosco package. You'll have to install the package and remove one of its files /etc/condor/config.d/60-campus_factory.config because it interferes with the Factory configuration. root@host # yum install condor-bosco root@host # rm /etc/condor/config.d/60-campus_factory.config root@host # touch /etc/condor/config.d/60-campus_factory.config Install GWMS Factory Download and install the Factory RPM Install the RPM and dependencies (be prepared for a lot of dependencies). root@host # yum install glideinwms-factory This will install the current production release verified and tested by OSG with default HTCondor configuration. This command will install the GlideinWMS Factory, HTCondor, the OSG client, and all the required dependencies. If you wish to install a different version of GlideinWMS, add the \"--enablerepo\" argument to the command as follows: yum install --enablerepo=osg-testing glideinwms-factory : The most recent production release, still in testing phase. This will usually match the current tarball version on the GlideinWMS home page . (The osg-release production version may lag behind the tarball release by a few weeks as it is verified and packaged by OSG). Note that this will also take the osg-testing versions of all dependencies as well. yum install --enablerepo=osg-upcoming glideinwms-factory : The most recent development series release, ie version 3.3.x release. This has newer features such as cloud submission support, but is less tested. Download HTCondor tarballs You will need to download HTCondor tarballs for each architecture that you want to deploy pilots on . At this point, GlideinWMS factory does not support pulling HTCondor binaries from your system area. 
Suggested is that you put these binaries in /var/lib/gwms-factory/condor but any gfactory accessible location should suffice. Configuration Procedure After installing the RPM you need to configure the components of the GlideinWMS Factory: Edit Factory configuration options Edit HTCondor configuration options Create a HTCondor grid map file Reconfigure and Start Factory Configuring the Factory The configuration file is /etc/gwms-factory/glideinWMS.xml . The next steps will describe each line that you will need to edit for most cases, but you may want to review the whole file to be sure that it is configured correctly. Security configuration In the security section, you will need to provide each Frontend that is allowed to communicate with the Factory: security key_length=\"2048\" pub_key=\"RSA\" remove_old_cred_age=\"30\" remove_old_cred_freq=\"24\" reuse_oldkey_onstartup_gracetime=\"900\"> These attributes are very important to get exactly right or the Frontend will not be trusted. This should match one of the factory and security sections of the Frontend configuration Configuring the GlideinWMS Frontend in the following way: Note This is a snippet from the Frontend configuration (for reference), not the Factory that you are configuring now! For the factory section: # from frontend.xml .... For the security: # from frontend.xml Note that the identity of the Frontend must match what HTCondor authenticates the DN of the frontend to. In /etc/condor/certs/condor_mapfile , there must be an entry with vofrontend_service definition (in this case): GSI \"^\\/DC\\=org\\/DC\\=doegrids\\/OU\\=Services\\/CN\\=Some\\ Name\\ 834323%ENDCOLOR%$\" % GREEN % vofrontend_service % ENDCOLOR % Entry configuration Entries are grid/cloud endpoints (aka Compute Elements, or gatekeepers) that can accept job requests and run pilots (which will run user jobs). Each entry needs to be configured to communicate to a specific gatekeeper. An example test entry is provided in the default GlideinWMS configuration file. At the very least, you will need to modify the entry line: You will need to modify the entry name and gatekeeper . This will determine the gatekeeper that you access. Specific gatekeepers often require specific \"rsl\" attributes that determine the job queue that you are submitting to, or other attributes. Add them in the rsl attribute. Also, be sure to distribute your entries across the various HTCondor schedd work managers to balance load. To see the available schedd use condor_status -schedd -l | grep Name . Several schedd options are configured by default for you: schedd_glideins2, schedd_glideins3, schedd_glideins4, schedd_glideins5 , as well as the default schedd . This can be modified in the HTCondor configuration. Add any specific options, such as limitations on jobs/pilots or glexec/voms requirements in the entry section below the above line. More details are in the GlideinWMS Factory configuration guide . !!! warning If there is no match between auth_metod and trust_domain of the entry and the type and trust_domain listed in one of the credentials of one of the Frontends using this Factory, then no job can run on that entry. The Factory must advertise the correct Resource Name of each entry for accounting purposes. Then the Factory must also advertise in the entry all the attributes that will allow to match the query expression used in the Frontends connecting to this Factory (e.g. as explained in the VO frontend configuration document ). 
Note Keep an eye on this part as we're dealing with singularity. Then you must advertise correctly if the site supports gLExec . If it does not set GLEXEC_BIN to NONE , if gLExec is installed via OSG set it to OSG , otherwise set it to the path of gLExec. For example this snippet advertises GLIDEIN_Supported_VOs attribute with the supported VO so that can be used with the query above in the VO frontend and says that the resource does not support gLExec: ... ... Note Specially if jobs are sent to OSG resources, it is very important to set the GLIDEIN_Resource_Name and to be consistent with the Resource Name reported in OIM because that name will be used for job accounting in Gratia. It should be the name of the Resource in OIM or the name of the Resource Group (specially if there are many gatekeepers submitting to the same cluster). More information on options can be found here Configuring Tarballs Each pilot will download HTCondor binaries from the staging area. Often, multiple binaries are needed to support various architectures and platforms. Currently, you will need to provide at least one tarball for GlideinWMS to use. (Using the system binaries is currently not supported). Download a HTCondor tarball from here . Suggested is to put the binaries in /var/lib/gwms-factory/condor , but any factory-accessible location will do just fine. Once you have downloaded the tarball, configure it in /etc/gwms-factory/glideinWMS.xml like in the following: Remember also to modify the condor_os and condor_arch attributes in the entries (the configured Compute Elements) to pick the correct HTCondor binary. Here are more details on using multiple HTCondor binaries. Note that is sufficient to set the base_dir ; the reconfigure command will prepare the tarball and add it to the XML config file. Configuring HTCondor The HTCondor configuration for the Factory is placed in /etc/condor/config.d . 00_gwms_factory_general.config 00-restart_peaceful.config 01_gwms_factory_collectors.config 02_gwms_factory_schedds.config 03_gwms_local.config 10-batch_gahp_blahp.config Get rid of the pre-loaded HTCondor default root@host # rm /etc/condor/config.d/00personal_condor.config root@host # touch /etc/condor/config.d/00personal_condor.config For most installations, the items you need to modify are in 03_gwms_factory_local.config . The lines you will have to edit are: Credentials of the machine. You can either run using a proxy, or a service certificate. It is recommended to use a host certificate and specify its location in the variables GSI_DAEMON_CERT and GSI_DAEMON_KEY . The host certificate should be owned by root and have the correct permissions, 600. HTCondor ids in the form UID.GID (both are integers) HTCondor admin email. Will receive messages when services fail. #-- HTCondor user: condor CONDOR_IDS = #-- Contact (via email) when problems occur CONDOR_ADMIN = ############################ # GSI Security config ############################ #-- Grid Certificate directory GSI_DAEMON_TRUSTED_CA_DIR= /etc/grid-security/certificates #-- Credentials GSI_DAEMON_CERT = /etc/grid-security/hostcert.pem GSI_DAEMON_KEY = /etc/grid-security/hostkey.pem #-- HTCondor mapfile CERTIFICATE_MAPFILE= /etc/condor/certs/condor_mapfile ################################### # Whitelist of HTCondor daemon DNs ################################### #DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD Using other HTCondor RPMs, e.g. UW Madison HTCondor RPM The above procedure will work if you are using the OSG HTCondor RPMS. 
You can verify that you used the OSG HTCondor RPM by using yum list condor . The version name should include \"osg\", e.g. 8.6.9-1.1.osg34.el7 . If you are using the UW Madison HTCondor RPMS, be aware of the following changes: This HTCondor RPM uses a file /etc/condor/condor_config.local to add your local machine slot to the user pool. If you want to disable this behavior (recommended), you should blank out that file or comment out the line in /etc/condor/condor_config for LOCAL_CONFIG_FILE. (Make sure that LOCAL_CONFIG_DIR is set to /etc/condor/config.d ) Note that the variable LOCAL_DIR is set differently in UW Madison and OSG RPMs. This should not cause any more problems in the Glideinwms RPMs, but please take note if you use this variable in your job submissions or other customizations. In general if you are using a non OSG RPM or if you added custom configuration files for HTCondor please check the order of the configuration files: root@host # condor_config_val -config Configuration source: /etc/condor/condor_config Local configuration sources: /etc/condor/config.d/00-restart_peaceful.config /etc/condor/config.d/00_gwms_factory_general.config /etc/condor/config.d/01_gwms_factory_collectors.config /etc/condor/config.d/02_gwms_factory_schedds.config /etc/condor/config.d/03_gwms_local.config /etc/condor/config.d/10-batch_gahp_blahp.config /etc/condor/condor_config.local Restarting HTCondor After configuring HTCondor, be sure to restart HTCondor: root@host # service condor restart Create a HTCondor grid mapfile. The HTCondor grid mapfile /etc/condor/certs/condor_mapfile is used for authentication between the glidein running on a remote worker node, and the local collector. HTCondor uses the mapfile to map certificates to pseudo-users on the local machine. It is important that you map the DN's of each frontend you are talking to. Below is an example mapfile, by default found in /etc/condor/certs/condor_mapfile : GSI \"^\\/DC\\=org\\/DC\\=doegrids\\/OU\\=People\\/CN\\=Some\\ Name\\ 123456$\" frontend GSI (.*) anonymous FS (.*) \\1 Each frontend needs a line that maps to the user specified in the identity argument in the frontend security section of the Factory configuration. Reconfiguring GlideinWMS After changing the configuration of GlideinWMS and making sure that Factory is running, use the following table to find the appropriate command for your operating system (run as root ): If your operating system is... Run the following command... Enterprise Linux 7 systemctl reload gwms-factory Enterprise Linux 6 service gwms-factory reconfig Note Notice that, in the case of Enterprise Linux 7 systemctl reload gwms-factory will work only if: - gwms-factory service is running - gwms-factory service was started with systemctl Otherwise, you will get the following error in any of the cases: # systemctl reload gwms-factory Job for gwms-factory.service invalid. Upgrading GlideinWMS Before you start the Factory service for the first time or after an update of the RPM or after you change GlideinWMS scripts, you should always use the GlideinWMS \"upgrade\" command. To do so: Make sure the condor and gwms-factory services are stopped (in EL6 this will be done for you). Issue the upgrade command: If you are using Enterprise Linux 7: root@host # /usr/sbin/gwms-factory upgrade If you are using Enterprise Linux 6: root@host # service gwms-factory upgrade Start the condor and gwms-factory services (see next part). 
Service Activation and Deactivation To start the Factory you must start also HTCondor and the Web server beside the Factory itself: # %RED%For RHEL 6 , CentOS 6 , and SL6%ENDCOLOR% root@host # service condor start root@host # service httpd start root@host # service gwms-factory start # %RED% For RHEL 7 , CentOS 7 , and SL7%ENDCOLOR% root@host # systemctl start condor root@host # systemctl start httpd root@host # systemctl start gwms-factory Note Once you successfully start using the Factory service, anytime you change the /etc/gwms-factory/glideinWMS.xml file you will need to run a reconfig/reload command. If you change also some code you need the upgrade command mentioned above: # %RED% For RHEL 6 , CentOS 6 , and SL6%ENDCOLOR% root@host # service gwms-factory reconfig # %RED% But the situation is a bit more complicated in RHEL 7 , CentOS 7 , and SL7 due to systemd restrictions%ENDCOLOR% # %GREEN% For reconfig:%ENDCOLOR% A. %RED% when the Factory is running%ENDCOLOR% A.1 %RED% without any additional options%ENDCOLOR% root@host # /usr/sbin/gwms-factory reconfig%ENDCOLOR% or root@host # systemctl reload gwms-factory A.2 %RED% if you want to give additional options %ENDCOLOR% systemctl stop gwms-factory /usr/sbin/gwms-factory reconfig \"and your options\" systemctl start gwms-factory B. %RED% when the Factory is NOT running %ENDCOLOR% root@host # /usr/sbin/gwms-factory reconfig ( \"and your options\" ) To enable the services so that they restart after a reboot: # %RED%# For RHEL 6 , CentOS 6 , and SL6%ENDCOLOR% root@host # /sbin/chkconfig fetch-crl-cron on root@host # /sbin/chkconfig fetch-crl-boot on root@host # /sbin/chkconfig condor on root@host # /sbin/chkconfig httpd on root@host # /sbin/chkconfig gwms-factory on # %RED%# For RHEL 7 , CentOS 7 , and SL7%ENDCOLOR% root@host # systemctl enable fetch-crl-cron root@host # systemctl enable fetch-crl-boot root@host # systemctl enable condor root@host # systemctl enable httpd root@host # systemctl enable gwms-factory To stop the Factory: # %RED%For RHEL 6 , CentOS 6 , and SL6 %ENDCOLOR% root@host # service gwms-factory stop # %RED%For RHEL 7 , CentOS 7 , and SL7%ENDCOLOR% root@host # systemctl stop gwms-factory And you can stop also the other services if you are not using them independently of the Factory. Validating GlideinWMS Factory The complete validation of the Factory is the submission of actual jobs. You can also check that the services are up and running: root@host # condor_status -any MyType TargetType Name glidefactoryclient None 12345_TEST_ENTRY@gfactory_instance@ glideclient None 12345_TEST_ENTRY@gfactory_instance@ glidefactory None TEST_ENTRY@gfactory_instance@ glidefactoryglobal None gfactory_instance@gfactory_ser glideclientglobal None gfactory_instance@gfactory_ser Scheduler None hostname.fnal.gov DaemonMaster None hostname.fnal.gov Negotiator None hostname.fnal.gov Scheduler None schedd_glideins2@hostname Scheduler None schedd_glideins3@hostname Scheduler None schedd_glideins4@hostname Scheduler None schedd_glideins5@hostname Collector None wmscollector_service@hostname You should have one \"glidefactory\" classAd for each entry that you have enabled. If you have already configured the frontends, you will also have one glidefactoryclient and one glideclient classAd for each frontend / entry. 
You can check also the monitoring Web page: http://YOUR_HOST_FQDN/factory/monitor/ You can also test the local submission of a job to a resource using the test script local_start.sh but you must first install the OSG client tools and generate a proxy. After that you can run the test (replace ENTRY_NAME with the name of one of the entries in /etc/gwms-factory/glideinWMS.xml ): Check Web server configuration for the monitoring Verify path and specially the URL for the GlideinWMS files served by your web server: stage base_dir = \"/var/lib/gwms-factory/web-area/stage\" use_symlink = \"True\" web_base_url = \"http://HOSTNAME:PORT/factory/stage\" This will determine the location of your web server . Make sure that the URL is visible. Depending on your firewall or the one of your organization, you may need to change the port here and in the httpd configuration (by modifying the \"Listen\" directive in /etc/httpd/conf/httpd.conf ). Note that web servers are an often an attacked piece of infrastruture, so you may want to go through the Apache configuration in /etc/httpd/conf/httpd.conf and disable unneeded modules. Troubleshooting GlideinWMS Factory File Locations File Description File Location Comment Configuration file /etc/gwms-factory/glideinWMS.xml Main configuration file Logs /var/log/gwms-factory/server/factory Overall server logs /var/log/gwms-factory/server/entry_NAME Specific entry logs (generally more useful) /var/log/gwms-factory/client Glidein Pilot logs seperated by user and entry Startup script /etc/init.d/gwms-factory Web Directory /var/lib/gwms-factory/web-area Web Base /var/lib/gwms-factory/web-base Working Directory /var/lib/gwms-factory/work-dir/ Increase the log level and change rotation policies You can increase the log level of the frontend. To add a log file with all the log information add the following line with all the message types in the process_log section of /etc/gwms-factory/glideinWMS.xml : You can also change the rotation policy and choose whether compress the rotated files, all in the same section of the config files: max_bytes is the max size of the log files max_days it will be rotated. compression specifies if rotated files are compressed backup_count is the number of rotated log files kept Further details are in the reference documentation . Failed authentication errors If you get messages such as these in the logs, the Factory does not trust the frontend and will not submit glideins. WARNING: Client fermicloud128-fnal-gov_OSG_gWMSFrontend.main (secid: frontend_name) not in white list. Skipping request This error means that the frontend name in the security section of the Factory does not match the security_name in the frontend. Client fermicloud128-fnal-gov_OSG_gWMSFrontend.main (secid: frontend_name) is not coming from a trusted source; AuthenticatedIdentity vofrontend_condor@fermicloud130.fnal.gov!=vofrontend_factory@fermicloud130.fnal.gov. Skipping for security reasons. This error means that the identity in the security section of the Factory does not match what the /etc/condor/certs/condor_mapfile authenticates the Frontend to in HTCondor (!Authenticated Identity in the classad). Make sure the attributes are correctly lined up as in the Frontend security configuration section above. Glideins start but do not connect to User pool / VO Frontend Check the appropriate job err and out logs in /var/log/gwms-factory/client to see if any errors were reported. Often, this will be a pilot unable to access a web server or with an invalid proxy. 
Also, verify that the condor_mapfile is correct on the VO Frontend's user pool collector and configuration. Glideins start but fail before running job with error \"Proxy not long lived enough\" If the glideins are running on a resource (entry) but the jobs are not running and the log files in /var/log/gwms-factory/client/user_frontend/glidein_gfactory_instance/ENTRY_NAME report an error like \"Proxy not long lived enough (86096 s left), shortened retire time ...\", then probably the HTCondor RLM on the Compute Element is delegating the proxy and shortening its lifespan. This can be fixed by setting DELEGATE_JOB_GSI_CREDENTIALS = FALSE as suggested in the CE install document . References http://glideinwms.fnal.gov/doc.prd/ https://opensciencegrid.org/docs/other/install-gwms-frontend/","title":"Installing GlideinWMS Factory"},{"location":"services/install-gwms-factory/#glideinwms-factory-installation","text":"This document describes how to install a Glidein Workflow Managment System (GlideinWMS) Factory instance. This document assumes expertise with HTCondor and familiarity with the GlideinWMS software. It does not cover anything but the simplest possible install. Please consult the GlideinWMS reference documentation for advanced topics, including non-root, non-RPM-based installation. In this document the terms glidein and pilot (job) will be used interchangeably. This parts covers these primary components of the GlideinWMS system: WMS Collector / Schedd : A set of condor_collector and condor_schedd processes that allow the submission of pilots to Grid entries. GlideinWMS Factory : The process submitting the pilots when needed Warning We really recommend you to use the OSG provided Factory and not to install your own . A VO Frontend is sufficient to submit your jobs and to decide scheduling policies. And this will avoid for you the complexity to deal directly with grid/cloud sites. If you really need you own Factory be aware that it is a complex component and may require a non trivial maintenance effort.","title":"GlideinWMS Factory Installation"},{"location":"services/install-gwms-factory/#before-starting","text":"Before starting the installation process, consider the following points (consulting the Reference section below as needed):","title":"Before Starting"},{"location":"services/install-gwms-factory/#requirements","text":"","title":"Requirements"},{"location":"services/install-gwms-factory/#host-and-os","text":"A host to install the GlideinWMS Factory (pristine node). Currently most of our testing has been done on Scientific Linux 6 and 7. Root access The GlideinWMS Factory has the following requirements: CPU : 4-8 cores for a large installation (1 should suffice on a small install) RAM : 4-8GB on a large installation (1GB should suffice for small installs) Disk : 10GB will be plenty sufficient for all the binaries, config and log files related to GlideinWMS. If you are a large site with need to keep significant history and logs, you may want to allocate 100GB+ to store long histories.","title":"Host and OS"},{"location":"services/install-gwms-factory/#users","text":"The GlideinWMS Factory installation will create the following users unless they are already created . User Default uid Comment condor none HTCondor user (installed via dependencies). gfactory none This user runs the GlideinWMS VO factory. 
To verify that the user gfactory has gfactory as primary group check the output of root@host # getent passwd gfactory | cut -d: -f4 | xargs getent group It should be the gfactory group.","title":"Users"},{"location":"services/install-gwms-factory/#certificates","text":"Certificate User that owns certificate Path to certificate Host certificate root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem Here are instructions to request a host certificate. The host certificate/key is used for authorization, however, authorization between the Factory and the GlideinWMS collector is done by file system authentication.","title":"Certificates"},{"location":"services/install-gwms-factory/#networking","text":"","title":"Networking"},{"location":"services/install-gwms-factory/#firewalls","text":"It must be on the public internet, with at least one port open to the world; all worker nodes will load data from this node trough HTTP. Note that worker nodes will also need outbound access in order to access this HTTP port.","title":"Firewalls"},{"location":"services/install-gwms-factory/#installation-procedure","text":"As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories Install CA certificates","title":"Installation Procedure"},{"location":"services/install-gwms-factory/#installing-htcondor","text":"Most required software is installed from the Factory RPM installation. HTCondor is the only exception since there are many different ways to install it , using the RPM system or not. You need to have HTCondor installed before installing the GlideinWMS Factory. If yum cannot find a HTCondor RPM, it will install the dummy empty-condor RPM, assuming that you installed HTCondor using a tarball distribution. If you don't have HTCondor already installed, you can install the HTCondor RPM from the OSG repository: root@host # yum install condor.x86_64","title":"Installing HTCondor"},{"location":"services/install-gwms-factory/#installing-htcondor-bosco","text":"If you plan to send jobs using direct batch submission (aka BOSCO), then you need also the condor-bosco package. You'll have to install the package and remove one of its files /etc/condor/config.d/60-campus_factory.config because it interferes with the Factory configuration. root@host # yum install condor-bosco root@host # rm /etc/condor/config.d/60-campus_factory.config root@host # touch /etc/condor/config.d/60-campus_factory.config","title":"Installing HTCondor-BOSCO"},{"location":"services/install-gwms-factory/#install-gwms-factory","text":"","title":"Install GWMS Factory"},{"location":"services/install-gwms-factory/#download-and-install-the-factory-rpm","text":"Install the RPM and dependencies (be prepared for a lot of dependencies). root@host # yum install glideinwms-factory This will install the current production release verified and tested by OSG with default HTCondor configuration. This command will install the GlideinWMS Factory, HTCondor, the OSG client, and all the required dependencies. If you wish to install a different version of GlideinWMS, add the \"--enablerepo\" argument to the command as follows: yum install --enablerepo=osg-testing glideinwms-factory : The most recent production release, still in testing phase. This will usually match the current tarball version on the GlideinWMS home page . 
(The osg-release production version may lag behind the tarball release by a few weeks as it is verified and packaged by OSG). Note that this will also take the osg-testing versions of all dependencies as well. yum install --enablerepo=osg-upcoming glideinwms-factory : The most recent development series release, ie version 3.3.x release. This has newer features such as cloud submission support, but is less tested.","title":"Download and install the Factory RPM"},{"location":"services/install-gwms-factory/#download-htcondor-tarballs","text":"You will need to download HTCondor tarballs for each architecture that you want to deploy pilots on . At this point, GlideinWMS factory does not support pulling HTCondor binaries from your system area. Suggested is that you put these binaries in /var/lib/gwms-factory/condor but any gfactory accessible location should suffice.","title":"Download HTCondor tarballs"},{"location":"services/install-gwms-factory/#configuration-procedure","text":"After installing the RPM you need to configure the components of the GlideinWMS Factory: Edit Factory configuration options Edit HTCondor configuration options Create a HTCondor grid map file Reconfigure and Start Factory","title":"Configuration Procedure"},{"location":"services/install-gwms-factory/#configuring-the-factory","text":"The configuration file is /etc/gwms-factory/glideinWMS.xml . The next steps will describe each line that you will need to edit for most cases, but you may want to review the whole file to be sure that it is configured correctly.","title":"Configuring the Factory"},{"location":"services/install-gwms-factory/#security-configuration","text":"In the security section, you will need to provide each Frontend that is allowed to communicate with the Factory: security key_length=\"2048\" pub_key=\"RSA\" remove_old_cred_age=\"30\" remove_old_cred_freq=\"24\" reuse_oldkey_onstartup_gracetime=\"900\"> These attributes are very important to get exactly right or the Frontend will not be trusted. This should match one of the factory and security sections of the Frontend configuration Configuring the GlideinWMS Frontend in the following way: Note This is a snippet from the Frontend configuration (for reference), not the Factory that you are configuring now! For the factory section: # from frontend.xml .... For the security: # from frontend.xml Note that the identity of the Frontend must match what HTCondor authenticates the DN of the frontend to. In /etc/condor/certs/condor_mapfile , there must be an entry with vofrontend_service definition (in this case): GSI \"^\\/DC\\=org\\/DC\\=doegrids\\/OU\\=Services\\/CN\\=Some\\ Name\\ 834323%ENDCOLOR%$\" % GREEN % vofrontend_service % ENDCOLOR %","title":"Security configuration"},{"location":"services/install-gwms-factory/#entry-configuration","text":"Entries are grid/cloud endpoints (aka Compute Elements, or gatekeepers) that can accept job requests and run pilots (which will run user jobs). Each entry needs to be configured to communicate to a specific gatekeeper. An example test entry is provided in the default GlideinWMS configuration file. At the very least, you will need to modify the entry line: You will need to modify the entry name and gatekeeper . This will determine the gatekeeper that you access. Specific gatekeepers often require specific \"rsl\" attributes that determine the job queue that you are submitting to, or other attributes. Add them in the rsl attribute. 
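To make the shape of that edit concrete, here is a schematic (non-working) entry element; the attribute names follow the GlideinWMS schema referenced above, while every value is a placeholder you must replace with your site's details:

   <entry name="EXAMPLE_ENTRY" enabled="True"
          gatekeeper="ce.example.edu/jobmanager-condor"
          rsl="(queue=osg)(jobtype=single)"
          auth_method="grid_proxy" trust_domain="grid"
          work_dir="OSG" verbosity="std">
      <!-- per-entry attributes and limits go here -->
   </entry>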
Also, be sure to distribute your entries across the various HTCondor schedd work managers to balance load. To see the available schedd use condor_status -schedd -l | grep Name . Several schedd options are configured by default for you: schedd_glideins2, schedd_glideins3, schedd_glideins4, schedd_glideins5 , as well as the default schedd . This can be modified in the HTCondor configuration. Add any specific options, such as limitations on jobs/pilots or glexec/voms requirements in the entry section below the above line. More details are in the GlideinWMS Factory configuration guide . !!! warning If there is no match between auth_metod and trust_domain of the entry and the type and trust_domain listed in one of the credentials of one of the Frontends using this Factory, then no job can run on that entry. The Factory must advertise the correct Resource Name of each entry for accounting purposes. Then the Factory must also advertise in the entry all the attributes that will allow to match the query expression used in the Frontends connecting to this Factory (e.g. as explained in the VO frontend configuration document ). Note Keep an eye on this part as we're dealing with singularity. Then you must advertise correctly if the site supports gLExec . If it does not set GLEXEC_BIN to NONE , if gLExec is installed via OSG set it to OSG , otherwise set it to the path of gLExec. For example this snippet advertises GLIDEIN_Supported_VOs attribute with the supported VO so that can be used with the query above in the VO frontend and says that the resource does not support gLExec: ... ... Note Specially if jobs are sent to OSG resources, it is very important to set the GLIDEIN_Resource_Name and to be consistent with the Resource Name reported in OIM because that name will be used for job accounting in Gratia. It should be the name of the Resource in OIM or the name of the Resource Group (specially if there are many gatekeepers submitting to the same cluster). More information on options can be found here","title":"Entry configuration"},{"location":"services/install-gwms-factory/#configuring-tarballs","text":"Each pilot will download HTCondor binaries from the staging area. Often, multiple binaries are needed to support various architectures and platforms. Currently, you will need to provide at least one tarball for GlideinWMS to use. (Using the system binaries is currently not supported). Download a HTCondor tarball from here . Suggested is to put the binaries in /var/lib/gwms-factory/condor , but any factory-accessible location will do just fine. Once you have downloaded the tarball, configure it in /etc/gwms-factory/glideinWMS.xml like in the following: Remember also to modify the condor_os and condor_arch attributes in the entries (the configured Compute Elements) to pick the correct HTCondor binary. Here are more details on using multiple HTCondor binaries. Note that is sufficient to set the base_dir ; the reconfigure command will prepare the tarball and add it to the XML config file.","title":"Configuring Tarballs"},{"location":"services/install-gwms-factory/#configuring-htcondor","text":"The HTCondor configuration for the Factory is placed in /etc/condor/config.d . 
00_gwms_factory_general.config 00-restart_peaceful.config 01_gwms_factory_collectors.config 02_gwms_factory_schedds.config 03_gwms_local.config 10-batch_gahp_blahp.config Get rid of the pre-loaded HTCondor default root@host # rm /etc/condor/config.d/00personal_condor.config root@host # touch /etc/condor/config.d/00personal_condor.config For most installations, the items you need to modify are in 03_gwms_factory_local.config . The lines you will have to edit are: Credentials of the machine. You can either run using a proxy, or a service certificate. It is recommended to use a host certificate and specify its location in the variables GSI_DAEMON_CERT and GSI_DAEMON_KEY . The host certificate should be owned by root and have the correct permissions, 600. HTCondor ids in the form UID.GID (both are integers) HTCondor admin email. Will receive messages when services fail. #-- HTCondor user: condor CONDOR_IDS = #-- Contact (via email) when problems occur CONDOR_ADMIN = ############################ # GSI Security config ############################ #-- Grid Certificate directory GSI_DAEMON_TRUSTED_CA_DIR= /etc/grid-security/certificates #-- Credentials GSI_DAEMON_CERT = /etc/grid-security/hostcert.pem GSI_DAEMON_KEY = /etc/grid-security/hostkey.pem #-- HTCondor mapfile CERTIFICATE_MAPFILE= /etc/condor/certs/condor_mapfile ################################### # Whitelist of HTCondor daemon DNs ################################### #DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD","title":"Configuring HTCondor"},{"location":"services/install-gwms-factory/#using-other-htcondor-rpms-eg-uw-madison-htcondor-rpm","text":"The above procedure will work if you are using the OSG HTCondor RPMS. You can verify that you used the OSG HTCondor RPM by using yum list condor . The version name should include \"osg\", e.g. 8.6.9-1.1.osg34.el7 . If you are using the UW Madison HTCondor RPMS, be aware of the following changes: This HTCondor RPM uses a file /etc/condor/condor_config.local to add your local machine slot to the user pool. If you want to disable this behavior (recommended), you should blank out that file or comment out the line in /etc/condor/condor_config for LOCAL_CONFIG_FILE. (Make sure that LOCAL_CONFIG_DIR is set to /etc/condor/config.d ) Note that the variable LOCAL_DIR is set differently in UW Madison and OSG RPMs. This should not cause any more problems in the Glideinwms RPMs, but please take note if you use this variable in your job submissions or other customizations. In general if you are using a non OSG RPM or if you added custom configuration files for HTCondor please check the order of the configuration files: root@host # condor_config_val -config Configuration source: /etc/condor/condor_config Local configuration sources: /etc/condor/config.d/00-restart_peaceful.config /etc/condor/config.d/00_gwms_factory_general.config /etc/condor/config.d/01_gwms_factory_collectors.config /etc/condor/config.d/02_gwms_factory_schedds.config /etc/condor/config.d/03_gwms_local.config /etc/condor/config.d/10-batch_gahp_blahp.config /etc/condor/condor_config.local","title":"Using other HTCondor RPMs, e.g. 
UW Madison HTCondor RPM"},{"location":"services/install-gwms-factory/#restarting-htcondor","text":"After configuring HTCondor, be sure to restart HTCondor: root@host # service condor restart","title":"Restarting HTCondor"},{"location":"services/install-gwms-factory/#create-a-htcondor-grid-mapfile","text":"The HTCondor grid mapfile /etc/condor/certs/condor_mapfile is used for authentication between the glidein running on a remote worker node, and the local collector. HTCondor uses the mapfile to map certificates to pseudo-users on the local machine. It is important that you map the DN's of each frontend you are talking to. Below is an example mapfile, by default found in /etc/condor/certs/condor_mapfile : GSI \"^\\/DC\\=org\\/DC\\=doegrids\\/OU\\=People\\/CN\\=Some\\ Name\\ 123456$\" frontend GSI (.*) anonymous FS (.*) \\1 Each frontend needs a line that maps to the user specified in the identity argument in the frontend security section of the Factory configuration.","title":"Create a HTCondor grid mapfile."},{"location":"services/install-gwms-factory/#reconfiguring-glideinwms","text":"After changing the configuration of GlideinWMS and making sure that Factory is running, use the following table to find the appropriate command for your operating system (run as root ): If your operating system is... Run the following command... Enterprise Linux 7 systemctl reload gwms-factory Enterprise Linux 6 service gwms-factory reconfig Note Notice that, in the case of Enterprise Linux 7 systemctl reload gwms-factory will work only if: - gwms-factory service is running - gwms-factory service was started with systemctl Otherwise, you will get the following error in any of the cases: # systemctl reload gwms-factory Job for gwms-factory.service invalid.","title":"Reconfiguring GlideinWMS"},{"location":"services/install-gwms-factory/#upgrading-glideinwms","text":"Before you start the Factory service for the first time or after an update of the RPM or after you change GlideinWMS scripts, you should always use the GlideinWMS \"upgrade\" command. To do so: Make sure the condor and gwms-factory services are stopped (in EL6 this will be done for you). Issue the upgrade command: If you are using Enterprise Linux 7: root@host # /usr/sbin/gwms-factory upgrade If you are using Enterprise Linux 6: root@host # service gwms-factory upgrade Start the condor and gwms-factory services (see next part).","title":"Upgrading GlideinWMS"},{"location":"services/install-gwms-factory/#service-activation-and-deactivation","text":"To start the Factory you must start also HTCondor and the Web server beside the Factory itself: # %RED%For RHEL 6 , CentOS 6 , and SL6%ENDCOLOR% root@host # service condor start root@host # service httpd start root@host # service gwms-factory start # %RED% For RHEL 7 , CentOS 7 , and SL7%ENDCOLOR% root@host # systemctl start condor root@host # systemctl start httpd root@host # systemctl start gwms-factory Note Once you successfully start using the Factory service, anytime you change the /etc/gwms-factory/glideinWMS.xml file you will need to run a reconfig/reload command. If you change also some code you need the upgrade command mentioned above: # %RED% For RHEL 6 , CentOS 6 , and SL6%ENDCOLOR% root@host # service gwms-factory reconfig # %RED% But the situation is a bit more complicated in RHEL 7 , CentOS 7 , and SL7 due to systemd restrictions%ENDCOLOR% # %GREEN% For reconfig:%ENDCOLOR% A. 
%RED% when the Factory is running%ENDCOLOR% A.1 %RED% without any additional options%ENDCOLOR% root@host # /usr/sbin/gwms-factory reconfig%ENDCOLOR% or root@host # systemctl reload gwms-factory A.2 %RED% if you want to give additional options %ENDCOLOR% systemctl stop gwms-factory /usr/sbin/gwms-factory reconfig \"and your options\" systemctl start gwms-factory B. %RED% when the Factory is NOT running %ENDCOLOR% root@host # /usr/sbin/gwms-factory reconfig ( \"and your options\" ) To enable the services so that they restart after a reboot: # %RED%# For RHEL 6 , CentOS 6 , and SL6%ENDCOLOR% root@host # /sbin/chkconfig fetch-crl-cron on root@host # /sbin/chkconfig fetch-crl-boot on root@host # /sbin/chkconfig condor on root@host # /sbin/chkconfig httpd on root@host # /sbin/chkconfig gwms-factory on # %RED%# For RHEL 7 , CentOS 7 , and SL7%ENDCOLOR% root@host # systemctl enable fetch-crl-cron root@host # systemctl enable fetch-crl-boot root@host # systemctl enable condor root@host # systemctl enable httpd root@host # systemctl enable gwms-factory To stop the Factory: # %RED%For RHEL 6 , CentOS 6 , and SL6 %ENDCOLOR% root@host # service gwms-factory stop # %RED%For RHEL 7 , CentOS 7 , and SL7%ENDCOLOR% root@host # systemctl stop gwms-factory And you can stop also the other services if you are not using them independently of the Factory.","title":"Service Activation and Deactivation"},{"location":"services/install-gwms-factory/#validating-glideinwms-factory","text":"The complete validation of the Factory is the submission of actual jobs. You can also check that the services are up and running: root@host # condor_status -any MyType TargetType Name glidefactoryclient None 12345_TEST_ENTRY@gfactory_instance@ glideclient None 12345_TEST_ENTRY@gfactory_instance@ glidefactory None TEST_ENTRY@gfactory_instance@ glidefactoryglobal None gfactory_instance@gfactory_ser glideclientglobal None gfactory_instance@gfactory_ser Scheduler None hostname.fnal.gov DaemonMaster None hostname.fnal.gov Negotiator None hostname.fnal.gov Scheduler None schedd_glideins2@hostname Scheduler None schedd_glideins3@hostname Scheduler None schedd_glideins4@hostname Scheduler None schedd_glideins5@hostname Collector None wmscollector_service@hostname You should have one \"glidefactory\" classAd for each entry that you have enabled. If you have already configured the frontends, you will also have one glidefactoryclient and one glideclient classAd for each frontend / entry. You can check also the monitoring Web page: http://YOUR_HOST_FQDN/factory/monitor/ You can also test the local submission of a job to a resource using the test script local_start.sh but you must first install the OSG client tools and generate a proxy. After that you can run the test (replace ENTRY_NAME with the name of one of the entries in /etc/gwms-factory/glideinWMS.xml ):","title":"Validating GlideinWMS Factory"},{"location":"services/install-gwms-factory/#check-web-server-configuration-for-the-monitoring","text":"Verify path and specially the URL for the GlideinWMS files served by your web server: stage base_dir = \"/var/lib/gwms-factory/web-area/stage\" use_symlink = \"True\" web_base_url = \"http://HOSTNAME:PORT/factory/stage\" This will determine the location of your web server . Make sure that the URL is visible. Depending on your firewall or the one of your organization, you may need to change the port here and in the httpd configuration (by modifying the \"Listen\" directive in /etc/httpd/conf/httpd.conf ). 
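For example, moving the stage area to port 8080 would touch both files (the port and hostname below are placeholders; pick values allowed by your firewall policy):

# /etc/httpd/conf/httpd.conf
Listen 8080

# stage section of /etc/gwms-factory/glideinWMS.xml
web_base_url = "http://YOUR_HOST_FQDN:8080/factory/stage"

Afterwards, confirm from another host that the URL is reachable, e.g. with curl -I http://YOUR_HOST_FQDN:8080/factory/stage/ .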
Note that web servers are an often-attacked piece of infrastructure, so you may want to go through the Apache configuration in /etc/httpd/conf/httpd.conf and disable unneeded modules.\",\"title\":\"Check Web server configuration for the monitoring\"},{\"location\":\"services/install-gwms-factory/#troubleshooting-glideinwms-factory\",\"text\":\"\",\"title\":\"Troubleshooting GlideinWMS Factory\"},{\"location\":\"services/install-gwms-factory/#file-locations\",\"text\":\"File Description File Location Comment Configuration file /etc/gwms-factory/glideinWMS.xml Main configuration file Logs /var/log/gwms-factory/server/factory Overall server logs /var/log/gwms-factory/server/entry_NAME Specific entry logs (generally more useful) /var/log/gwms-factory/client Glidein Pilot logs separated by user and entry Startup script /etc/init.d/gwms-factory Web Directory /var/lib/gwms-factory/web-area Web Base /var/lib/gwms-factory/web-base Working Directory /var/lib/gwms-factory/work-dir/\",\"title\":\"File Locations\"},{\"location\":\"services/install-gwms-factory/#increase-the-log-level-and-change-rotation-policies\",\"text\":\"You can increase the log level of the Factory. To add a log file with all the log information, add the following line with all the message types in the process_log section of /etc/gwms-factory/glideinWMS.xml : You can also change the rotation policy and choose whether to compress the rotated files, all in the same section of the config file: max_bytes is the maximum size of the log files; max_days is the number of days after which they are rotated; compression specifies whether rotated files are compressed; backup_count is the number of rotated log files kept. Further details are in the reference documentation .\",\"title\":\"Increase the log level and change rotation policies\"},{\"location\":\"services/install-gwms-factory/#failed-authentication-errors\",\"text\":\"If you get messages such as these in the logs, the Factory does not trust the frontend and will not submit glideins. WARNING: Client fermicloud128-fnal-gov_OSG_gWMSFrontend.main (secid: frontend_name) not in white list. Skipping request This error means that the frontend name in the security section of the Factory does not match the security_name in the frontend. Client fermicloud128-fnal-gov_OSG_gWMSFrontend.main (secid: frontend_name) is not coming from a trusted source; AuthenticatedIdentity vofrontend_condor@fermicloud130.fnal.gov!=vofrontend_factory@fermicloud130.fnal.gov. Skipping for security reasons. This error means that the identity in the security section of the Factory does not match what the /etc/condor/certs/condor_mapfile authenticates the Frontend to in HTCondor (the AuthenticatedIdentity in the ClassAd). Make sure the attributes are correctly lined up as in the Frontend security configuration section above.\",\"title\":\"Failed authentication errors\"},{\"location\":\"services/install-gwms-factory/#glideins-start-but-do-not-connect-to-user-pool-vo-frontend\",\"text\":\"Check the appropriate job err and out logs in /var/log/gwms-factory/client to see if any errors were reported. Often, this will be a pilot unable to access a web server or with an invalid proxy.
Also, verify that the condor_mapfile is correct on the VO Frontend's user pool collector and configuration.","title":"Glideins start but do not connect to User pool / VO Frontend"},{"location":"services/install-gwms-factory/#glideins-start-but-fail-before-running-job-with-error-proxy-not-long-lived-enough","text":"If the glideins are running on a resource (entry) but the jobs are not running and the log files in /var/log/gwms-factory/client/user_frontend/glidein_gfactory_instance/ENTRY_NAME report an error like \"Proxy not long lived enough (86096 s left), shortened retire time ...\", then probably the HTCondor RLM on the Compute Element is delegating the proxy and shortening its lifespan. This can be fixed by setting DELEGATE_JOB_GSI_CREDENTIALS = FALSE as suggested in the CE install document .","title":"Glideins start but fail before running job with error \"Proxy not long lived enough\""},{"location":"services/install-gwms-factory/#references","text":"http://glideinwms.fnal.gov/doc.prd/ https://opensciencegrid.org/docs/other/install-gwms-frontend/","title":"References"},{"location":"services/sending-announcements/","text":"Sending Announcements Various OSG teams need to send out announcement about various events (releases, security advisories, planned changes, etc). This page describes how to send announcements using the osg-notify tool. Prerequisites To send announcements, the following conditions must be met: A host with an IP address listed in the SPF Record A sufficiently modern Linux operating system. This procedure has been tested on a FermiCloud Scientific Linux 7 VM and a Linux Mint 18.3 laptop. It is known not to work on a FermiCloud Scientific Linux 6 VM. A valid OSG user certificate to lookup contacts in the topology database Local hostname matches DNS DNS forward and reverse lookups in place [tim@submit-1 topology]$ hostname submit-1.chtc.wisc.edu [tim@submit-1 topology]$ host submit-1.chtc.wisc.edu submit-1.chtc.wisc.edu has address 128.105.244.191 [tim@submit-1 topology]$ host 128 .105.244.191 191.244.105.128.in-addr.arpa domain name pointer submit-1.chtc.wisc.edu. (Required for security announcements) A GPG Key to sign the announcement Installation Install the required Yum repositories : Install the OSG tools: # yum install --enablerepo = devops topology-client If you are on a FermiCloud VM, update postfix to relay through FermiLab's official mail server: echo \"transport_maps = hash:/etc/postfix/transport\" >> /etc/postfix/main.cf echo \"* smtp:smtp.fnal.gov\" >> /etc/postfix/transport postmap hash:/etc/postfix/transport postfix reload Test this setup by sending a message to yourself only. Bonus points for using an email address that goes to a site with aggressive SPAM filtering. 
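One simple way to run that test, assuming mailx is installed (the recipient address below is a placeholder):

root@host # echo "SPF relay test from $(hostname -f)" | mailx -s "osg-notify relay test" your.name@example.net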
Sending the announcement Use the osg-notify tool to send the announcement using the relevant options from the following table: Option Description --dry-run Use this option until you are ready to actually send the message --cert File that contains your OSG User Certificate --key File that contains your Private Key for your OSG User Certificate --no-sign Don't GPG sign the message (release only) --type production Not a test message --message File containing your message --subject The subject of your message --recipients List of recipient email addresses, must have at least one --oim-recipients Select contacts associated with resources and/or VOs --oim-contact-type Replacing with administrative for release announcements or security for security announcements --bypass-dns-check Use this option to skip the check that one of the host's IP addresses matches with the hostname resolution Security requirements Security announcements must be signed using the following options: --sign : GPG sign the message --sign-id : The ID of the key used for singing --from security : The mail comes from the OSG Security Team For release announcements use the following command: osg-notify --cert your-cert.pem --key your-key.pem \\ --no-sign --type production --message \\ --subject '' \\ --recipients \"osg-general@opensciencegrid.org osg-operations@opensciencegrid.org osg-sites@opensciencegrid.org vdt-discuss@opensciencegrid.org\" \\ --oim-recipients resources --oim-recipients vos --oim-contact-type administrative Replacing with an appropriate subject for your announcement and with the path to the file containing your message in plain text.","title":"Sending Announcements"},{"location":"services/sending-announcements/#sending-announcements","text":"Various OSG teams need to send out announcement about various events (releases, security advisories, planned changes, etc). This page describes how to send announcements using the osg-notify tool.","title":"Sending Announcements"},{"location":"services/sending-announcements/#prerequisites","text":"To send announcements, the following conditions must be met: A host with an IP address listed in the SPF Record A sufficiently modern Linux operating system. This procedure has been tested on a FermiCloud Scientific Linux 7 VM and a Linux Mint 18.3 laptop. It is known not to work on a FermiCloud Scientific Linux 6 VM. A valid OSG user certificate to lookup contacts in the topology database Local hostname matches DNS DNS forward and reverse lookups in place [tim@submit-1 topology]$ hostname submit-1.chtc.wisc.edu [tim@submit-1 topology]$ host submit-1.chtc.wisc.edu submit-1.chtc.wisc.edu has address 128.105.244.191 [tim@submit-1 topology]$ host 128 .105.244.191 191.244.105.128.in-addr.arpa domain name pointer submit-1.chtc.wisc.edu. (Required for security announcements) A GPG Key to sign the announcement","title":"Prerequisites"},{"location":"services/sending-announcements/#installation","text":"Install the required Yum repositories : Install the OSG tools: # yum install --enablerepo = devops topology-client If you are on a FermiCloud VM, update postfix to relay through FermiLab's official mail server: echo \"transport_maps = hash:/etc/postfix/transport\" >> /etc/postfix/main.cf echo \"* smtp:smtp.fnal.gov\" >> /etc/postfix/transport postmap hash:/etc/postfix/transport postfix reload Test this setup by sending a message to yourself only. 
Bonus points for using an email address that goes to a site with aggressive SPAM filtering.","title":"Installation"},{"location":"services/sending-announcements/#sending-the-announcement","text":"Use the osg-notify tool to send the announcement using the relevant options from the following table: Option Description --dry-run Use this option until you are ready to actually send the message --cert File that contains your OSG User Certificate --key File that contains your Private Key for your OSG User Certificate --no-sign Don't GPG sign the message (release only) --type production Not a test message --message File containing your message --subject The subject of your message --recipients List of recipient email addresses, must have at least one --oim-recipients Select contacts associated with resources and/or VOs --oim-contact-type Replacing with administrative for release announcements or security for security announcements --bypass-dns-check Use this option to skip the check that one of the host's IP addresses matches with the hostname resolution Security requirements Security announcements must be signed using the following options: --sign : GPG sign the message --sign-id : The ID of the key used for singing --from security : The mail comes from the OSG Security Team For release announcements use the following command: osg-notify --cert your-cert.pem --key your-key.pem \\ --no-sign --type production --message \\ --subject '' \\ --recipients \"osg-general@opensciencegrid.org osg-operations@opensciencegrid.org osg-sites@opensciencegrid.org vdt-discuss@opensciencegrid.org\" \\ --oim-recipients resources --oim-recipients vos --oim-contact-type administrative Replacing with an appropriate subject for your announcement and with the path to the file containing your message in plain text.","title":"Sending the announcement"},{"location":"services/topology-contacts-data/","text":"Topology and Contacts Data This is internal documentation intended for OSG Operations staff. It contains information about the data provided by https://topology.opensciencegrid.org . The topology data for the service is in https://github.com/opensciencegrid/topology , in the projects/ , topology/ , and virtual-organizations/ subdirectories. The contacts data is in https://bitbucket.org/opensciencegrid/contact/ , in contacts.yaml . Topology Data Admins may request changes to data in the topology repo via either a GitHub pull request or a Freshdesk ticket. These changes can be to a project, a VO, or a resource. The registration document and topology README document should tell them how to do that. In the case of a GitHub pull request, you will need to provide IDs using the bin/next_ids tool in an up-to-date local clone of Topology and potentially fix-up other data. To assist the user, do one of the following, depending on the severity of the fixes required for the PR: For minor issues, submit a \"Comment\" review using GitHub suggestions and ask the user to incorporate your suggestions . For major issues, create a branch based off of their PR, make changes, and submit your own PR that closes the original user's PR. The CI checks should catch most errors but you should still review the YAML changes. Certain things to check are: Do contact names and IDs match what's in the contacts data? (See below for instructions on how to get that information.) If the person is not in the contacts data, you will need to add them before approving the PR. Is the PR submitter authorized to make changes to that project/VO/resource? 
Can you match them to a person affiliated with that project/VO/site? (The contacts data now includes the GitHub usernames for some people. See below for instructions on how to get that information.) Is their GitHub ID registered in the contact database and are they associated with the relevant resource, site, facility, or VO? Retiring resources A resource can be disabled in its topology yaml file by setting Active: false . However the resource entry should not be immediately deleted from the yaml file. One reason for this is that the WLCG accounting info configured for resources is used to determine which resources to send APEL numbers for. Removing resources prematurely could prevent resummarized GRACC data from getting sent appropriately. Resources that have been inactive for at least two years are eligible to be deleted from the topology database. The GRACC records for this resource can be inspected in Kibana . In the search bar, enter ProbeName:*\\:FQDN in the search bar, where FQDN is the FQDN defined for your resource For example, if your resource FQDN is cmsgrid01.hep.wisc.edu you would enter ProbeName:*\\:cmsgrid01.hep.wisc.edu In the upper-right corner, use the Time Range selection to pick \"Last 2 years\" With this criteria selected, Kibana will show you if it has received any records for this resource in the past two years. If there are no records returned, you may remove the resource from the resource group yaml file in the topology repo. Any downtime entries for this resource in the corresponding downtime yaml file for the resource group must be removed also. If you remove the last resource in the resource group yaml file, you should remove the resource group and corresponding downtime yaml files as well. Reviewing project PRs New projects are typically created by the Research Facilitation team. Here are a few things to check: Did osg-bot warn about a \"New Organization\"? If so, search around in the projects directory and make sure the \"Organization\" in the YAML is not a typo or alternate spelling for an existing organization. grep around in the /projects/ directory for substrings of the organization. For example, if the new org is \"University of Wisconsin Madison\", do: $ grep -i wisconsin projects/*.yaml and you will see that it's supposed to be \"University of Wisconsin-Madison\". If the new organization is not a typo or alternate spelling, dismiss osg-bot's review with the comment \"new org is legit\". Is the project name is of the form _ , e.g. UWMadison_Parks ? (This is recommended but not required for new projects.) If so: Is the short name -> organization mapping for the institution in /mappings/project_institution.yaml (e.g. UWMadison: \"University of Wisconsin-Madison\" )? If not, ask the PR author to add it. Does the \"FieldOfScience\" in the YAML match one of the keys in /mappings/nsfscience.yaml ? (The list is also available on the left column of this CSV .) Is the \"Sponsor\" correct? The sponsor depends on where the users will be submitting jobs from: If they primarily submit from some CI Connect interface such as \"OSG Connect\", use: Sponsor : CampusGrid : Name : The campus grid name must be one of the ones in the /projects/_CAMPUS_GRIDS.yaml file.. Otherwise, the project must be sponsored by a VO: Sponsor : VirtualOrganization : Name : The VO name must be one of the ones in the /virtual-organizations/ dir. 
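Schematically, the two acceptable Sponsor stanzas look like this (the names shown are placeholders; the real values must exist in /projects/_CAMPUS_GRIDS.yaml or under /virtual-organizations/ respectively):

Sponsor:
  CampusGrid:
    Name: OSG Connect

Sponsor:
  VirtualOrganization:
    Name: osg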
Contacts Data The OSG keeps contact data for administrators and maintainers of OSG resources and VOs for the purpose of distributing security, software, and adminstrative (e.g., OSG All-Hands dates) announcements. Additionally, OSG contacts have the following abilities: View other contacts' information (via HTML and XML ) with a registered certificate Register resource downtimes for resources that they are listed as an administrative contact, if they have a registered GitHub ID Contact data is kept as editable YAML in https://bitbucket.org/opensciencegrid/contact/ , in contacts.yaml . The YAML file contains sensitive information and is only visible to people with access to that repo. Getting access to the contact repo The contacts repo is hosted on BitBucket. You will need an Atlassian account for access to BitBucket. The account you use for OSG JIRA should work. Once you have an account, request access from Brian Lin, Mat Selmeci, or Derek Weitzel. You should then be able to go to https://bitbucket.org/opensciencegrid/contact/ . Using the contact repo BitBucket is similar to GitHub except you don't make a fork of the contact repo, you just clone it to your local machine. This means that any pushes go directly to the main repo instead of your own fork. Danger Don't push to master. For any changes, always create your own branch, push your changes to that branch, then make a pull request. Have someone else review and merge your pull request. All contact data is stored in contacts.yaml . The contact info is keyed by a 40-character hexadecimal ID which was generated from their email address when they were first added. An example entry is: 25357f62c7ab2ae11ddda1efd272bb5435dbfacb : # ^ this is their ID FullName : Example A. User Profile : This is an example user. GitHub : ExampleUser # ContactInformation data requires authorization to view ContactInformation : DNs : - ... IM : ... PrimaryEmail : user@example.net PrimaryPhone : ... When making changes to the contact data, first see if a contact is already in the YAML file. Search the YAML file for their name. Be sure to try variations of their name if you don't find them -- someone may be listed as \"Dave\" or \"David\", or have a middle name or middle initial. Follow the instructions below for adding or updating a contact, as appropriate. Adding a new contact Danger Any new contacts need to have their association with the OSG verified by a known contact within the relevant VO, site, or project. When registering a new contact, first obtain the required contact information . After obtaining this information and verifying their association with the OSG, fill out the values in template-contacts.yaml and add it to contacts.yaml . To get the hash used as the ID, run email-hash on their email address. For example: $ cd contact # this is your local clone of the \"contact\" repo $ bin/email-hash user@example.net 25357f62c7ab2ae11ddda1efd272bb5435dbfacb Then your new entry will look like 25357f62c7ab2ae11ddda1efd272bb5435dbfacb : FullName : Example A. User .... The FullName and Profile fields in the main section, and the PrimaryEmail field in the ContactInformation section are required. The PrimaryEmail field in the ContactInformation section should match the hash that you used for the ID. In addition, if they will be making pull requests against the topology repo, e.g. for updating site information, reporting downtime, or updating project or VO information, obtain their GitHub username and put it in the GitHub field. 
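A sketch of the usual workflow for adding that entry (the branch name and commit message are illustrative; remember to never push to master):

$ cd contact                        # your local clone of the "contact" repo
$ git checkout -b add-example-user
$ bin/email-hash user@example.net   # gives the ID to use as the YAML key
$ $EDITOR contacts.yaml             # add the new entry under that ID
$ git add contacts.yaml
$ git commit -m "Add Example A. User"
$ git push origin add-example-user  # then open a pull request on BitBucket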
Editing a contact Once you have found a contact in the YAML file, edit the attributes by hand. If you want to add information that is not present for that contact, look at template-contacts.yaml to find out what the attributes are called. Note The ID of the contact never changes, even if the user's PrimaryEmail changes. Important If you change the contact's FullName , you must make the same change to every place that the contact is mentioned in the topology repo. Get the contact changes merged in first.","title":"Topology and Contacts Data"},{"location":"services/topology-contacts-data/#topology-and-contacts-data","text":"This is internal documentation intended for OSG Operations staff. It contains information about the data provided by https://topology.opensciencegrid.org . The topology data for the service is in https://github.com/opensciencegrid/topology , in the projects/ , topology/ , and virtual-organizations/ subdirectories. The contacts data is in https://bitbucket.org/opensciencegrid/contact/ , in contacts.yaml .","title":"Topology and Contacts Data"},{"location":"services/topology-contacts-data/#topology-data","text":"Admins may request changes to data in the topology repo via either a GitHub pull request or a Freshdesk ticket. These changes can be to a project, a VO, or a resource. The registration document and topology README document should tell them how to do that. In the case of a GitHub pull request, you will need to provide IDs using the bin/next_ids tool in an up-to-date local clone of Topology and potentially fix-up other data. To assist the user, do one of the following, depending on the severity of the fixes required for the PR: For minor issues, submit a \"Comment\" review using GitHub suggestions and ask the user to incorporate your suggestions . For major issues, create a branch based off of their PR, make changes, and submit your own PR that closes the original user's PR. The CI checks should catch most errors but you should still review the YAML changes. Certain things to check are: Do contact names and IDs match what's in the contacts data? (See below for instructions on how to get that information.) If the person is not in the contacts data, you will need to add them before approving the PR. Is the PR submitter authorized to make changes to that project/VO/resource? Can you match them to a person affiliated with that project/VO/site? (The contacts data now includes the GitHub usernames for some people. See below for instructions on how to get that information.) Is their GitHub ID registered in the contact database and are they associated with the relevant resource, site, facility, or VO?","title":"Topology Data"},{"location":"services/topology-contacts-data/#retiring-resources","text":"A resource can be disabled in its topology yaml file by setting Active: false . However the resource entry should not be immediately deleted from the yaml file. One reason for this is that the WLCG accounting info configured for resources is used to determine which resources to send APEL numbers for. Removing resources prematurely could prevent resummarized GRACC data from getting sent appropriately. Resources that have been inactive for at least two years are eligible to be deleted from the topology database. The GRACC records for this resource can be inspected in Kibana . 
In the search bar, enter ProbeName:*\\:FQDN in the search bar, where FQDN is the FQDN defined for your resource For example, if your resource FQDN is cmsgrid01.hep.wisc.edu you would enter ProbeName:*\\:cmsgrid01.hep.wisc.edu In the upper-right corner, use the Time Range selection to pick \"Last 2 years\" With this criteria selected, Kibana will show you if it has received any records for this resource in the past two years. If there are no records returned, you may remove the resource from the resource group yaml file in the topology repo. Any downtime entries for this resource in the corresponding downtime yaml file for the resource group must be removed also. If you remove the last resource in the resource group yaml file, you should remove the resource group and corresponding downtime yaml files as well.","title":"Retiring resources"},{"location":"services/topology-contacts-data/#reviewing-project-prs","text":"New projects are typically created by the Research Facilitation team. Here are a few things to check: Did osg-bot warn about a \"New Organization\"? If so, search around in the projects directory and make sure the \"Organization\" in the YAML is not a typo or alternate spelling for an existing organization. grep around in the /projects/ directory for substrings of the organization. For example, if the new org is \"University of Wisconsin Madison\", do: $ grep -i wisconsin projects/*.yaml and you will see that it's supposed to be \"University of Wisconsin-Madison\". If the new organization is not a typo or alternate spelling, dismiss osg-bot's review with the comment \"new org is legit\". Is the project name is of the form _ , e.g. UWMadison_Parks ? (This is recommended but not required for new projects.) If so: Is the short name -> organization mapping for the institution in /mappings/project_institution.yaml (e.g. UWMadison: \"University of Wisconsin-Madison\" )? If not, ask the PR author to add it. Does the \"FieldOfScience\" in the YAML match one of the keys in /mappings/nsfscience.yaml ? (The list is also available on the left column of this CSV .) Is the \"Sponsor\" correct? The sponsor depends on where the users will be submitting jobs from: If they primarily submit from some CI Connect interface such as \"OSG Connect\", use: Sponsor : CampusGrid : Name : The campus grid name must be one of the ones in the /projects/_CAMPUS_GRIDS.yaml file.. Otherwise, the project must be sponsored by a VO: Sponsor : VirtualOrganization : Name : The VO name must be one of the ones in the /virtual-organizations/ dir.","title":"Reviewing project PRs"},{"location":"services/topology-contacts-data/#contacts-data","text":"The OSG keeps contact data for administrators and maintainers of OSG resources and VOs for the purpose of distributing security, software, and adminstrative (e.g., OSG All-Hands dates) announcements. Additionally, OSG contacts have the following abilities: View other contacts' information (via HTML and XML ) with a registered certificate Register resource downtimes for resources that they are listed as an administrative contact, if they have a registered GitHub ID Contact data is kept as editable YAML in https://bitbucket.org/opensciencegrid/contact/ , in contacts.yaml . The YAML file contains sensitive information and is only visible to people with access to that repo.","title":"Contacts Data"},{"location":"services/topology-contacts-data/#getting-access-to-the-contact-repo","text":"The contacts repo is hosted on BitBucket. 
You will need an Atlassian account for access to BitBucket. The account you use for OSG JIRA should work. Once you have an account, request access from Brian Lin, Mat Selmeci, or Derek Weitzel. You should then be able to go to https://bitbucket.org/opensciencegrid/contact/ .","title":"Getting access to the contact repo"},{"location":"services/topology-contacts-data/#using-the-contact-repo","text":"BitBucket is similar to GitHub except you don't make a fork of the contact repo, you just clone it to your local machine. This means that any pushes go directly to the main repo instead of your own fork. Danger Don't push to master. For any changes, always create your own branch, push your changes to that branch, then make a pull request. Have someone else review and merge your pull request. All contact data is stored in contacts.yaml . The contact info is keyed by a 40-character hexadecimal ID which was generated from their email address when they were first added. An example entry is: 25357f62c7ab2ae11ddda1efd272bb5435dbfacb : # ^ this is their ID FullName : Example A. User Profile : This is an example user. GitHub : ExampleUser # ContactInformation data requires authorization to view ContactInformation : DNs : - ... IM : ... PrimaryEmail : user@example.net PrimaryPhone : ... When making changes to the contact data, first see if a contact is already in the YAML file. Search the YAML file for their name. Be sure to try variations of their name if you don't find them -- someone may be listed as \"Dave\" or \"David\", or have a middle name or middle initial. Follow the instructions below for adding or updating a contact, as appropriate.","title":"Using the contact repo"},{"location":"services/topology-contacts-data/#adding-a-new-contact","text":"Danger Any new contacts need to have their association with the OSG verified by a known contact within the relevant VO, site, or project. When registering a new contact, first obtain the required contact information . After obtaining this information and verifying their association with the OSG, fill out the values in template-contacts.yaml and add it to contacts.yaml . To get the hash used as the ID, run email-hash on their email address. For example: $ cd contact # this is your local clone of the \"contact\" repo $ bin/email-hash user@example.net 25357f62c7ab2ae11ddda1efd272bb5435dbfacb Then your new entry will look like 25357f62c7ab2ae11ddda1efd272bb5435dbfacb : FullName : Example A. User .... The FullName and Profile fields in the main section, and the PrimaryEmail field in the ContactInformation section are required. The PrimaryEmail field in the ContactInformation section should match the hash that you used for the ID. In addition, if they will be making pull requests against the topology repo, e.g. for updating site information, reporting downtime, or updating project or VO information, obtain their GitHub username and put it in the GitHub field.","title":"Adding a new contact"},{"location":"services/topology-contacts-data/#editing-a-contact","text":"Once you have found a contact in the YAML file, edit the attributes by hand. If you want to add information that is not present for that contact, look at template-contacts.yaml to find out what the attributes are called. Note The ID of the contact never changes, even if the user's PrimaryEmail changes. Important If you change the contact's FullName , you must make the same change to every place that the contact is mentioned in the topology repo. 
Get the contact changes merged in first.","title":"Editing a contact"},{"location":"services/topology/","text":"Topology Service This document contains information about the service that runs: https://topology.opensciencegrid.org https://topology-itb.opensciencegrid.org https://map.opensciencegrid.org : Generates the topology map used on OSG Display The source code for the service is in https://github.com/opensciencegrid/topology , in the src/ subdirectory. This repository also contains the public part of the data that gets served. Deployment Topology is a webapp run with Apache on the host topology.opensciencegrid.org . The ITB instance runs on the host topology-itb.opensciencegrid.org . The hosts are VMs at Nebraska; for SSH access, contact Derek Weitzel or Brian Bockelman. Installation These instructions assume an EL 7 host with the EPEL repositories available. The software will be installed into /opt/topology . A second instance for the webhook app will be installed into /opt/topology-webhook . (The ITB instance should be installed into /opt/topology-itb and /opt/topology-itb-webhook instead.) The following steps should be done as root. Install prerequisites: # yum install python36 gridsite httpd mod_ssl Clone the repository: For the production topology host: # git clone https://github.com/opensciencegrid/topology /opt/topology # git clone https://github.com/opensciencegrid/topology /opt/topology-webhook For the topology-itb host: # git clone https://github.com/opensciencegrid/topology /opt/topology-itb # git clone https://github.com/opensciencegrid/topology /opt/topology-itb-webhook Set up the virtualenv in the clone -- from /opt/topology or /opt/topology-itb : # python36 -m venv venv # . ./venv/bin/activate # pip install -r requirements-apache.txt Repeat for the webhook instance -- from /opt/topology-webhook or /opt/topology-itb-webhook . 
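Spelled out for the production webhook instance, that repetition is the same sequence of commands against the webhook clone (use /opt/topology-itb-webhook on the ITB host):

# cd /opt/topology-webhook
# python36 -m venv venv
# . ./venv/bin/activate
# pip install -r requirements-apache.txt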
File system locations The following files/directories must exist and have the proper permissions: Location Purpose Ownership Mode /opt/topology Production software install root:root 0755 /opt/topology-itb ITB software install root:root 0755 /opt/topology-webhook Production webhook software install root:root 0755 /opt/topology-itb-webhook ITB webhook software install root:root 0755 /etc/opt/topology/config-production.py Production config root:root 0644 /etc/opt/topology/config-itb.py ITB config root:root 0644 /etc/opt/topology/bitbucket Private key for contact info repo apache:root 0600 /etc/opt/topology/bitbucket.pub Public key for contact info repo apache:root 0644 /etc/opt/topology/github Private key for pushing automerge commits topomerge:root 0600 /etc/opt/topology/github.pub Public key for pushing automerge commits topomerge:root 0644 /etc/opt/topology/github_webhook_secret GitHub webhook secret for validating webhooks topomerge:root 0600 ~apache/.ssh SSH dir for Apache apache:root 0700 ~apache/.ssh/known_hosts Known hosts file for Apache apache:root 0644 ~topomerge Home dir for topomerge Apache user topomerge:root 0755 ~topomerge/.ssh SSH dir for topomerge Apache user topomerge:root 0700 ~topomerge/.ssh/known_hosts Known hosts file for topomerge Apache user topomerge:root 0644 /var/cache/topology Checkouts of topology and contacts data for production instance apache:apache 0755 /var/cache/topology-itb Checkouts of topology and contacts data for ITB instance apache:apache 0755 /var/cache/topology-webhook Topology repo and state info for production webhook instance topomerge:topomerge 0755 /var/cache/topology-itb-webhook Topology repo and state info for ITB webhook instance topomerge:topomerge 0755 ~apache/.ssh/known_hosts must contain an entry for bitbucket.org ; use ssh-keyscan bitbucket.org to get the appropriate entry. ~topomerge/.ssh/known_hosts must contain an entry for github.com ; use ssh-keyscan github.com to get the appropriate entry. Software configuration Configuration for the main app is under /etc/opt/topology/ , in config-production.py and config-itb.py . The webhook app configuration is in config-production-webhook.py and config-itb-webhook.py . The files are in Python format and override default settings in src/webapp/default_config.py in the topology repo. HTTPD configuration is in /etc/httpd ; we use the modules mod_ssl , mod_gridsite , and mod_wsgi . The first two are installed via yum; the .so file for mod_wsgi is located in /opt/topology/venv/lib/python3.6/site-packages/mod_wsgi/server/ or /opt/topology-itb/venv/lib/python3.6/site-packages/mod_wsgi/server/ for the ITB instance. Each of the hostnames are VHosts in the apache configuration. Some special notes: https://map.opensciencegrid.org runs in the same wsgi process as the production topology, but the URL is limited to only the map code. Further, it does not use mod_gridsite so that users are not asked to present a client certificate. VHosts are configured: ServerName topology.opensciencegrid.org ServerAlias my.opensciencegrid.org myosg.opensciencegrid.org Data configuration Configuration is in /etc/opt/topology/config-production.py and config-itb.py ; and config-production-webhook.py and config-itb-webhook.py . 
Variable Purpose TOPOLOGY_DATA_DIR The directory containing a clone of the topology repository for data use TOPOLOGY_DATA_REPO The remote tracking repository of TOPOLOGY_DATA_DIR TOPOLOGY_DATA_BRANCH The remote tracking branch of TOPOLOGY_DATA_DIR WEBHOOK_DATA_DIR The directory containing a mirror-clone of the topology repository for webhook use WEBHOOK_DATA_REPO The remote tracking repository of WEBHOOK_DATA_DIR WEBHOOK_DATA_BRANCH The remote tracking branch of WEBHOOK_DATA_DIR WEBHOOK_STATE_DIR Directory containing webhook state information between pull request and status hooks WEBHOOK_SECRET_KEY Secret key configured on GitHub for webhook delivery CONTACT_DATA_DIR The directory containing a clone of the contact repository for data use CONTACT_DATA_REPO The remote tracking repository of CONTACT_DATA_DIR (default: \"git@bitbucket.org:opensciencegrid/contact.git\" ) CONTACT_DATA_BRANCH The remote tracking branch of CONTACT_DATA_BRANCH (default: \"master\" ) CACHE_LIFETIME Frequency of automatic data updates in seconds (default: 900 ) GIT_SSH_KEY Location of ssh public key file for git access. /etc/opt/topology/bitbucket.pub for the main app, and /etc/opt/topology/github.pub for the webhook app Puppet ensures that the production contact and topology clones are up to date with their configured remote tracking repo and branch. Puppet does not manage the ITB data directories so they need to be updated by hand during testing. GitHub Configuration for Webhook App Go to the https://github.com/opensciencegrid/topology/settings/hooks page on GitHub. There are four webhooks to set up; pull_request and status for both the topology and topology-itb hosts. Payload URL Content type Events to trigger webhook https://topology.opensciencegrid.org/webhook/status application/json Statuses https://topology.opensciencegrid.org/webhook/pull_request application/json Pull requests https://topology-itb.opensciencegrid.org/webhook/status application/json Statuses https://topology-itb.opensciencegrid.org/webhook/pull_request application/json Pull requests For each webhook, \"Secret\" should be a random 40 digit hex string, which should match the contents of the file /etc/opt/topology/github_webhook_secret (the path configured in WEBHOOK_SECRET_KEY ). The OSG's dedicated GitHub user for automating pushes is currently osg-bot . This user needs to have write access to the topology repo on GitHub. The ssh public key in /etc/opt/topology/github.pub should be registered with the osg-bot GitHub user. This can be done by logging into GitHub as osg-bot , and adding the new ssh key under the settings page. Required System Packages Currently the webhook app uses the mailx command to send email. If not already installed, install it with: :::console # yum install mailx Testing changes on the ITB instance All changes should be tested on the ITB instance before deploying to production. If you can, test them on your local machine first. These instructions assume that the code has not been merged to master. Update the ITB software installation at /opt/topology-itb and note the current branch: # cd /opt/topology-itb # git fetch --all # git status Check out the branch you are testing. 
If the target remote is not configured, add it : # git checkout -b / Verify that you are using the intended data associated with the code you are testing: If the data format has changed in an incompatible way, modify /etc/opt/topology/config-itb.py : Backup the ITB configuration file: # cd /etc/opt/topology # cp -p config-itb.py { ,.bak } Change the TOPOLOGY_DATA_DIR and/or CONTACT_DATA_DIR lines to point to a new directories so the previous data does not get overwritten with incompatible data. If you need to use a different branch for the data, switch to it: Check the branch of TOPOLOGY_DATA_DIR from /etc/opt/topology/config-itb.py # cd # git fetch --all # git status Note the previous branch, you will need this later If the target remote is not configured, add it Check out the target branch: # git checkout -b / Pull any upstream changes to ensure that your branch is up to date: # git pull For updates to the webhook app, follow the above instructions for the ITB webhook instance under /opt/topology-itb-webhook and its corresponding config file, /etc/opt/topology/config-itb-webhook.py . Restart httpd : # systemctl restart httpd Test the web interface at https://topology-itb.opensciencegrid.org . Errors and output are in /var/log/httpd/error_log . Reverting changes Switch /opt/topology-itb to the previous branch: # cd /opt/topology-itb # git checkout For updates to the webhook app, switch /opt/topology-itb-webhook to the previous master: # cd /opt/topology-itb-webhook # git checkout If you made config changes to /etc/opt/topology/config-itb.py or config-itb-webhook.py , restore the backup. If you checked out a different branch for data, revert it back to the old branch. Restart httpd : # systemctl restart httpd Test the web interface at https://topology-itb.opensciencegrid.org . Updating the production instance Updating the production instance is similar to updating ITB instance. Update master on the Git clone at /opt/topology : # cd /opt/topology # git pull origin master For updates to the webhook app, update master on the Git clone at /opt/topology-webhook : # cd /opt/topology-webhook # git pull origin master Make config changes to /etc/opt/topology/config-production.py and/or config-production-webhook.py if necessary. Restart httpd : # systemctl restart httpd Test the web interface at https://topology.opensciencegrid.org . Errors and output are in /var/log/httpd/error_log . Reverting changes Switch /opt/topology to the previous master: # cd /opt/topology # ## (use `git reflog` to find the previous commit that was used) # git reset --hard For updates to the webhook app, switch /opt/topology-webhook to the previous master: # cd /opt/topology-webhook ### (use `git reflog` to find the previous commit that was used) # git reset --hard If you made config changes to /etc/opt/topology/config-production.py or config-production-webhook.py , revert them. Restart httpd : # systemctl restart httpd Test the web interface at https://topology.opensciencegrid.org .","title":"Topology Service"},{"location":"services/topology/#topology-service","text":"This document contains information about the service that runs: https://topology.opensciencegrid.org https://topology-itb.opensciencegrid.org https://map.opensciencegrid.org : Generates the topology map used on OSG Display The source code for the service is in https://github.com/opensciencegrid/topology , in the src/ subdirectory. 
This repository also contains the public part of the data that gets served.","title":"Topology Service"},{"location":"services/topology/#deployment","text":"Topology is a webapp run with Apache on the host topology.opensciencegrid.org . The ITB instance runs on the host topology-itb.opensciencegrid.org . The hosts are VMs at Nebraska; for SSH access, contact Derek Weitzel or Brian Bockelman.","title":"Deployment"},{"location":"services/topology/#installation","text":"These instructions assume an EL 7 host with the EPEL repositories available. The software will be installed into /opt/topology . A second instance for the webhook app will be installed into /opt/topology-webhook . (The ITB instance should be installed into /opt/topology-itb and /opt/topology-itb-webhook instead.) The following steps should be done as root. Install prerequisites: # yum install python36 gridsite httpd mod_ssl Clone the repository: For the production topology host: # git clone https://github.com/opensciencegrid/topology /opt/topology # git clone https://github.com/opensciencegrid/topology /opt/topology-webhook For the topology-itb host: # git clone https://github.com/opensciencegrid/topology /opt/topology-itb # git clone https://github.com/opensciencegrid/topology /opt/topology-itb-webhook Set up the virtualenv in the clone -- from /opt/topology or /opt/topology-itb : # python36 -m venv venv # . ./venv/bin/activate # pip install -r requirements-apache.txt Repeat for the webhook instance -- from /opt/topology-webhook or /opt/topology-itb-webhook .","title":"Installation"},{"location":"services/topology/#file-system-locations","text":"The following files/directories must exist and have the proper permissions: Location Purpose Ownership Mode /opt/topology Production software install root:root 0755 /opt/topology-itb ITB software install root:root 0755 /opt/topology-webhook Production webhook software install root:root 0755 /opt/topology-itb-webhook ITB webhook software install root:root 0755 /etc/opt/topology/config-production.py Production config root:root 0644 /etc/opt/topology/config-itb.py ITB config root:root 0644 /etc/opt/topology/bitbucket Private key for contact info repo apache:root 0600 /etc/opt/topology/bitbucket.pub Public key for contact info repo apache:root 0644 /etc/opt/topology/github Private key for pushing automerge commits topomerge:root 0600 /etc/opt/topology/github.pub Public key for pushing automerge commits topomerge:root 0644 /etc/opt/topology/github_webhook_secret GitHub webhook secret for validating webhooks topomerge:root 0600 ~apache/.ssh SSH dir for Apache apache:root 0700 ~apache/.ssh/known_hosts Known hosts file for Apache apache:root 0644 ~topomerge Home dir for topomerge Apache user topomerge:root 0755 ~topomerge/.ssh SSH dir for topomerge Apache user topomerge:root 0700 ~topomerge/.ssh/known_hosts Known hosts file for topomerge Apache user topomerge:root 0644 /var/cache/topology Checkouts of topology and contacts data for production instance apache:apache 0755 /var/cache/topology-itb Checkouts of topology and contacts data for ITB instance apache:apache 0755 /var/cache/topology-webhook Topology repo and state info for production webhook instance topomerge:topomerge 0755 /var/cache/topology-itb-webhook Topology repo and state info for ITB webhook instance topomerge:topomerge 0755 ~apache/.ssh/known_hosts must contain an entry for bitbucket.org ; use ssh-keyscan bitbucket.org to get the appropriate entry. 
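For example, one way to populate that file as root (a sketch; double-check afterwards that the ownership and mode still match the table above):
# ssh-keyscan bitbucket.org >> ~apache/.ssh/known_hosts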
~topomerge/.ssh/known_hosts must contain an entry for github.com ; use ssh-keyscan github.com to get the appropriate entry.","title":"File system locations"},{"location":"services/topology/#software-configuration","text":"Configuration for the main app is under /etc/opt/topology/ , in config-production.py and config-itb.py . The webhook app configuration is in config-production-webhook.py and config-itb-webhook.py . The files are in Python format and override default settings in src/webapp/default_config.py in the topology repo. HTTPD configuration is in /etc/httpd ; we use the modules mod_ssl , mod_gridsite , and mod_wsgi . The first two are installed via yum; the .so file for mod_wsgi is located in /opt/topology/venv/lib/python3.6/site-packages/mod_wsgi/server/ or /opt/topology-itb/venv/lib/python3.6/site-packages/mod_wsgi/server/ for the ITB instance. Each of the hostnames are VHosts in the apache configuration. Some special notes: https://map.opensciencegrid.org runs in the same wsgi process as the production topology, but the URL is limited to only the map code. Further, it does not use mod_gridsite so that users are not asked to present a client certificate. VHosts are configured: ServerName topology.opensciencegrid.org ServerAlias my.opensciencegrid.org myosg.opensciencegrid.org","title":"Software configuration"},{"location":"services/topology/#data-configuration","text":"Configuration is in /etc/opt/topology/config-production.py and config-itb.py ; and config-production-webhook.py and config-itb-webhook.py . Variable Purpose TOPOLOGY_DATA_DIR The directory containing a clone of the topology repository for data use TOPOLOGY_DATA_REPO The remote tracking repository of TOPOLOGY_DATA_DIR TOPOLOGY_DATA_BRANCH The remote tracking branch of TOPOLOGY_DATA_DIR WEBHOOK_DATA_DIR The directory containing a mirror-clone of the topology repository for webhook use WEBHOOK_DATA_REPO The remote tracking repository of WEBHOOK_DATA_DIR WEBHOOK_DATA_BRANCH The remote tracking branch of WEBHOOK_DATA_DIR WEBHOOK_STATE_DIR Directory containing webhook state information between pull request and status hooks WEBHOOK_SECRET_KEY Secret key configured on GitHub for webhook delivery CONTACT_DATA_DIR The directory containing a clone of the contact repository for data use CONTACT_DATA_REPO The remote tracking repository of CONTACT_DATA_DIR (default: \"git@bitbucket.org:opensciencegrid/contact.git\" ) CONTACT_DATA_BRANCH The remote tracking branch of CONTACT_DATA_BRANCH (default: \"master\" ) CACHE_LIFETIME Frequency of automatic data updates in seconds (default: 900 ) GIT_SSH_KEY Location of ssh public key file for git access. /etc/opt/topology/bitbucket.pub for the main app, and /etc/opt/topology/github.pub for the webhook app Puppet ensures that the production contact and topology clones are up to date with their configured remote tracking repo and branch. Puppet does not manage the ITB data directories so they need to be updated by hand during testing.","title":"Data configuration"},{"location":"services/topology/#github-configuration-for-webhook-app","text":"Go to the https://github.com/opensciencegrid/topology/settings/hooks page on GitHub. There are four webhooks to set up; pull_request and status for both the topology and topology-itb hosts. 
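Each hook also needs the shared Secret value described below the table; one way to generate a suitable 40-character hex string is:
# openssl rand -hex 20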
Payload URL Content type Events to trigger webhook https://topology.opensciencegrid.org/webhook/status application/json Statuses https://topology.opensciencegrid.org/webhook/pull_request application/json Pull requests https://topology-itb.opensciencegrid.org/webhook/status application/json Statuses https://topology-itb.opensciencegrid.org/webhook/pull_request application/json Pull requests For each webhook, \"Secret\" should be a random 40 digit hex string, which should match the contents of the file /etc/opt/topology/github_webhook_secret (the path configured in WEBHOOK_SECRET_KEY ). The OSG's dedicated GitHub user for automating pushes is currently osg-bot . This user needs to have write access to the topology repo on GitHub. The ssh public key in /etc/opt/topology/github.pub should be registered with the osg-bot GitHub user. This can be done by logging into GitHub as osg-bot , and adding the new ssh key under the settings page.","title":"GitHub Configuration for Webhook App"},{"location":"services/topology/#required-system-packages","text":"Currently the webhook app uses the mailx command to send email. If not already installed, install it with: :::console # yum install mailx","title":"Required System Packages"},{"location":"services/topology/#testing-changes-on-the-itb-instance","text":"All changes should be tested on the ITB instance before deploying to production. If you can, test them on your local machine first. These instructions assume that the code has not been merged to master. Update the ITB software installation at /opt/topology-itb and note the current branch: # cd /opt/topology-itb # git fetch --all # git status Check out the branch you are testing. If the target remote is not configured, add it : # git checkout -b / Verify that you are using the intended data associated with the code you are testing: If the data format has changed in an incompatible way, modify /etc/opt/topology/config-itb.py : Backup the ITB configuration file: # cd /etc/opt/topology # cp -p config-itb.py { ,.bak } Change the TOPOLOGY_DATA_DIR and/or CONTACT_DATA_DIR lines to point to a new directories so the previous data does not get overwritten with incompatible data. If you need to use a different branch for the data, switch to it: Check the branch of TOPOLOGY_DATA_DIR from /etc/opt/topology/config-itb.py # cd # git fetch --all # git status Note the previous branch, you will need this later If the target remote is not configured, add it Check out the target branch: # git checkout -b / Pull any upstream changes to ensure that your branch is up to date: # git pull For updates to the webhook app, follow the above instructions for the ITB webhook instance under /opt/topology-itb-webhook and its corresponding config file, /etc/opt/topology/config-itb-webhook.py . Restart httpd : # systemctl restart httpd Test the web interface at https://topology-itb.opensciencegrid.org . Errors and output are in /var/log/httpd/error_log .","title":"Testing changes on the ITB instance"},{"location":"services/topology/#reverting-changes","text":"Switch /opt/topology-itb to the previous branch: # cd /opt/topology-itb # git checkout For updates to the webhook app, switch /opt/topology-itb-webhook to the previous master: # cd /opt/topology-itb-webhook # git checkout If you made config changes to /etc/opt/topology/config-itb.py or config-itb-webhook.py , restore the backup. If you checked out a different branch for data, revert it back to the old branch. 
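A hedged sketch of those restore steps, assuming the backup created earlier; the data checkout is whatever directory TOPOLOGY_DATA_DIR points to, shown here as a placeholder:
# cd /etc/opt/topology
# cp -p config-itb.py.bak config-itb.py
# cd DATA_DIR && git checkout PREVIOUS_BRANCH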
Restart httpd : # systemctl restart httpd Test the web interface at https://topology-itb.opensciencegrid.org .","title":"Reverting changes"},{"location":"services/topology/#updating-the-production-instance","text":"Updating the production instance is similar to updating ITB instance. Update master on the Git clone at /opt/topology : # cd /opt/topology # git pull origin master For updates to the webhook app, update master on the Git clone at /opt/topology-webhook : # cd /opt/topology-webhook # git pull origin master Make config changes to /etc/opt/topology/config-production.py and/or config-production-webhook.py if necessary. Restart httpd : # systemctl restart httpd Test the web interface at https://topology.opensciencegrid.org . Errors and output are in /var/log/httpd/error_log .","title":"Updating the production instance"},{"location":"services/topology/#reverting-changes_1","text":"Switch /opt/topology to the previous master: # cd /opt/topology # ## (use `git reflog` to find the previous commit that was used) # git reset --hard For updates to the webhook app, switch /opt/topology-webhook to the previous master: # cd /opt/topology-webhook ### (use `git reflog` to find the previous commit that was used) # git reset --hard If you made config changes to /etc/opt/topology/config-production.py or config-production-webhook.py , revert them. Restart httpd : # systemctl restart httpd Test the web interface at https://topology.opensciencegrid.org .","title":"Reverting changes"},{"location":"troubleshooting/repository-scripts/","text":"Troubleshooting Guide for Yum Repository Scripts The repo.opensciencegrid.org and repo-itb.opensciencegrid.org hosts contain the OSG Yum software repositories plus related services and tools. In particular, the mash software is used to download RPMs from where they are built (at the University of Wisconsin\u2013Madison), and there are some associated scripts to configure and invoke mash periodically. Use this guide to monitor the mash system for problems and to perform basic troubleshooting when such problems arise. Monitoring To monitor the repository hosts for proper mash operation, do the following steps on each host: ssh to repo.opensciencegrid.org and cd into /var/log/repo to view logs from mash updates Examine the \u201cLast modified\u201d timestamp of all of the update_repo.*.log files If the timestamps are all less than 2 hours old, life is good and you can skip the remaining steps below Otherwise, examine the \u201cLast modified\u201d timestamp of the update_all_repos.err file If the update_all_repos.err timestamp is current, there may be a mash process that is hung; see the Troubleshooting steps below If all timestamps are more than 6 hours old, something may be wrong with cron or its mash entries: Verify that cron is running and that the cron entries for mash are still present; if not, try to restore things Otherwise, create a Freshdesk ticket with a subject like \u201cRepo update logs are too old on \u201d and with relevant details in the body Assign the ticket to the \u201cSoftware\u201d group Troubleshooting and Mitigation Identifying and fixing a hung mash process If a mash update process hangs, all future invocations from cron of the mash scripts will exit without taking action because of the hung process. Thus, it is important to identify and remove any hung processes so that future updates can proceed. 
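A quick first check that combines the monitoring and troubleshooting steps in this guide (the log path and the ps invocation are the ones used above and below):
root@host # ls -lt /var/log/repo/update_repo.*.log | head
root@host # ps -C mash -o pid,ppid,pgid,start,command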
Use the procedure below to remove any hung mash processes; doing so is safe in that it will not adversely affect the Yum repositories being served from the host. In the listing of log files (see above), view the file =update_all_repos.err= In the error log file, look for messages such as: Wed Jan 20 18:10:02 UTC 2016: **Can't acquire lock, is update_all_repos.sh already running?** This message indicates that the most recent update attempt quit early due to the presence of a lock file, most likely from a hung mash process. Look for mash processes: root@host # ps -C mash -o pid,ppid,pgid,start,command PID PPID PGID STARTED COMMAND 24551 24549 23455 Jan 15 /usr/bin/python /usr/bin/mash osg-3.1-el5-release -o 24552 24551 23455 Jan 15 /usr/bin/python /usr/bin/mash osg-3.1-el5-release -o If there are mash processes that started on a previous date or more than 2 hours ago, it is best to remove their corresponding process groups (PGID above): root@host # kill -TERM -23455 Then verify that the old processes are gone using the same ps command as above: root@host # ps -C mash -o pid,ppid,pgid,start,command PID PPID PGID STARTED COMMAND If any part of this process does not look or work as expected: Create a Freshdesk ticket with a subject like \u201cRepo update logs are too old on \u201d and with relevant details in the body Assign the ticket to the \u201cSoftware\u201d group","title":"Troubleshooting Guide for Yum Repository Scripts"},{"location":"troubleshooting/repository-scripts/#troubleshooting-guide-for-yum-repository-scripts","text":"The repo.opensciencegrid.org and repo-itb.opensciencegrid.org hosts contain the OSG Yum software repositories plus related services and tools. In particular, the mash software is used to download RPMs from where they are built (at the University of Wisconsin\u2013Madison), and there are some associated scripts to configure and invoke mash periodically. 
Use this guide to monitor the mash system for problems and to perform basic troubleshooting when such problems arise.","title":"Troubleshooting Guide for Yum Repository Scripts"},{"location":"troubleshooting/repository-scripts/#monitoring","text":"To monitor the repository hosts for proper mash operation, do the following steps on each host: ssh to repo.opensciencegrid.org and cd into /var/log/repo to view logs from mash updates Examine the \u201cLast modified\u201d timestamp of all of the update_repo.*.log files If the timestamps are all less than 2 hours old, life is good and you can skip the remaining steps below Otherwise, examine the \u201cLast modified\u201d timestamp of the update_all_repos.err file If the update_all_repos.err timestamp is current, there may be a mash process that is hung; see the Troubleshooting steps below If all timestamps are more than 6 hours old, something may be wrong with cron or its mash entries: Verify that cron is running and that the cron entries for mash are still present; if not, try to restore things Otherwise, create a Freshdesk ticket with a subject like \u201cRepo update logs are too old on \u201d and with relevant details in the body Assign the ticket to the \u201cSoftware\u201d group","title":"Monitoring"},{"location":"troubleshooting/repository-scripts/#troubleshooting-and-mitigation","text":"","title":"Troubleshooting and Mitigation"},{"location":"troubleshooting/repository-scripts/#identifying-and-fixing-a-hung-mash-process","text":"If a mash update process hangs, all future invocations from cron of the mash scripts will exit without taking action because of the hung process. Thus, it is important to identify and remove any hung processes so that future updates can proceed. Use the procedure below to remove any hung mash processes; doing so is safe in that it will not adversely affect the Yum repositories being served from the host. In the listing of log files (see above), view the file =update_all_repos.err= In the error log file, look for messages such as: Wed Jan 20 18:10:02 UTC 2016: **Can't acquire lock, is update_all_repos.sh already running?** This message indicates that the most recent update attempt quit early due to the presence of a lock file, most likely from a hung mash process. Look for mash processes: root@host # ps -C mash -o pid,ppid,pgid,start,command PID PPID PGID STARTED COMMAND 24551 24549 23455 Jan 15 /usr/bin/python /usr/bin/mash osg-3.1-el5-release -o 24552 24551 23455 Jan 15 /usr/bin/python /usr/bin/mash osg-3.1-el5-release -o If there are mash processes that started on a previous date or more than 2 hours ago, it is best to remove their corresponding process groups (PGID above): root@host # kill -TERM -23455 Then verify that the old processes are gone using the same ps command as above: root@host # ps -C mash -o pid,ppid,pgid,start,command PID PPID PGID STARTED COMMAND If any part of this process does not look or work as expected: Create a Freshdesk ticket with a subject like \u201cRepo update logs are too old on \u201d and with relevant details in the body Assign the ticket to the \u201cSoftware\u201d group","title":"Identifying and fixing a hung mash process"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"OSG Operations Welcome to the home page of the OSG Operations Team documentation area! Mission The mission of OSG Operations is to maintain and improve distributed high throughput computing services to support research communities. This is accomplished by: Operating and maintaining our services in a user-oriented, robust, and reliable manner. Developing a professional and skilled staff dedicated to a service philosophy. Managing resources responsibly, efficiently, and with accountability. Evaluating and continually improving the actions, methods and processes that allow the OSG to operate. Contact Us Open a Ticket Slack channel - if you can't create an account, send an e-mail to help@opensciencegrid.org Email: help@opensciencegrid.org Registration (Contact, Resource, VO, or Project) Register with OSG Weekly Operations Meetings When: Fridays 12:30 pm Central URL: https://unl.zoom.us/j/183382852 Phone: +1 669 900 6833 or +1 408 638 0968 or +1 646 876 9923 Meeting ID: 183 382 852 (password required; available on request) Meeting Minutes February 23, 2024 February 16, 2024 February 9, 2024 February 2, 2024 January 26, 2024 January 19, 2024 January 12, 2024 January 5, 2024 December 29, 2023 (canceled) December 22, 2023 (canceled) December 15, 2023 December 8, 2023 December 1, 2023 November 24, 2023 (canceled) November 17, 2023 November 10, 2023 November 3, 2023 October 27, 2023 October 20, 2023 October 13, 2023 October 6, 2023 September 29, 2023 September 22, 2023 September 15, 2023 September 8, 2023 September 1, 2023 August 25, 2023 August 18, 2023 August 11, 2023 August 4, 2023 July 28, 2023 July 21, 2023 January 14, 2023 (canceled due to Throughput Computing 23) July 7, 2023 June 30, 2023 June 23, 2023 June 16, 2023 June 9, 2023 June 2, 2023 May 26, 2023 May 19, 2023 May 12, 2023 May 5, 2023 April 28, 2023 April 21, 2023 April 14, 2023 April 7, 2023 March 31, 2023 March 24, 2023 March 17, 2023 March 10, 2023 March 3, 2023 February 24, 2023 February 17, 2023 February 10, 2023 February 3, 2023 January 27, 2023 January 20, 2023 January 13, 2023 January 6, 2023 (canceled) December 30, 2022 (canceled) December 23, 2022 (canceled) December 16, 2022 December 9, 2022 December 2, 2022 November 25, 2022 (canceled) November 18, 2022 November 11, 2022 November 4, 2022 October 28, 2022 October 21, 2022 October 14, 2022 October 7, 2022 September 30, 2022 September 23, 2022 (canceled) September 16, 2022 (canceled) September 9, 2022 September 2, 2022 August 26, 2022 August 19, 2022 August 12, 2022 August 5, 2022 (canceled) July 29, 2022 July 22, 2022 (canceled) July 15, 2022 July 8, 2022 July 1, 2022 June 24, 2022 June 17, 2022 June 10, 2022 June 3, 2022 May 27, 2022 May 20, 2022 May 13, 2022 May 6, 2022 (canceled) April 29, 2022 April 22, 2022 April 15, 2022 April 8, 2022 April 1, 2022 March 25, 2022 March 18, 2022 (canceled) March 11, 2022 March 4, 2022 February 25, 2022 February 18, 2022 February 11, 2022 February 4, 2022 January 28, 2022 January 21, 2022 January 14, 2022 January 7, 2022 December 31, 2021 (canceled) December 24, 2021 (canceled) December 17, 2021 December 10, 2021 December 3, 2021 November 26, 2021 (canceled) November 19, 2021 November 12, 2021 November 5, 2021 October 29, 2021 October 22, 2021 October 15, 2021 (canceled) October 8, 2021 October 1, 2021 September 24, 2021 September 17, 2021 September 10, 2021 September 3, 2021 August 27, 2021 August 20, 2021 
August 13, 2021 August 6, 2021 July 30, 2021 July 23, 2021 July 16, 2021 July 9, 2021 July 2, 2021 June 25, 2021 June 18, 2021 June 11, 2021 June 4, 2021 May 28, 2021 May 21, 2021 May 14, 2021 May 7, 2021 April 30, 2021 April 23, 2021 April 16, 2021 April 9, 2021 April 2, 2021 March 26, 2021 March 19, 2021 March 12, 2021 March 5, 2021 (canceled) February 26, 2021 February 19, 2021 February 12, 2021 February 5, 2021 January 29, 2021 January 22, 2021 January 15, 2021 January 8, 2021 January 1, 2021 (canceled) December 25, 2020 (canceled) December 18, 2020 December 11, 2020 December 4, 2020 November 20, 2020 November 13, 2020 November 6, 2020 October 30, 2020 October 23, 2020 October 16, 2020 October 9, 2020 October 2, 2020 September 25, 2020 September 18, 2020 September 11, 2020 September 4, 2020 (canceled) August 28, 2020 August 21, 2020 August 14, 2020 August 7, 2020 July 31, 2020 July 24, 2020 July 17, 2020 July 10, 2020 July 3, 2020 (canceled) June 26, 2020 June 19, 2020 June 12, 2020 June 5, 2020 May 29, 2020 (canceled) May 22, 2020 May 15, 2020 May 8, 2020 May 1, 2020 April 24, 2020 April 17, 2020 April 10, 2020 April 3, 2020 March 27, 2020 March 20, 2020 March 13, 2020 March 6, 2020 February 28, 2020 February 21, 2020 February 14, 2020 February 7, 2020 January 31, 2020 January 24, 2020 January 17, 2020 January 10, 2020 January 3, 2020 December 27, 2019 December 20, 2019 December 13, 2019 December 6, 2019 November 29, 2019 (canceled) November 22, 2019 November 15, 2019 November 8, 2019 November 1, 2019 October 25, 2019 October 18, 2019 October 11, 2019 October 4, 2019 September 27, 2019 September 20, 2019 September 13, 2019 September 6, 2019 August 30, 2019 August 23, 2019 August 16, 2019 August 9, 2019 August 2, 2019 July 26, 2019 July 19, 2019 July 12, 2019 July 8, 2019 July 1, 2019 June 24, 2019 June 17, 2019 June 10, 2019 June 3, 2019 May 28, 2019 May 20, 2019 May 13, 2019 May 6, 2019 April 29, 2019 April 22, 2019 April 15, 2019 April 8, 2019 April 1, 2019 March 25, 2019 March 18, 2019 (canceled due to HOW 2019) March 11, 2019 March 4, 2019 February 25, 2019 February 19, 2019 February 11, 2019 February 4, 2019 January 28, 2019 (canceled due to F2F meeting) January 22, 2019 January 14, 2019 January 7, 2019 December 31, 2018 (canceled) December 24, 2018 (canceled) December 17, 2018 December 10, 2018 December 3, 2018 November 26, 2018 November 19, 2018 November 13, 2018 November 5, 2018 (canceled) October 29, 2018 (canceled) October 22, 2018 (canceled) October 15, 2018 October 8, 2018 October 1, 2018 September 24, 2018 September 17, 2018 September 10, 2018 September 4, 2018 August 27, 2018 August 20, 2018 August 13, 2018 August 6, 2018 Archived Meeting Minutes For archived meeting minutes, see the GitHub repository","title":"Home"},{"location":"#osg-operations","text":"Welcome to the home page of the OSG Operations Team documentation area!","title":"OSG Operations"},{"location":"#mission","text":"The mission of OSG Operations is to maintain and improve distributed high throughput computing services to support research communities. This is accomplished by: Operating and maintaining our services in a user-oriented, robust, and reliable manner. Developing a professional and skilled staff dedicated to a service philosophy. Managing resources responsibly, efficiently, and with accountability. 
Evaluating and continually improving the actions, methods and processes that allow the OSG to operate.","title":"Mission"},{"location":"#contact-us","text":"Open a Ticket Slack channel - if you can't create an account, send an e-mail to help@opensciencegrid.org Email: help@opensciencegrid.org","title":"Contact Us"},{"location":"#registration-contact-resource-vo-or-project","text":"Register with OSG","title":"Registration (Contact, Resource, VO, or Project)"},{"location":"#weekly-operations-meetings","text":"When: Fridays 12:30 pm Central URL: https://unl.zoom.us/j/183382852 Phone: +1 669 900 6833 or +1 408 638 0968 or +1 646 876 9923 Meeting ID: 183 382 852 (password required; available on request)","title":"Weekly Operations Meetings"},{"location":"#meeting-minutes","text":"February 23, 2024 February 16, 2024 February 9, 2024 February 2, 2024 January 26, 2024 January 19, 2024 January 12, 2024 January 5, 2024 December 29, 2023 (canceled) December 22, 2023 (canceled) December 15, 2023 December 8, 2023 December 1, 2023 November 24, 2023 (canceled) November 17, 2023 November 10, 2023 November 3, 2023 October 27, 2023 October 20, 2023 October 13, 2023 October 6, 2023 September 29, 2023 September 22, 2023 September 15, 2023 September 8, 2023 September 1, 2023 August 25, 2023 August 18, 2023 August 11, 2023 August 4, 2023 July 28, 2023 July 21, 2023 January 14, 2023 (canceled due to Throughput Computing 23) July 7, 2023 June 30, 2023 June 23, 2023 June 16, 2023 June 9, 2023 June 2, 2023 May 26, 2023 May 19, 2023 May 12, 2023 May 5, 2023 April 28, 2023 April 21, 2023 April 14, 2023 April 7, 2023 March 31, 2023 March 24, 2023 March 17, 2023 March 10, 2023 March 3, 2023 February 24, 2023 February 17, 2023 February 10, 2023 February 3, 2023 January 27, 2023 January 20, 2023 January 13, 2023 January 6, 2023 (canceled) December 30, 2022 (canceled) December 23, 2022 (canceled) December 16, 2022 December 9, 2022 December 2, 2022 November 25, 2022 (canceled) November 18, 2022 November 11, 2022 November 4, 2022 October 28, 2022 October 21, 2022 October 14, 2022 October 7, 2022 September 30, 2022 September 23, 2022 (canceled) September 16, 2022 (canceled) September 9, 2022 September 2, 2022 August 26, 2022 August 19, 2022 August 12, 2022 August 5, 2022 (canceled) July 29, 2022 July 22, 2022 (canceled) July 15, 2022 July 8, 2022 July 1, 2022 June 24, 2022 June 17, 2022 June 10, 2022 June 3, 2022 May 27, 2022 May 20, 2022 May 13, 2022 May 6, 2022 (canceled) April 29, 2022 April 22, 2022 April 15, 2022 April 8, 2022 April 1, 2022 March 25, 2022 March 18, 2022 (canceled) March 11, 2022 March 4, 2022 February 25, 2022 February 18, 2022 February 11, 2022 February 4, 2022 January 28, 2022 January 21, 2022 January 14, 2022 January 7, 2022 December 31, 2021 (canceled) December 24, 2021 (canceled) December 17, 2021 December 10, 2021 December 3, 2021 November 26, 2021 (canceled) November 19, 2021 November 12, 2021 November 5, 2021 October 29, 2021 October 22, 2021 October 15, 2021 (canceled) October 8, 2021 October 1, 2021 September 24, 2021 September 17, 2021 September 10, 2021 September 3, 2021 August 27, 2021 August 20, 2021 August 13, 2021 August 6, 2021 July 30, 2021 July 23, 2021 July 16, 2021 July 9, 2021 July 2, 2021 June 25, 2021 June 18, 2021 June 11, 2021 June 4, 2021 May 28, 2021 May 21, 2021 May 14, 2021 May 7, 2021 April 30, 2021 April 23, 2021 April 16, 2021 April 9, 2021 April 2, 2021 March 26, 2021 March 19, 2021 March 12, 2021 March 5, 2021 (canceled) February 26, 2021 February 19, 2021 February 12, 
2021 February 5, 2021 January 29, 2021 January 22, 2021 January 15, 2021 January 8, 2021 January 1, 2021 (canceled) December 25, 2020 (canceled) December 18, 2020 December 11, 2020 December 4, 2020 November 20, 2020 November 13, 2020 November 6, 2020 October 30, 2020 October 23, 2020 October 16, 2020 October 9, 2020 October 2, 2020 September 25, 2020 September 18, 2020 September 11, 2020 September 4, 2020 (canceled) August 28, 2020 August 21, 2020 August 14, 2020 August 7, 2020 July 31, 2020 July 24, 2020 July 17, 2020 July 10, 2020 July 3, 2020 (canceled) June 26, 2020 June 19, 2020 June 12, 2020 June 5, 2020 May 29, 2020 (canceled) May 22, 2020 May 15, 2020 May 8, 2020 May 1, 2020 April 24, 2020 April 17, 2020 April 10, 2020 April 3, 2020 March 27, 2020 March 20, 2020 March 13, 2020 March 6, 2020 February 28, 2020 February 21, 2020 February 14, 2020 February 7, 2020 January 31, 2020 January 24, 2020 January 17, 2020 January 10, 2020 January 3, 2020 December 27, 2019 December 20, 2019 December 13, 2019 December 6, 2019 November 29, 2019 (canceled) November 22, 2019 November 15, 2019 November 8, 2019 November 1, 2019 October 25, 2019 October 18, 2019 October 11, 2019 October 4, 2019 September 27, 2019 September 20, 2019 September 13, 2019 September 6, 2019 August 30, 2019 August 23, 2019 August 16, 2019 August 9, 2019 August 2, 2019 July 26, 2019 July 19, 2019 July 12, 2019 July 8, 2019 July 1, 2019 June 24, 2019 June 17, 2019 June 10, 2019 June 3, 2019 May 28, 2019 May 20, 2019 May 13, 2019 May 6, 2019 April 29, 2019 April 22, 2019 April 15, 2019 April 8, 2019 April 1, 2019 March 25, 2019 March 18, 2019 (canceled due to HOW 2019) March 11, 2019 March 4, 2019 February 25, 2019 February 19, 2019 February 11, 2019 February 4, 2019 January 28, 2019 (canceled due to F2F meeting) January 22, 2019 January 14, 2019 January 7, 2019 December 31, 2018 (canceled) December 24, 2018 (canceled) December 17, 2018 December 10, 2018 December 3, 2018 November 26, 2018 November 19, 2018 November 13, 2018 November 5, 2018 (canceled) October 29, 2018 (canceled) October 22, 2018 (canceled) October 15, 2018 October 8, 2018 October 1, 2018 September 24, 2018 September 17, 2018 September 10, 2018 September 4, 2018 August 27, 2018 August 20, 2018 August 13, 2018 August 6, 2018","title":"Meeting Minutes"},{"location":"#archived-meeting-minutes","text":"For archived meeting minutes, see the GitHub repository","title":"Archived Meeting Minutes"},{"location":"external-oasis-repos/","text":"External OASIS Repositories We offer hosting of non-OSG CVMFS repositories on OASIS. This means that requests to create, rename, remove, or blanking OASIS repositories will come in as GOC tickets. This document contains instructions for handling those tickets. Also see Policy for OSG Mirroring of External CVMFS repositories External OASIS repository Requests to Host a Repository on OASIS Ensure that the repository administrator is valid for the VO. This can be done by (a) OSG already having a relationship with the person or (b) the contacting the VO manager to find out. Also, the person should be listed in the OSG topology contacts list . Review provided URL and verify that it is appropriate for the VO and no other project uses it already. In order to make sure the name in URL is appropriate, check that the name is derived from the VO name or one of its projects. Then, add the repository URL to the topology for given VO under the OASISRepoURLs . 
This should cause the repository's configuration to be added to the OSG Stratum-0 within 15 minutes after URL is added into the topology. For example, if new URL is for the VO DUNE http://hcc-cvmfs-repo.unl.edu:8000/cvmfs/dune.osgstorage.org edit the following under the OASIS section and create PR: git clone git://github.com/opensciencegrid/topology.git vim topology/virtual-organizations/DUNE.yaml ... OASIS: OASISRepoURLs: - http://hcc-cvmfs-repo.unl.edu:8000/cvmfs/dune.osgstorage.org/ ... When the PR is approved, check on the oasis.opensciencegrid.org host whether the new repository was successfuly signed. There should be message about it in the log file /var/log/oasis/generate_whitelists.log : Tue Sep 25 17:34:02 2018 Running add_osg_repository http://hcc-cvmfs-repo.unl.edu:8000/cvmfs/dune.osgstorage.org dune.osgstorage.org: Signing 7 day whitelist with masterkeycard... done If the respository ends in a new domain name that has not been distributed before, a new domain key will be needed on oasis-replica which should get automatically downloaded from the etc/cvmfs/keys directory in the master branch of the config-repo github repository . There should be a message about downloading it in the log file /var/log/cvmfs/generate_replicas.log . After the key is downloaded the repository should also be automatically added, with messages in the same log file. After the repository is successfully on oasis-replica, in addition you need to update the OSG configuration repository. Make changes in a workspace cloned from the config-repo github repository and use the osg branch (or a branch made from it) in a personal account on oasis-itb . Add a domain configuration in etc/cvmfs/domain.d that's a lot like one of the other imported domains, for example egi.eu.conf . The server urls might be slightly different; use the URLs of the stratum 1s where it is already hosted if there are any, and you can add at least the FNAL and BNL stratum 1s. Copy key(s) for the domain into etc/cvmfs/keys from the master branch, either a single .pub file or a directory, whichever the master branch has. Test all these changes out on the config-osg.opensciencegrid.org repository on oasis-itb using the copy_config_osg command, and configure a test client to read from oasis-itb.opensciencegrid.org instead of oasis.opensciencegrid.org . Then commit those changes into a new branch you made from the osg branch, and make a pull request. Once that PR is approved and merged, log in to the oasis machine and run copy_config_osg as root there to copy from github to the production configuration repository on the oasis machine. If the repository name does not match *.opensciencegrid.org or *.osgstorage.org , skip this step and go on to your next step. If it does match one of those two patterns, then respond to the ticket to tell the administrator to continue with their next step (their step 4). We don't want them to continue before 15 minutes has elapsed after step 2 above, so either wait that much time or tell them the time they may proceed (15 minutes after you updated topology). Then wait until the admin has updated the ticket to indicate that they have completed their step before moving on. Ask the administrator of the BNL stratum 1 (John De Stefano) to also add the new repository. The BNL Stratum-1 administrator should set the service to read from http://oasis-replica.opensciencegrid.org:8002/cvmfs/ . 
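Before handing off, you can spot-check that the replica is serving the new repository; a hedged example using the DUNE repository name from above (every CVMFS repository publishes a .cvmfspublished manifest at its root):
# curl -sI http://oasis-replica.opensciencegrid.org:8002/cvmfs/dune.osgstorage.org/.cvmfspublished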
When the BNL Stratum-1 administrator has reported back that the replication is ready, respond to the requester that the repository is fully replicated on the OSG and close the ticket. Requests to Change the URL of an External Repository If there is a request to change the URL of an external repository, update the registered value in OASISRepoURLs for the respective VO in the topology. Tell the requester that it is ready 15 minutes after topology is updated. Requests to Remove an External Repository After validating that the ticket submitter is authorized by the VO's OASIS manager, delete the registered value for in topology for the VO in OASIS Repo URLs. Verify that it is removed by running the following on any oasis machine to make sure it is missing from the list: print_osg_repos|grep Check if the repository has been replicated to RAL by looking in their repositories.json . The user documentation requests the user to make a GGUS ticket to do this, so either ask them to do it or do it yourself. Add the BNL Stratum-1 operator (John De Stefano) to the ticket and ask him to remove the repository. Wait for him to finish before proceeding. Add the FNAL Stratum-1 operators (Merina Albert, Hyun Woo Kim) to the ticket and ask them when they can be ready to delete the repository. They can't remove it before it is removed from oasis-replica because their Stratum-1 automatically adds all repositories oasis-replica has. However, it has to be done within 8 hours of removal on oasis-replica or an alarm will start going off. Run the following command on oasis , oasis-itb , oasis-replica and oasis-replica-itb : remove_osg_repository -f Tell the FNAL Stratum-1 operators to go ahead and remove the repository. Response to Security Incident on an External Repository If there is a security incident on the publishing machine of an external repository and a publishing key is compromised, the fingerprint of that key should be added to /cvmfs/config-osg.opensciencegrid.org/etc/cvmfs/blacklist . In addition, another line should be added in the form . When the BNL Stratum-1 administrator has reported back that the replication is ready, respond to the requester that the repository is fully replicated on the OSG and close the ticket.","title":"Requests to Host a Repository on OASIS"},{"location":"external-oasis-repos/#requests-to-change-the-url-of-an-external-repository","text":"If there is a request to change the URL of an external repository, update the registered value in OASISRepoURLs for the respective VO in the topology. Tell the requester that it is ready 15 minutes after topology is updated.","title":"Requests to Change the URL of an External Repository"},{"location":"external-oasis-repos/#requests-to-remove-an-external-repository","text":"After validating that the ticket submitter is authorized by the VO's OASIS manager, delete the registered value for in topology for the VO in OASIS Repo URLs. Verify that it is removed by running the following on any oasis machine to make sure it is missing from the list: print_osg_repos|grep Check if the repository has been replicated to RAL by looking in their repositories.json . The user documentation requests the user to make a GGUS ticket to do this, so either ask them to do it or do it yourself. Add the BNL Stratum-1 operator (John De Stefano) to the ticket and ask him to remove the repository. Wait for him to finish before proceeding. 
Add the FNAL Stratum-1 operators (Merina Albert, Hyun Woo Kim) to the ticket and ask them when they can be ready to delete the repository. They can't remove it before it is removed from oasis-replica because their Stratum-1 automatically adds all repositories oasis-replica has. However, it has to be done within 8 hours of removal on oasis-replica or an alarm will start going off. Run the following command on oasis , oasis-itb , oasis-replica and oasis-replica-itb : remove_osg_repository -f Tell the FNAL Stratum-1 operators to go ahead and remove the repository.","title":"Requests to Remove an External Repository"},{"location":"external-oasis-repos/#response-to-security-incident-on-an-external-repository","text":"If there is a security incident on the publishing machine of an external repository and a publishing key is compromised, the fingerprint of that key should be added to /cvmfs/config-osg.opensciencegrid.org/etc/cvmfs/blacklist . In addition, another line should be added in the form ReportableVOName: Corrected VOName: CSV File A CSV file can be specified in order to specify multiple corrections in a single batch update. The CSV file must be of a certain format. No Header Row The number of columns must be at least the number of matching attributes and the corrected attribute. For example, a CSV file for VO corrections would be of format: ,,,.... The CSV file can be specified on the command line with the option --csv , for example: ./gracc-correct vo add --csv ","title":"GRACC Corrections"},{"location":"services/gracc-corrections/#installing-gracc-corrections","text":"GRACC Corrections are used to modify records during the summarization process. RAW records are not modified in the correction process. The correction is applied after summarization and aggregation, but before the record is enriched with data from Topology . The correction is step 3 in the GRACC summary record workflow: Raw record is received. The raw record is never modified Summarizer aggregates the raw records Corrections are applied Summarized records are enriched by Topology Summarized and enriched records are uploaded to GRACC We can currently correct: VO Names Project Names OIM_Site (using the Host_description field)","title":"Installing GRACC Corrections"},{"location":"services/gracc-corrections/#limitations","text":"Additional corrections can be written, but some attributes are used to detect duplicate records, and are therefore protected from corrections. Protected records for summarization are: EndTime, RawVOName, RawProjectName, DN, Processors, ResourceType, CommonName, Host_description, Resource_ExitCode, Grid, ReportableVOName, ProbeName For example, we could not write a correction for the Host_description . If we had a correction that changed Host_description , then the duplicate detection would not detect the same record during resummarization and it would have duplicate summarized records.","title":"Limitations"},{"location":"services/gracc-corrections/#command-line","text":"The gracc-correct tool is used to create, update, and delete corrections. The tool must be run from a host that can write to GRACC, which is very restricted. It is recommended to run the gracc-correct tool directly from the gracc.opensciencegrid.org host. 
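For example, once logged in to that host, corrections are added with subcommands like those shown below; in this sketch the CSV filename is a hypothetical placeholder:
$ gracc-correct vo add --csv vo_corrections.csv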
The gracc-correct tool is able to parse new corrections either individually from user input or many at once from a CSV file.","title":"Command Line"},{"location":"services/gracc-corrections/#user-input","text":"Each correction attempts to match one or more attributes of the summarized record in order to set another attribute. For example, for the VO correction: $ gracc-correct vo add Field ( s ) to correct: VOName: ReportableVOName: Corrected VOName: ","title":"User Input"},{"location":"services/gracc-corrections/#csv-file","text":"A CSV file can be specified in order to specify multiple corrections in a single batch update. The CSV file must be of a certain format. No Header Row The number of columns must be at least the number of matching attributes and the corrected attribute. For example, a CSV file for VO corrections would be of format: ,,,.... The CSV file can be specified on the command line with the option --csv , for example: ./gracc-correct vo add --csv ","title":"CSV File"},{"location":"services/hosted-ce-definitions/","text":"OSG Hosted CE Definitions The OSG provides a Hosted CE service. In general, this document lists what an instance of that service can and cannot do. Hosted CEs in General Benefits The site continues to operate its own batch system according to local considerations; OSG operates the interface between OSG and the site, aka the Hosted CE; To the site, OSG simply looks like a set of user accounts; and OSG uses the accounts to provision site resources for various science user communities, and hence the site has complete control over resource allocation via local policies on the accounts. Prerequisites In general, the site must operate a working batch system that is accessible via at least one head node; OSG works with HTCondor, Slurm, PBS Pro/Torque, LSF, and Grid Engine. Site operations include hardware and software maintenance, defining and implementing usage policies, monitoring, troubleshooting, etc. These are the same activities to support local users. In addition, the site: Must communicate with OSG their intent to share resources \u2014 in most cases, a meeting between site and OSG staff should be sufficient to discuss goals, plans, etc.; Must meet the technical requirements on the OSG website , summarized below: The site is willing to add OSG user accounts with inbound SSH access and submit privileges, A mechanism exists for transferring files between the head nodes and worker nodes, and Worker nodes must have outbound Internet access and temporary storage space for jobs. Is strongly encouraged to tell OSG about preferred constraints on resource requests (e.g., per-job limits on CPUs, memory, and storage; overall limits on number of running and idle jobs; submission rates), so that OSG can tailor such requests to better fit the site. Standard Hosted CE A Standard Hosted CE is the default case in which the interaction between OSG and the site is relatively simple and easy to maintain. Most sites fall into this category. Benefits Configuration is limited to basics, so there is less upfront and ongoing work for OSG and the site; OSG maintains and shares mappings from user groups to OSG user accounts on the site, so that the site can \u2014 if desired \u2014 limit resource allocations to certain groups; and OSG maintains the required OSG configuration on the site\u2019s head node and worker nodes (if the site provides a distribution mechanism to worker nodes, such as a shared file system). 
Site Responsibilities In addition to the general prerequisites above, the following apply to a Standard Hosted CE: The site must create and maintain 20 OSG user accounts on a single head node; note that: OSG will access their accounts via SSH using one RSA key for all 20 accounts; and All 20 OSG accounts must be able to submit to the local batch system. The site may control the resources allocated to different OSG user groups by writing and maintaining policies on the OSG user accounts within the batch system. The site provides privilege separation among the OSG user groups via the OSG user accounts and standard Unix privilege separation.","title":"Hosted CE Definitions"},{"location":"services/hosted-ce-definitions/#osg-hosted-ce-definitions","text":"The OSG provides a Hosted CE service. In general, this document lists what an instance of that service can and cannot do.","title":"OSG Hosted CE Definitions"},{"location":"services/hosted-ce-definitions/#hosted-ces-in-general","text":"","title":"Hosted CEs in General"},{"location":"services/hosted-ce-definitions/#benefits","text":"The site continues to operate its own batch system according to local considerations; OSG operates the interface between OSG and the site, aka the Hosted CE; To the site, OSG simply looks like a set of user accounts; and OSG uses the accounts to provision site resources for various science user communities, and hence the site has complete control over resource allocation via local policies on the accounts.","title":"Benefits"},{"location":"services/hosted-ce-definitions/#prerequisites","text":"In general, the site must operate a working batch system that is accessible via at least one head node; OSG works with HTCondor, Slurm, PBS Pro/Torque, LSF, and Grid Engine. Site operations include hardware and software maintenance, defining and implementing usage policies, monitoring, troubleshooting, etc. These are the same activities to support local users. In addition, the site: Must communicate with OSG their intent to share resources \u2014 in most cases, a meeting between site and OSG staff should be sufficient to discuss goals, plans, etc.; Must meet the technical requirements on the OSG website , summarized below: The site is willing to add OSG user accounts with inbound SSH access and submit privileges, A mechanism exists for transferring files between the head nodes and worker nodes, and Worker nodes must have outbound Internet access and temporary storage space for jobs. Is strongly encouraged to tell OSG about preferred constraints on resource requests (e.g., per-job limits on CPUs, memory, and storage; overall limits on number of running and idle jobs; submission rates), so that OSG can tailor such requests to better fit the site.","title":"Prerequisites"},{"location":"services/hosted-ce-definitions/#standard-hosted-ce","text":"A Standard Hosted CE is the default case in which the interaction between OSG and the site is relatively simple and easy to maintain. 
Most sites fall into this category.","title":"Standard Hosted CE"},{"location":"services/hosted-ce-definitions/#benefits_1","text":"Configuration is limited to basics, so there is less upfront and ongoing work for OSG and the site; OSG maintains and shares mappings from user groups to OSG user accounts on the site, so that the site can \u2014 if desired \u2014 limit resource allocations to certain groups; and OSG maintains the required OSG configuration on the site\u2019s head node and worker nodes (if the site provides a distribution mechanism to worker nodes, such as a shared file system).","title":"Benefits"},{"location":"services/hosted-ce-definitions/#site-responsibilities","text":"In addition to the general prerequisites above, the following apply to a Standard Hosted CE: The site must create and maintain 20 OSG user accounts on a single head node; note that: OSG will access their accounts via SSH using one RSA key for all 20 accounts; and All 20 OSG accounts must be able to submit to the local batch system. The site may control the resources allocated to different OSG user groups by writing and maintaining policies on the OSG user accounts within the batch system. The site provides privilege separation among the OSG user groups via the OSG user accounts and standard Unix privilege separation.","title":"Site Responsibilities"},{"location":"services/install-gwms-factory/","text":"GlideinWMS Factory Installation This document describes how to install a Glidein Workflow Managment System (GlideinWMS) Factory instance. This document assumes expertise with HTCondor and familiarity with the GlideinWMS software. It does not cover anything but the simplest possible install. Please consult the GlideinWMS reference documentation for advanced topics, including non-root, non-RPM-based installation. In this document the terms glidein and pilot (job) will be used interchangeably. This parts covers these primary components of the GlideinWMS system: WMS Collector / Schedd : A set of condor_collector and condor_schedd processes that allow the submission of pilots to Grid entries. GlideinWMS Factory : The process submitting the pilots when needed Warning We really recommend you to use the OSG provided Factory and not to install your own . A VO Frontend is sufficient to submit your jobs and to decide scheduling policies. And this will avoid for you the complexity to deal directly with grid/cloud sites. If you really need you own Factory be aware that it is a complex component and may require a non trivial maintenance effort. Before Starting Before starting the installation process, consider the following points (consulting the Reference section below as needed): Requirements Host and OS A host to install the GlideinWMS Factory (pristine node). Currently most of our testing has been done on Scientific Linux 6 and 7. Root access The GlideinWMS Factory has the following requirements: CPU : 4-8 cores for a large installation (1 should suffice on a small install) RAM : 4-8GB on a large installation (1GB should suffice for small installs) Disk : 10GB will be plenty sufficient for all the binaries, config and log files related to GlideinWMS. If you are a large site with need to keep significant history and logs, you may want to allocate 100GB+ to store long histories. Users The GlideinWMS Factory installation will create the following users unless they are already created . User Default uid Comment condor none HTCondor user (installed via dependencies). gfactory none This user runs the GlideinWMS VO factory. 
To verify that the user gfactory has gfactory as primary group check the output of root@host # getent passwd gfactory | cut -d: -f4 | xargs getent group It should be the gfactory group. Certificates Certificate User that owns certificate Path to certificate Host certificate root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem Here are instructions to request a host certificate. The host certificate/key is used for authorization, however, authorization between the Factory and the GlideinWMS collector is done by file system authentication. Networking Firewalls It must be on the public internet, with at least one port open to the world; all worker nodes will load data from this node trough HTTP. Note that worker nodes will also need outbound access in order to access this HTTP port. Installation Procedure As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories Install CA certificates Installing HTCondor Most required software is installed from the Factory RPM installation. HTCondor is the only exception since there are many different ways to install it , using the RPM system or not. You need to have HTCondor installed before installing the GlideinWMS Factory. If yum cannot find a HTCondor RPM, it will install the dummy empty-condor RPM, assuming that you installed HTCondor using a tarball distribution. If you don't have HTCondor already installed, you can install the HTCondor RPM from the OSG repository: root@host # yum install condor.x86_64 Installing HTCondor-BOSCO If you plan to send jobs using direct batch submission (aka BOSCO), then you need also the condor-bosco package. You'll have to install the package and remove one of its files /etc/condor/config.d/60-campus_factory.config because it interferes with the Factory configuration. root@host # yum install condor-bosco root@host # rm /etc/condor/config.d/60-campus_factory.config root@host # touch /etc/condor/config.d/60-campus_factory.config Install GWMS Factory Download and install the Factory RPM Install the RPM and dependencies (be prepared for a lot of dependencies). root@host # yum install glideinwms-factory This will install the current production release verified and tested by OSG with default HTCondor configuration. This command will install the GlideinWMS Factory, HTCondor, the OSG client, and all the required dependencies. If you wish to install a different version of GlideinWMS, add the \"--enablerepo\" argument to the command as follows: yum install --enablerepo=osg-testing glideinwms-factory : The most recent production release, still in testing phase. This will usually match the current tarball version on the GlideinWMS home page . (The osg-release production version may lag behind the tarball release by a few weeks as it is verified and packaged by OSG). Note that this will also take the osg-testing versions of all dependencies as well. yum install --enablerepo=osg-upcoming glideinwms-factory : The most recent development series release, ie version 3.3.x release. This has newer features such as cloud submission support, but is less tested. Download HTCondor tarballs You will need to download HTCondor tarballs for each architecture that you want to deploy pilots on . At this point, GlideinWMS factory does not support pulling HTCondor binaries from your system area. 
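A hedged sketch of staging one tarball into the location suggested in the next sentence; the download URL and version are placeholders for whichever tarball you obtain:
root@host # mkdir -p /var/lib/gwms-factory/condor
root@host # cd /var/lib/gwms-factory/condor
root@host # wget https://EXAMPLE-DOWNLOAD-SITE/condor-X.Y.Z-x86_64-stripped.tar.gz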
Suggested is that you put these binaries in /var/lib/gwms-factory/condor but any gfactory accessible location should suffice. Configuration Procedure After installing the RPM you need to configure the components of the GlideinWMS Factory: Edit Factory configuration options Edit HTCondor configuration options Create a HTCondor grid map file Reconfigure and Start Factory Configuring the Factory The configuration file is /etc/gwms-factory/glideinWMS.xml . The next steps will describe each line that you will need to edit for most cases, but you may want to review the whole file to be sure that it is configured correctly. Security configuration In the security section, you will need to provide each Frontend that is allowed to communicate with the Factory: security key_length=\"2048\" pub_key=\"RSA\" remove_old_cred_age=\"30\" remove_old_cred_freq=\"24\" reuse_oldkey_onstartup_gracetime=\"900\"> These attributes are very important to get exactly right or the Frontend will not be trusted. This should match one of the factory and security sections of the Frontend configuration Configuring the GlideinWMS Frontend in the following way: Note This is a snippet from the Frontend configuration (for reference), not the Factory that you are configuring now! For the factory section: # from frontend.xml .... For the security: # from frontend.xml Note that the identity of the Frontend must match what HTCondor authenticates the DN of the frontend to. In /etc/condor/certs/condor_mapfile , there must be an entry with vofrontend_service definition (in this case): GSI \"^\\/DC\\=org\\/DC\\=doegrids\\/OU\\=Services\\/CN\\=Some\\ Name\\ 834323%ENDCOLOR%$\" % GREEN % vofrontend_service % ENDCOLOR % Entry configuration Entries are grid/cloud endpoints (aka Compute Elements, or gatekeepers) that can accept job requests and run pilots (which will run user jobs). Each entry needs to be configured to communicate to a specific gatekeeper. An example test entry is provided in the default GlideinWMS configuration file. At the very least, you will need to modify the entry line: You will need to modify the entry name and gatekeeper . This will determine the gatekeeper that you access. Specific gatekeepers often require specific \"rsl\" attributes that determine the job queue that you are submitting to, or other attributes. Add them in the rsl attribute. Also, be sure to distribute your entries across the various HTCondor schedd work managers to balance load. To see the available schedd use condor_status -schedd -l | grep Name . Several schedd options are configured by default for you: schedd_glideins2, schedd_glideins3, schedd_glideins4, schedd_glideins5 , as well as the default schedd . This can be modified in the HTCondor configuration. Add any specific options, such as limitations on jobs/pilots or glexec/voms requirements in the entry section below the above line. More details are in the GlideinWMS Factory configuration guide . !!! warning If there is no match between auth_metod and trust_domain of the entry and the type and trust_domain listed in one of the credentials of one of the Frontends using this Factory, then no job can run on that entry. The Factory must advertise the correct Resource Name of each entry for accounting purposes. Then the Factory must also advertise in the entry all the attributes that will allow to match the query expression used in the Frontends connecting to this Factory (e.g. as explained in the VO frontend configuration document ). 
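To review which entries are currently defined and which gatekeepers they point at, a quick look at the configuration helps (a sketch; entry elements are usually single lines in the XML): root@host # grep '<entry ' /etc/gwms-factory/glideinWMS.xml 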
Note Keep an eye on this part as we're dealing with singularity. Then you must advertise correctly if the site supports gLExec . If it does not set GLEXEC_BIN to NONE , if gLExec is installed via OSG set it to OSG , otherwise set it to the path of gLExec. For example this snippet advertises GLIDEIN_Supported_VOs attribute with the supported VO so that can be used with the query above in the VO frontend and says that the resource does not support gLExec: ... ... Note Specially if jobs are sent to OSG resources, it is very important to set the GLIDEIN_Resource_Name and to be consistent with the Resource Name reported in OIM because that name will be used for job accounting in Gratia. It should be the name of the Resource in OIM or the name of the Resource Group (specially if there are many gatekeepers submitting to the same cluster). More information on options can be found here Configuring Tarballs Each pilot will download HTCondor binaries from the staging area. Often, multiple binaries are needed to support various architectures and platforms. Currently, you will need to provide at least one tarball for GlideinWMS to use. (Using the system binaries is currently not supported). Download a HTCondor tarball from here . Suggested is to put the binaries in /var/lib/gwms-factory/condor , but any factory-accessible location will do just fine. Once you have downloaded the tarball, configure it in /etc/gwms-factory/glideinWMS.xml like in the following: Remember also to modify the condor_os and condor_arch attributes in the entries (the configured Compute Elements) to pick the correct HTCondor binary. Here are more details on using multiple HTCondor binaries. Note that is sufficient to set the base_dir ; the reconfigure command will prepare the tarball and add it to the XML config file. Configuring HTCondor The HTCondor configuration for the Factory is placed in /etc/condor/config.d . 00_gwms_factory_general.config 00-restart_peaceful.config 01_gwms_factory_collectors.config 02_gwms_factory_schedds.config 03_gwms_local.config 10-batch_gahp_blahp.config Get rid of the pre-loaded HTCondor default root@host # rm /etc/condor/config.d/00personal_condor.config root@host # touch /etc/condor/config.d/00personal_condor.config For most installations, the items you need to modify are in 03_gwms_factory_local.config . The lines you will have to edit are: Credentials of the machine. You can either run using a proxy, or a service certificate. It is recommended to use a host certificate and specify its location in the variables GSI_DAEMON_CERT and GSI_DAEMON_KEY . The host certificate should be owned by root and have the correct permissions, 600. HTCondor ids in the form UID.GID (both are integers) HTCondor admin email. Will receive messages when services fail. #-- HTCondor user: condor CONDOR_IDS = #-- Contact (via email) when problems occur CONDOR_ADMIN = ############################ # GSI Security config ############################ #-- Grid Certificate directory GSI_DAEMON_TRUSTED_CA_DIR= /etc/grid-security/certificates #-- Credentials GSI_DAEMON_CERT = /etc/grid-security/hostcert.pem GSI_DAEMON_KEY = /etc/grid-security/hostkey.pem #-- HTCondor mapfile CERTIFICATE_MAPFILE= /etc/condor/certs/condor_mapfile ################################### # Whitelist of HTCondor daemon DNs ################################### #DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD Using other HTCondor RPMs, e.g. UW Madison HTCondor RPM The above procedure will work if you are using the OSG HTCondor RPMS. 
You can verify that you used the OSG HTCondor RPM by using yum list condor . The version name should include \"osg\", e.g. 8.6.9-1.1.osg34.el7 . If you are using the UW Madison HTCondor RPMS, be aware of the following changes: This HTCondor RPM uses a file /etc/condor/condor_config.local to add your local machine slot to the user pool. If you want to disable this behavior (recommended), you should blank out that file or comment out the line in /etc/condor/condor_config for LOCAL_CONFIG_FILE. (Make sure that LOCAL_CONFIG_DIR is set to /etc/condor/config.d ) Note that the variable LOCAL_DIR is set differently in UW Madison and OSG RPMs. This should not cause any more problems in the Glideinwms RPMs, but please take note if you use this variable in your job submissions or other customizations. In general if you are using a non OSG RPM or if you added custom configuration files for HTCondor please check the order of the configuration files: root@host # condor_config_val -config Configuration source: /etc/condor/condor_config Local configuration sources: /etc/condor/config.d/00-restart_peaceful.config /etc/condor/config.d/00_gwms_factory_general.config /etc/condor/config.d/01_gwms_factory_collectors.config /etc/condor/config.d/02_gwms_factory_schedds.config /etc/condor/config.d/03_gwms_local.config /etc/condor/config.d/10-batch_gahp_blahp.config /etc/condor/condor_config.local Restarting HTCondor After configuring HTCondor, be sure to restart HTCondor: root@host # service condor restart Create a HTCondor grid mapfile. The HTCondor grid mapfile /etc/condor/certs/condor_mapfile is used for authentication between the glidein running on a remote worker node, and the local collector. HTCondor uses the mapfile to map certificates to pseudo-users on the local machine. It is important that you map the DN's of each frontend you are talking to. Below is an example mapfile, by default found in /etc/condor/certs/condor_mapfile : GSI \"^\\/DC\\=org\\/DC\\=doegrids\\/OU\\=People\\/CN\\=Some\\ Name\\ 123456$\" frontend GSI (.*) anonymous FS (.*) \\1 Each frontend needs a line that maps to the user specified in the identity argument in the frontend security section of the Factory configuration. Reconfiguring GlideinWMS After changing the configuration of GlideinWMS and making sure that Factory is running, use the following table to find the appropriate command for your operating system (run as root ): If your operating system is... Run the following command... Enterprise Linux 7 systemctl reload gwms-factory Enterprise Linux 6 service gwms-factory reconfig Note Notice that, in the case of Enterprise Linux 7 systemctl reload gwms-factory will work only if: - gwms-factory service is running - gwms-factory service was started with systemctl Otherwise, you will get the following error in any of the cases: # systemctl reload gwms-factory Job for gwms-factory.service invalid. Upgrading GlideinWMS Before you start the Factory service for the first time or after an update of the RPM or after you change GlideinWMS scripts, you should always use the GlideinWMS \"upgrade\" command. To do so: Make sure the condor and gwms-factory services are stopped (in EL6 this will be done for you). Issue the upgrade command: If you are using Enterprise Linux 7: root@host # /usr/sbin/gwms-factory upgrade If you are using Enterprise Linux 6: root@host # service gwms-factory upgrade Start the condor and gwms-factory services (see next part). 
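Put together, a typical upgrade pass on an Enterprise Linux 7 host looks roughly like this (a sketch of the steps above): root@host # systemctl stop gwms-factory condor root@host # /usr/sbin/gwms-factory upgrade root@host # systemctl start condor gwms-factory 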
Service Activation and Deactivation To start the Factory you must start also HTCondor and the Web server beside the Factory itself: # %RED%For RHEL 6 , CentOS 6 , and SL6%ENDCOLOR% root@host # service condor start root@host # service httpd start root@host # service gwms-factory start # %RED% For RHEL 7 , CentOS 7 , and SL7%ENDCOLOR% root@host # systemctl start condor root@host # systemctl start httpd root@host # systemctl start gwms-factory Note Once you successfully start using the Factory service, anytime you change the /etc/gwms-factory/glideinWMS.xml file you will need to run a reconfig/reload command. If you change also some code you need the upgrade command mentioned above: # %RED% For RHEL 6 , CentOS 6 , and SL6%ENDCOLOR% root@host # service gwms-factory reconfig # %RED% But the situation is a bit more complicated in RHEL 7 , CentOS 7 , and SL7 due to systemd restrictions%ENDCOLOR% # %GREEN% For reconfig:%ENDCOLOR% A. %RED% when the Factory is running%ENDCOLOR% A.1 %RED% without any additional options%ENDCOLOR% root@host # /usr/sbin/gwms-factory reconfig%ENDCOLOR% or root@host # systemctl reload gwms-factory A.2 %RED% if you want to give additional options %ENDCOLOR% systemctl stop gwms-factory /usr/sbin/gwms-factory reconfig \"and your options\" systemctl start gwms-factory B. %RED% when the Factory is NOT running %ENDCOLOR% root@host # /usr/sbin/gwms-factory reconfig ( \"and your options\" ) To enable the services so that they restart after a reboot: # %RED%# For RHEL 6 , CentOS 6 , and SL6%ENDCOLOR% root@host # /sbin/chkconfig fetch-crl-cron on root@host # /sbin/chkconfig fetch-crl-boot on root@host # /sbin/chkconfig condor on root@host # /sbin/chkconfig httpd on root@host # /sbin/chkconfig gwms-factory on # %RED%# For RHEL 7 , CentOS 7 , and SL7%ENDCOLOR% root@host # systemctl enable fetch-crl-cron root@host # systemctl enable fetch-crl-boot root@host # systemctl enable condor root@host # systemctl enable httpd root@host # systemctl enable gwms-factory To stop the Factory: # %RED%For RHEL 6 , CentOS 6 , and SL6 %ENDCOLOR% root@host # service gwms-factory stop # %RED%For RHEL 7 , CentOS 7 , and SL7%ENDCOLOR% root@host # systemctl stop gwms-factory And you can stop also the other services if you are not using them independently of the Factory. Validating GlideinWMS Factory The complete validation of the Factory is the submission of actual jobs. You can also check that the services are up and running: root@host # condor_status -any MyType TargetType Name glidefactoryclient None 12345_TEST_ENTRY@gfactory_instance@ glideclient None 12345_TEST_ENTRY@gfactory_instance@ glidefactory None TEST_ENTRY@gfactory_instance@ glidefactoryglobal None gfactory_instance@gfactory_ser glideclientglobal None gfactory_instance@gfactory_ser Scheduler None hostname.fnal.gov DaemonMaster None hostname.fnal.gov Negotiator None hostname.fnal.gov Scheduler None schedd_glideins2@hostname Scheduler None schedd_glideins3@hostname Scheduler None schedd_glideins4@hostname Scheduler None schedd_glideins5@hostname Collector None wmscollector_service@hostname You should have one \"glidefactory\" classAd for each entry that you have enabled. If you have already configured the frontends, you will also have one glidefactoryclient and one glideclient classAd for each frontend / entry. 
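To list just the Factory entry ads from that output, you can filter on the ad type, for example: root@host # condor_status -any -af MyType Name | grep '^glidefactory ' One line per enabled entry is expected. 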
You can also check the monitoring Web page: http://YOUR_HOST_FQDN/factory/monitor/ You can also test the local submission of a job to a resource using the test script local_start.sh , but you must first install the OSG client tools and generate a proxy. After that you can run the test (replace ENTRY_NAME with the name of one of the entries in /etc/gwms-factory/glideinWMS.xml ): Check Web server configuration for the monitoring Verify the path and especially the URL for the GlideinWMS files served by your web server: stage base_dir = \"/var/lib/gwms-factory/web-area/stage\" use_symlink = \"True\" web_base_url = \"http://HOSTNAME:PORT/factory/stage\" This will determine the location of your web server . Make sure that the URL is visible. Depending on your firewall or your organization's firewall, you may need to change the port here and in the httpd configuration (by modifying the \"Listen\" directive in /etc/httpd/conf/httpd.conf ). Note that web servers are often an attacked piece of infrastructure, so you may want to go through the Apache configuration in /etc/httpd/conf/httpd.conf and disable unneeded modules. Troubleshooting GlideinWMS Factory File Locations File Description File Location Comment Configuration file /etc/gwms-factory/glideinWMS.xml Main configuration file Logs /var/log/gwms-factory/server/factory Overall server logs /var/log/gwms-factory/server/entry_NAME Specific entry logs (generally more useful) /var/log/gwms-factory/client Glidein Pilot logs separated by user and entry Startup script /etc/init.d/gwms-factory Web Directory /var/lib/gwms-factory/web-area Web Base /var/lib/gwms-factory/web-base Working Directory /var/lib/gwms-factory/work-dir/ Increase the log level and change rotation policies You can increase the log level of the Factory. To add a log file with all the log information add the following line with all the message types in the process_log section of /etc/gwms-factory/glideinWMS.xml : You can also change the rotation policy and choose whether to compress the rotated files, all in the same section of the config files: max_bytes is the max size of the log files max_days is the number of days after which the log file is rotated compression specifies if rotated files are compressed backup_count is the number of rotated log files kept Further details are in the reference documentation . Failed authentication errors If you get messages such as these in the logs, the Factory does not trust the frontend and will not submit glideins. WARNING: Client fermicloud128-fnal-gov_OSG_gWMSFrontend.main (secid: frontend_name) not in white list. Skipping request This error means that the frontend name in the security section of the Factory does not match the security_name in the frontend. Client fermicloud128-fnal-gov_OSG_gWMSFrontend.main (secid: frontend_name) is not coming from a trusted source; AuthenticatedIdentity vofrontend_condor@fermicloud130.fnal.gov!=vofrontend_factory@fermicloud130.fnal.gov. Skipping for security reasons. This error means that the identity in the security section of the Factory does not match what the /etc/condor/certs/condor_mapfile authenticates the Frontend to in HTCondor (the AuthenticatedIdentity in the classad). Make sure the attributes are correctly lined up as in the Frontend security configuration section above. Glideins start but do not connect to User pool / VO Frontend Check the appropriate job err and out logs in /var/log/gwms-factory/client to see if any errors were reported. Often, this will be a pilot unable to access a web server or with an invalid proxy. 
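Two quick checks from any host can narrow this down (illustrative; substitute your Factory hostname, port, and proxy path): confirm the staging URL is reachable, root@host # curl -sI http://HOSTNAME:PORT/factory/stage/ | head -1 and confirm the pilot proxy is still valid, root@host # voms-proxy-info -file /path/to/pilot_proxy -timeleft 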
Also, verify that the condor_mapfile is correct on the VO Frontend's user pool collector and configuration. Glideins start but fail before running job with error \"Proxy not long lived enough\" If the glideins are running on a resource (entry) but the jobs are not running and the log files in /var/log/gwms-factory/client/user_frontend/glidein_gfactory_instance/ENTRY_NAME report an error like \"Proxy not long lived enough (86096 s left), shortened retire time ...\", then probably the HTCondor RLM on the Compute Element is delegating the proxy and shortening its lifespan. This can be fixed by setting DELEGATE_JOB_GSI_CREDENTIALS = FALSE as suggested in the CE install document . References http://glideinwms.fnal.gov/doc.prd/ https://opensciencegrid.org/docs/other/install-gwms-frontend/","title":"Installing GlideinWMS Factory"},{"location":"services/install-gwms-factory/#glideinwms-factory-installation","text":"This document describes how to install a Glidein Workflow Managment System (GlideinWMS) Factory instance. This document assumes expertise with HTCondor and familiarity with the GlideinWMS software. It does not cover anything but the simplest possible install. Please consult the GlideinWMS reference documentation for advanced topics, including non-root, non-RPM-based installation. In this document the terms glidein and pilot (job) will be used interchangeably. This parts covers these primary components of the GlideinWMS system: WMS Collector / Schedd : A set of condor_collector and condor_schedd processes that allow the submission of pilots to Grid entries. GlideinWMS Factory : The process submitting the pilots when needed Warning We really recommend you to use the OSG provided Factory and not to install your own . A VO Frontend is sufficient to submit your jobs and to decide scheduling policies. And this will avoid for you the complexity to deal directly with grid/cloud sites. If you really need you own Factory be aware that it is a complex component and may require a non trivial maintenance effort.","title":"GlideinWMS Factory Installation"},{"location":"services/install-gwms-factory/#before-starting","text":"Before starting the installation process, consider the following points (consulting the Reference section below as needed):","title":"Before Starting"},{"location":"services/install-gwms-factory/#requirements","text":"","title":"Requirements"},{"location":"services/install-gwms-factory/#host-and-os","text":"A host to install the GlideinWMS Factory (pristine node). Currently most of our testing has been done on Scientific Linux 6 and 7. Root access The GlideinWMS Factory has the following requirements: CPU : 4-8 cores for a large installation (1 should suffice on a small install) RAM : 4-8GB on a large installation (1GB should suffice for small installs) Disk : 10GB will be plenty sufficient for all the binaries, config and log files related to GlideinWMS. If you are a large site with need to keep significant history and logs, you may want to allocate 100GB+ to store long histories.","title":"Host and OS"},{"location":"services/install-gwms-factory/#users","text":"The GlideinWMS Factory installation will create the following users unless they are already created . User Default uid Comment condor none HTCondor user (installed via dependencies). gfactory none This user runs the GlideinWMS VO factory. 
To verify that the user gfactory has gfactory as primary group check the output of root@host # getent passwd gfactory | cut -d: -f4 | xargs getent group It should be the gfactory group.","title":"Users"},{"location":"services/install-gwms-factory/#certificates","text":"Certificate User that owns certificate Path to certificate Host certificate root /etc/grid-security/hostcert.pem /etc/grid-security/hostkey.pem Here are instructions to request a host certificate. The host certificate/key is used for authorization, however, authorization between the Factory and the GlideinWMS collector is done by file system authentication.","title":"Certificates"},{"location":"services/install-gwms-factory/#networking","text":"","title":"Networking"},{"location":"services/install-gwms-factory/#firewalls","text":"It must be on the public internet, with at least one port open to the world; all worker nodes will load data from this node trough HTTP. Note that worker nodes will also need outbound access in order to access this HTTP port.","title":"Firewalls"},{"location":"services/install-gwms-factory/#installation-procedure","text":"As with all OSG software installations, there are some one-time (per host) steps to prepare in advance: Ensure the host has a supported operating system Obtain root access to the host Prepare the required Yum repositories Install CA certificates","title":"Installation Procedure"},{"location":"services/install-gwms-factory/#installing-htcondor","text":"Most required software is installed from the Factory RPM installation. HTCondor is the only exception since there are many different ways to install it , using the RPM system or not. You need to have HTCondor installed before installing the GlideinWMS Factory. If yum cannot find a HTCondor RPM, it will install the dummy empty-condor RPM, assuming that you installed HTCondor using a tarball distribution. If you don't have HTCondor already installed, you can install the HTCondor RPM from the OSG repository: root@host # yum install condor.x86_64","title":"Installing HTCondor"},{"location":"services/install-gwms-factory/#installing-htcondor-bosco","text":"If you plan to send jobs using direct batch submission (aka BOSCO), then you need also the condor-bosco package. You'll have to install the package and remove one of its files /etc/condor/config.d/60-campus_factory.config because it interferes with the Factory configuration. root@host # yum install condor-bosco root@host # rm /etc/condor/config.d/60-campus_factory.config root@host # touch /etc/condor/config.d/60-campus_factory.config","title":"Installing HTCondor-BOSCO"},{"location":"services/install-gwms-factory/#install-gwms-factory","text":"","title":"Install GWMS Factory"},{"location":"services/install-gwms-factory/#download-and-install-the-factory-rpm","text":"Install the RPM and dependencies (be prepared for a lot of dependencies). root@host # yum install glideinwms-factory This will install the current production release verified and tested by OSG with default HTCondor configuration. This command will install the GlideinWMS Factory, HTCondor, the OSG client, and all the required dependencies. If you wish to install a different version of GlideinWMS, add the \"--enablerepo\" argument to the command as follows: yum install --enablerepo=osg-testing glideinwms-factory : The most recent production release, still in testing phase. This will usually match the current tarball version on the GlideinWMS home page . 
(The osg-release production version may lag behind the tarball release by a few weeks as it is verified and packaged by OSG). Note that this will also take the osg-testing versions of all dependencies as well. yum install --enablerepo=osg-upcoming glideinwms-factory : The most recent development series release, ie version 3.3.x release. This has newer features such as cloud submission support, but is less tested.","title":"Download and install the Factory RPM"},{"location":"services/install-gwms-factory/#download-htcondor-tarballs","text":"You will need to download HTCondor tarballs for each architecture that you want to deploy pilots on . At this point, GlideinWMS factory does not support pulling HTCondor binaries from your system area. Suggested is that you put these binaries in /var/lib/gwms-factory/condor but any gfactory accessible location should suffice.","title":"Download HTCondor tarballs"},{"location":"services/install-gwms-factory/#configuration-procedure","text":"After installing the RPM you need to configure the components of the GlideinWMS Factory: Edit Factory configuration options Edit HTCondor configuration options Create a HTCondor grid map file Reconfigure and Start Factory","title":"Configuration Procedure"},{"location":"services/install-gwms-factory/#configuring-the-factory","text":"The configuration file is /etc/gwms-factory/glideinWMS.xml . The next steps will describe each line that you will need to edit for most cases, but you may want to review the whole file to be sure that it is configured correctly.","title":"Configuring the Factory"},{"location":"services/install-gwms-factory/#security-configuration","text":"In the security section, you will need to provide each Frontend that is allowed to communicate with the Factory: security key_length=\"2048\" pub_key=\"RSA\" remove_old_cred_age=\"30\" remove_old_cred_freq=\"24\" reuse_oldkey_onstartup_gracetime=\"900\"> These attributes are very important to get exactly right or the Frontend will not be trusted. This should match one of the factory and security sections of the Frontend configuration Configuring the GlideinWMS Frontend in the following way: Note This is a snippet from the Frontend configuration (for reference), not the Factory that you are configuring now! For the factory section: # from frontend.xml .... For the security: # from frontend.xml Note that the identity of the Frontend must match what HTCondor authenticates the DN of the frontend to. In /etc/condor/certs/condor_mapfile , there must be an entry with vofrontend_service definition (in this case): GSI \"^\\/DC\\=org\\/DC\\=doegrids\\/OU\\=Services\\/CN\\=Some\\ Name\\ 834323%ENDCOLOR%$\" % GREEN % vofrontend_service % ENDCOLOR %","title":"Security configuration"},{"location":"services/install-gwms-factory/#entry-configuration","text":"Entries are grid/cloud endpoints (aka Compute Elements, or gatekeepers) that can accept job requests and run pilots (which will run user jobs). Each entry needs to be configured to communicate to a specific gatekeeper. An example test entry is provided in the default GlideinWMS configuration file. At the very least, you will need to modify the entry line: You will need to modify the entry name and gatekeeper . This will determine the gatekeeper that you access. Specific gatekeepers often require specific \"rsl\" attributes that determine the job queue that you are submitting to, or other attributes. Add them in the rsl attribute. 
Also, be sure to distribute your entries across the various HTCondor schedd work managers to balance load. To see the available schedd use condor_status -schedd -l | grep Name . Several schedd options are configured by default for you: schedd_glideins2, schedd_glideins3, schedd_glideins4, schedd_glideins5 , as well as the default schedd . This can be modified in the HTCondor configuration. Add any specific options, such as limitations on jobs/pilots or glexec/voms requirements in the entry section below the above line. More details are in the GlideinWMS Factory configuration guide . !!! warning If there is no match between auth_metod and trust_domain of the entry and the type and trust_domain listed in one of the credentials of one of the Frontends using this Factory, then no job can run on that entry. The Factory must advertise the correct Resource Name of each entry for accounting purposes. Then the Factory must also advertise in the entry all the attributes that will allow to match the query expression used in the Frontends connecting to this Factory (e.g. as explained in the VO frontend configuration document ). Note Keep an eye on this part as we're dealing with singularity. Then you must advertise correctly if the site supports gLExec . If it does not set GLEXEC_BIN to NONE , if gLExec is installed via OSG set it to OSG , otherwise set it to the path of gLExec. For example this snippet advertises GLIDEIN_Supported_VOs attribute with the supported VO so that can be used with the query above in the VO frontend and says that the resource does not support gLExec: ... ... Note Specially if jobs are sent to OSG resources, it is very important to set the GLIDEIN_Resource_Name and to be consistent with the Resource Name reported in OIM because that name will be used for job accounting in Gratia. It should be the name of the Resource in OIM or the name of the Resource Group (specially if there are many gatekeepers submitting to the same cluster). More information on options can be found here","title":"Entry configuration"},{"location":"services/install-gwms-factory/#configuring-tarballs","text":"Each pilot will download HTCondor binaries from the staging area. Often, multiple binaries are needed to support various architectures and platforms. Currently, you will need to provide at least one tarball for GlideinWMS to use. (Using the system binaries is currently not supported). Download a HTCondor tarball from here . Suggested is to put the binaries in /var/lib/gwms-factory/condor , but any factory-accessible location will do just fine. Once you have downloaded the tarball, configure it in /etc/gwms-factory/glideinWMS.xml like in the following: Remember also to modify the condor_os and condor_arch attributes in the entries (the configured Compute Elements) to pick the correct HTCondor binary. Here are more details on using multiple HTCondor binaries. Note that is sufficient to set the base_dir ; the reconfigure command will prepare the tarball and add it to the XML config file.","title":"Configuring Tarballs"},{"location":"services/install-gwms-factory/#configuring-htcondor","text":"The HTCondor configuration for the Factory is placed in /etc/condor/config.d . 
00_gwms_factory_general.config 00-restart_peaceful.config 01_gwms_factory_collectors.config 02_gwms_factory_schedds.config 03_gwms_local.config 10-batch_gahp_blahp.config Get rid of the pre-loaded HTCondor default root@host # rm /etc/condor/config.d/00personal_condor.config root@host # touch /etc/condor/config.d/00personal_condor.config For most installations, the items you need to modify are in 03_gwms_factory_local.config . The lines you will have to edit are: Credentials of the machine. You can either run using a proxy, or a service certificate. It is recommended to use a host certificate and specify its location in the variables GSI_DAEMON_CERT and GSI_DAEMON_KEY . The host certificate should be owned by root and have the correct permissions, 600. HTCondor ids in the form UID.GID (both are integers) HTCondor admin email. Will receive messages when services fail. #-- HTCondor user: condor CONDOR_IDS = #-- Contact (via email) when problems occur CONDOR_ADMIN = ############################ # GSI Security config ############################ #-- Grid Certificate directory GSI_DAEMON_TRUSTED_CA_DIR= /etc/grid-security/certificates #-- Credentials GSI_DAEMON_CERT = /etc/grid-security/hostcert.pem GSI_DAEMON_KEY = /etc/grid-security/hostkey.pem #-- HTCondor mapfile CERTIFICATE_MAPFILE= /etc/condor/certs/condor_mapfile ################################### # Whitelist of HTCondor daemon DNs ################################### #DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD","title":"Configuring HTCondor"},{"location":"services/install-gwms-factory/#using-other-htcondor-rpms-eg-uw-madison-htcondor-rpm","text":"The above procedure will work if you are using the OSG HTCondor RPMS. You can verify that you used the OSG HTCondor RPM by using yum list condor . The version name should include \"osg\", e.g. 8.6.9-1.1.osg34.el7 . If you are using the UW Madison HTCondor RPMS, be aware of the following changes: This HTCondor RPM uses a file /etc/condor/condor_config.local to add your local machine slot to the user pool. If you want to disable this behavior (recommended), you should blank out that file or comment out the line in /etc/condor/condor_config for LOCAL_CONFIG_FILE. (Make sure that LOCAL_CONFIG_DIR is set to /etc/condor/config.d ) Note that the variable LOCAL_DIR is set differently in UW Madison and OSG RPMs. This should not cause any more problems in the Glideinwms RPMs, but please take note if you use this variable in your job submissions or other customizations. In general if you are using a non OSG RPM or if you added custom configuration files for HTCondor please check the order of the configuration files: root@host # condor_config_val -config Configuration source: /etc/condor/condor_config Local configuration sources: /etc/condor/config.d/00-restart_peaceful.config /etc/condor/config.d/00_gwms_factory_general.config /etc/condor/config.d/01_gwms_factory_collectors.config /etc/condor/config.d/02_gwms_factory_schedds.config /etc/condor/config.d/03_gwms_local.config /etc/condor/config.d/10-batch_gahp_blahp.config /etc/condor/condor_config.local","title":"Using other HTCondor RPMs, e.g. 
UW Madison HTCondor RPM"},{"location":"services/install-gwms-factory/#restarting-htcondor","text":"After configuring HTCondor, be sure to restart HTCondor: root@host # service condor restart","title":"Restarting HTCondor"},{"location":"services/install-gwms-factory/#create-a-htcondor-grid-mapfile","text":"The HTCondor grid mapfile /etc/condor/certs/condor_mapfile is used for authentication between the glidein running on a remote worker node, and the local collector. HTCondor uses the mapfile to map certificates to pseudo-users on the local machine. It is important that you map the DN's of each frontend you are talking to. Below is an example mapfile, by default found in /etc/condor/certs/condor_mapfile : GSI \"^\\/DC\\=org\\/DC\\=doegrids\\/OU\\=People\\/CN\\=Some\\ Name\\ 123456$\" frontend GSI (.*) anonymous FS (.*) \\1 Each frontend needs a line that maps to the user specified in the identity argument in the frontend security section of the Factory configuration.","title":"Create a HTCondor grid mapfile."},{"location":"services/install-gwms-factory/#reconfiguring-glideinwms","text":"After changing the configuration of GlideinWMS and making sure that Factory is running, use the following table to find the appropriate command for your operating system (run as root ): If your operating system is... Run the following command... Enterprise Linux 7 systemctl reload gwms-factory Enterprise Linux 6 service gwms-factory reconfig Note Notice that, in the case of Enterprise Linux 7 systemctl reload gwms-factory will work only if: - gwms-factory service is running - gwms-factory service was started with systemctl Otherwise, you will get the following error in any of the cases: # systemctl reload gwms-factory Job for gwms-factory.service invalid.","title":"Reconfiguring GlideinWMS"},{"location":"services/install-gwms-factory/#upgrading-glideinwms","text":"Before you start the Factory service for the first time or after an update of the RPM or after you change GlideinWMS scripts, you should always use the GlideinWMS \"upgrade\" command. To do so: Make sure the condor and gwms-factory services are stopped (in EL6 this will be done for you). Issue the upgrade command: If you are using Enterprise Linux 7: root@host # /usr/sbin/gwms-factory upgrade If you are using Enterprise Linux 6: root@host # service gwms-factory upgrade Start the condor and gwms-factory services (see next part).","title":"Upgrading GlideinWMS"},{"location":"services/install-gwms-factory/#service-activation-and-deactivation","text":"To start the Factory you must start also HTCondor and the Web server beside the Factory itself: # %RED%For RHEL 6 , CentOS 6 , and SL6%ENDCOLOR% root@host # service condor start root@host # service httpd start root@host # service gwms-factory start # %RED% For RHEL 7 , CentOS 7 , and SL7%ENDCOLOR% root@host # systemctl start condor root@host # systemctl start httpd root@host # systemctl start gwms-factory Note Once you successfully start using the Factory service, anytime you change the /etc/gwms-factory/glideinWMS.xml file you will need to run a reconfig/reload command. If you change also some code you need the upgrade command mentioned above: # %RED% For RHEL 6 , CentOS 6 , and SL6%ENDCOLOR% root@host # service gwms-factory reconfig # %RED% But the situation is a bit more complicated in RHEL 7 , CentOS 7 , and SL7 due to systemd restrictions%ENDCOLOR% # %GREEN% For reconfig:%ENDCOLOR% A. 
%RED% when the Factory is running%ENDCOLOR% A.1 %RED% without any additional options%ENDCOLOR% root@host # /usr/sbin/gwms-factory reconfig%ENDCOLOR% or root@host # systemctl reload gwms-factory A.2 %RED% if you want to give additional options %ENDCOLOR% systemctl stop gwms-factory /usr/sbin/gwms-factory reconfig \"and your options\" systemctl start gwms-factory B. %RED% when the Factory is NOT running %ENDCOLOR% root@host # /usr/sbin/gwms-factory reconfig ( \"and your options\" ) To enable the services so that they restart after a reboot: # %RED%# For RHEL 6 , CentOS 6 , and SL6%ENDCOLOR% root@host # /sbin/chkconfig fetch-crl-cron on root@host # /sbin/chkconfig fetch-crl-boot on root@host # /sbin/chkconfig condor on root@host # /sbin/chkconfig httpd on root@host # /sbin/chkconfig gwms-factory on # %RED%# For RHEL 7 , CentOS 7 , and SL7%ENDCOLOR% root@host # systemctl enable fetch-crl-cron root@host # systemctl enable fetch-crl-boot root@host # systemctl enable condor root@host # systemctl enable httpd root@host # systemctl enable gwms-factory To stop the Factory: # %RED%For RHEL 6 , CentOS 6 , and SL6 %ENDCOLOR% root@host # service gwms-factory stop # %RED%For RHEL 7 , CentOS 7 , and SL7%ENDCOLOR% root@host # systemctl stop gwms-factory And you can stop also the other services if you are not using them independently of the Factory.","title":"Service Activation and Deactivation"},{"location":"services/install-gwms-factory/#validating-glideinwms-factory","text":"The complete validation of the Factory is the submission of actual jobs. You can also check that the services are up and running: root@host # condor_status -any MyType TargetType Name glidefactoryclient None 12345_TEST_ENTRY@gfactory_instance@ glideclient None 12345_TEST_ENTRY@gfactory_instance@ glidefactory None TEST_ENTRY@gfactory_instance@ glidefactoryglobal None gfactory_instance@gfactory_ser glideclientglobal None gfactory_instance@gfactory_ser Scheduler None hostname.fnal.gov DaemonMaster None hostname.fnal.gov Negotiator None hostname.fnal.gov Scheduler None schedd_glideins2@hostname Scheduler None schedd_glideins3@hostname Scheduler None schedd_glideins4@hostname Scheduler None schedd_glideins5@hostname Collector None wmscollector_service@hostname You should have one \"glidefactory\" classAd for each entry that you have enabled. If you have already configured the frontends, you will also have one glidefactoryclient and one glideclient classAd for each frontend / entry. You can check also the monitoring Web page: http://YOUR_HOST_FQDN/factory/monitor/ You can also test the local submission of a job to a resource using the test script local_start.sh but you must first install the OSG client tools and generate a proxy. After that you can run the test (replace ENTRY_NAME with the name of one of the entries in /etc/gwms-factory/glideinWMS.xml ):","title":"Validating GlideinWMS Factory"},{"location":"services/install-gwms-factory/#check-web-server-configuration-for-the-monitoring","text":"Verify path and specially the URL for the GlideinWMS files served by your web server: stage base_dir = \"/var/lib/gwms-factory/web-area/stage\" use_symlink = \"True\" web_base_url = \"http://HOSTNAME:PORT/factory/stage\" This will determine the location of your web server . Make sure that the URL is visible. Depending on your firewall or the one of your organization, you may need to change the port here and in the httpd configuration (by modifying the \"Listen\" directive in /etc/httpd/conf/httpd.conf ). 
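A quick local check that httpd is actually listening on the configured port (a sketch): root@host # grep -i '^Listen' /etc/httpd/conf/httpd.conf root@host # ss -tlnp | grep -w httpd 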
Note that web servers are an often an attacked piece of infrastruture, so you may want to go through the Apache configuration in /etc/httpd/conf/httpd.conf and disable unneeded modules.","title":"Check Web server configuration for the monitoring"},{"location":"services/install-gwms-factory/#troubleshooting-glideinwms-factory","text":"","title":"Troubleshooting GlideinWMS Factory"},{"location":"services/install-gwms-factory/#file-locations","text":"File Description File Location Comment Configuration file /etc/gwms-factory/glideinWMS.xml Main configuration file Logs /var/log/gwms-factory/server/factory Overall server logs /var/log/gwms-factory/server/entry_NAME Specific entry logs (generally more useful) /var/log/gwms-factory/client Glidein Pilot logs seperated by user and entry Startup script /etc/init.d/gwms-factory Web Directory /var/lib/gwms-factory/web-area Web Base /var/lib/gwms-factory/web-base Working Directory /var/lib/gwms-factory/work-dir/","title":"File Locations"},{"location":"services/install-gwms-factory/#increase-the-log-level-and-change-rotation-policies","text":"You can increase the log level of the frontend. To add a log file with all the log information add the following line with all the message types in the process_log section of /etc/gwms-factory/glideinWMS.xml : You can also change the rotation policy and choose whether compress the rotated files, all in the same section of the config files: max_bytes is the max size of the log files max_days it will be rotated. compression specifies if rotated files are compressed backup_count is the number of rotated log files kept Further details are in the reference documentation .","title":"Increase the log level and change rotation policies"},{"location":"services/install-gwms-factory/#failed-authentication-errors","text":"If you get messages such as these in the logs, the Factory does not trust the frontend and will not submit glideins. WARNING: Client fermicloud128-fnal-gov_OSG_gWMSFrontend.main (secid: frontend_name) not in white list. Skipping request This error means that the frontend name in the security section of the Factory does not match the security_name in the frontend. Client fermicloud128-fnal-gov_OSG_gWMSFrontend.main (secid: frontend_name) is not coming from a trusted source; AuthenticatedIdentity vofrontend_condor@fermicloud130.fnal.gov!=vofrontend_factory@fermicloud130.fnal.gov. Skipping for security reasons. This error means that the identity in the security section of the Factory does not match what the /etc/condor/certs/condor_mapfile authenticates the Frontend to in HTCondor (!Authenticated Identity in the classad). Make sure the attributes are correctly lined up as in the Frontend security configuration section above.","title":"Failed authentication errors"},{"location":"services/install-gwms-factory/#glideins-start-but-do-not-connect-to-user-pool-vo-frontend","text":"Check the appropriate job err and out logs in /var/log/gwms-factory/client to see if any errors were reported. Often, this will be a pilot unable to access a web server or with an invalid proxy. 
Also, verify that the condor_mapfile is correct on the VO Frontend's user pool collector and configuration.","title":"Glideins start but do not connect to User pool / VO Frontend"},{"location":"services/install-gwms-factory/#glideins-start-but-fail-before-running-job-with-error-proxy-not-long-lived-enough","text":"If the glideins are running on a resource (entry) but the jobs are not running and the log files in /var/log/gwms-factory/client/user_frontend/glidein_gfactory_instance/ENTRY_NAME report an error like \"Proxy not long lived enough (86096 s left), shortened retire time ...\", then probably the HTCondor RLM on the Compute Element is delegating the proxy and shortening its lifespan. This can be fixed by setting DELEGATE_JOB_GSI_CREDENTIALS = FALSE as suggested in the CE install document .","title":"Glideins start but fail before running job with error \"Proxy not long lived enough\""},{"location":"services/install-gwms-factory/#references","text":"http://glideinwms.fnal.gov/doc.prd/ https://opensciencegrid.org/docs/other/install-gwms-frontend/","title":"References"},{"location":"services/sending-announcements/","text":"Sending Announcements Various OSG teams need to send out announcement about various events (releases, security advisories, planned changes, etc). This page describes how to send announcements using the osg-notify tool. Prerequisites To send announcements, the following conditions must be met: A host with an IP address listed in the SPF Record A sufficiently modern Linux operating system. This procedure has been tested on a FermiCloud Scientific Linux 7 VM and a Linux Mint 18.3 laptop. It is known not to work on a FermiCloud Scientific Linux 6 VM. A valid OSG user certificate to lookup contacts in the topology database Local hostname matches DNS DNS forward and reverse lookups in place [tim@submit-1 topology]$ hostname submit-1.chtc.wisc.edu [tim@submit-1 topology]$ host submit-1.chtc.wisc.edu submit-1.chtc.wisc.edu has address 128.105.244.191 [tim@submit-1 topology]$ host 128 .105.244.191 191.244.105.128.in-addr.arpa domain name pointer submit-1.chtc.wisc.edu. (Required for security announcements) A GPG Key to sign the announcement Installation Install the required Yum repositories : Install the OSG tools: # yum install --enablerepo = devops topology-client If you are on a FermiCloud VM, update postfix to relay through FermiLab's official mail server: echo \"transport_maps = hash:/etc/postfix/transport\" >> /etc/postfix/main.cf echo \"* smtp:smtp.fnal.gov\" >> /etc/postfix/transport postmap hash:/etc/postfix/transport postfix reload Test this setup by sending a message to yourself only. Bonus points for using an email address that goes to a site with aggressive SPAM filtering. 
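For example, a minimal smoke test addressed only to yourself might look like this (paths, subject, and address are illustrative, and this assumes your osg-notify accepts a test value for --type; check osg-notify --help and consider a --dry-run first): osg-notify --cert your-cert.pem --key your-key.pem --no-sign --type test --message ./relay-test.txt --subject 'Mail relay test' --recipients you@example.edu The osg-notify options themselves are described in the next section. 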
Sending the announcement Use the osg-notify tool to send the announcement using the relevant options from the following table: Option Description --dry-run Use this option until you are ready to actually send the message --cert File that contains your OSG User Certificate --key File that contains your Private Key for your OSG User Certificate --no-sign Don't GPG sign the message (release only) --type production Not a test message --message File containing your message --subject The subject of your message --recipients List of recipient email addresses, must have at least one --oim-recipients Select contacts associated with resources and/or VOs --oim-contact-type The contact type to notify: administrative for release announcements or security for security announcements --bypass-dns-check Use this option to skip the check that one of the host's IP addresses matches the hostname resolution Security requirements Security announcements must be signed using the following options: --sign : GPG sign the message --sign-id : The ID of the key used for signing --from security : The mail comes from the OSG Security Team For release announcements use the following command: osg-notify --cert your-cert.pem --key your-key.pem \\ --no-sign --type production --message \\ --subject '' \\ --recipients \"osg-general@opensciencegrid.org osg-operations@opensciencegrid.org osg-sites@opensciencegrid.org vdt-discuss@opensciencegrid.org\" \\ --oim-recipients resources --oim-recipients vos --oim-contact-type administrative Replacing the --subject value with an appropriate subject for your announcement and the --message value with the path to the file containing your message in plain text.","title":"Sending Announcements"},{"location":"services/sending-announcements/#sending-announcements","text":"Various OSG teams need to send out announcements about various events (releases, security advisories, planned changes, etc.). This page describes how to send announcements using the osg-notify tool.","title":"Sending Announcements"},{"location":"services/sending-announcements/#prerequisites","text":"To send announcements, the following conditions must be met: A host with an IP address listed in the SPF Record A sufficiently modern Linux operating system. This procedure has been tested on a FermiCloud Scientific Linux 7 VM and a Linux Mint 18.3 laptop. It is known not to work on a FermiCloud Scientific Linux 6 VM. A valid OSG user certificate to look up contacts in the topology database Local hostname matches DNS DNS forward and reverse lookups in place [tim@submit-1 topology]$ hostname submit-1.chtc.wisc.edu [tim@submit-1 topology]$ host submit-1.chtc.wisc.edu submit-1.chtc.wisc.edu has address 128.105.244.191 [tim@submit-1 topology]$ host 128.105.244.191 191.244.105.128.in-addr.arpa domain name pointer submit-1.chtc.wisc.edu. (Required for security announcements) A GPG Key to sign the announcement","title":"Prerequisites"},{"location":"services/sending-announcements/#installation","text":"Install the required Yum repositories : Install the OSG tools: # yum install --enablerepo=devops topology-client If you are on a FermiCloud VM, update postfix to relay through FermiLab's official mail server: echo \"transport_maps = hash:/etc/postfix/transport\" >> /etc/postfix/main.cf echo \"* smtp:smtp.fnal.gov\" >> /etc/postfix/transport postmap hash:/etc/postfix/transport postfix reload Test this setup by sending a message to yourself only. 
Bonus points for using an email address that goes to a site with aggressive SPAM filtering.","title":"Installation"},{"location":"services/sending-announcements/#sending-the-announcement","text":"Use the osg-notify tool to send the announcement using the relevant options from the following table: Option Description --dry-run Use this option until you are ready to actually send the message --cert File that contains your OSG User Certificate --key File that contains your Private Key for your OSG User Certificate --no-sign Don't GPG sign the message (release only) --type production Not a test message --message File containing your message --subject The subject of your message --recipients List of recipient email addresses, must have at least one --oim-recipients Select contacts associated with resources and/or VOs --oim-contact-type Replacing with administrative for release announcements or security for security announcements --bypass-dns-check Use this option to skip the check that one of the host's IP addresses matches with the hostname resolution Security requirements Security announcements must be signed using the following options: --sign : GPG sign the message --sign-id : The ID of the key used for singing --from security : The mail comes from the OSG Security Team For release announcements use the following command: osg-notify --cert your-cert.pem --key your-key.pem \\ --no-sign --type production --message \\ --subject '' \\ --recipients \"osg-general@opensciencegrid.org osg-operations@opensciencegrid.org osg-sites@opensciencegrid.org vdt-discuss@opensciencegrid.org\" \\ --oim-recipients resources --oim-recipients vos --oim-contact-type administrative Replacing with an appropriate subject for your announcement and with the path to the file containing your message in plain text.","title":"Sending the announcement"},{"location":"services/topology-contacts-data/","text":"Topology and Contacts Data This is internal documentation intended for OSG Operations staff. It contains information about the data provided by https://topology.opensciencegrid.org . The topology data for the service is in https://github.com/opensciencegrid/topology , in the projects/ , topology/ , and virtual-organizations/ subdirectories. The contacts data is in https://bitbucket.org/opensciencegrid/contact/ , in contacts.yaml . Topology Data Admins may request changes to data in the topology repo via either a GitHub pull request or a Freshdesk ticket. These changes can be to a project, a VO, or a resource. The registration document and topology README document should tell them how to do that. In the case of a GitHub pull request, you will need to provide IDs using the bin/next_ids tool in an up-to-date local clone of Topology and potentially fix-up other data. To assist the user, do one of the following, depending on the severity of the fixes required for the PR: For minor issues, submit a \"Comment\" review using GitHub suggestions and ask the user to incorporate your suggestions . For major issues, create a branch based off of their PR, make changes, and submit your own PR that closes the original user's PR. The CI checks should catch most errors but you should still review the YAML changes. Certain things to check are: Do contact names and IDs match what's in the contacts data? (See below for instructions on how to get that information.) If the person is not in the contacts data, you will need to add them before approving the PR. Is the PR submitter authorized to make changes to that project/VO/resource? 
Can you match them to a person affiliated with that project/VO/site? (The contacts data now includes the GitHub usernames for some people. See below for instructions on how to get that information.) Is their GitHub ID registered in the contact database and are they associated with the relevant resource, site, facility, or VO? Retiring resources A resource can be disabled in its topology yaml file by setting Active: false . However the resource entry should not be immediately deleted from the yaml file. One reason for this is that the WLCG accounting info configured for resources is used to determine which resources to send APEL numbers for. Removing resources prematurely could prevent resummarized GRACC data from getting sent appropriately. Resources that have been inactive for at least two years are eligible to be deleted from the topology database. The GRACC records for this resource can be inspected in Kibana . In the search bar, enter ProbeName:*\\:FQDN in the search bar, where FQDN is the FQDN defined for your resource For example, if your resource FQDN is cmsgrid01.hep.wisc.edu you would enter ProbeName:*\\:cmsgrid01.hep.wisc.edu In the upper-right corner, use the Time Range selection to pick \"Last 2 years\" With this criteria selected, Kibana will show you if it has received any records for this resource in the past two years. If there are no records returned, you may remove the resource from the resource group yaml file in the topology repo. Any downtime entries for this resource in the corresponding downtime yaml file for the resource group must be removed also. If you remove the last resource in the resource group yaml file, you should remove the resource group and corresponding downtime yaml files as well. Reviewing project PRs New projects are typically created by the Research Facilitation team. Here are a few things to check: Did osg-bot warn about a \"New Organization\"? If so, search around in the projects directory and make sure the \"Organization\" in the YAML is not a typo or alternate spelling for an existing organization. grep around in the /projects/ directory for substrings of the organization. For example, if the new org is \"University of Wisconsin Madison\", do: $ grep -i wisconsin projects/*.yaml and you will see that it's supposed to be \"University of Wisconsin-Madison\". If the new organization is not a typo or alternate spelling, dismiss osg-bot's review with the comment \"new org is legit\". Is the project name is of the form _ , e.g. UWMadison_Parks ? (This is recommended but not required for new projects.) If so: Is the short name -> organization mapping for the institution in /mappings/project_institution.yaml (e.g. UWMadison: \"University of Wisconsin-Madison\" )? If not, ask the PR author to add it. Does the \"FieldOfScience\" in the YAML match one of the keys in /mappings/nsfscience.yaml ? (The list is also available on the left column of this CSV .) Is the \"Sponsor\" correct? The sponsor depends on where the users will be submitting jobs from: If they primarily submit from some CI Connect interface such as \"OSG Connect\", use: Sponsor : CampusGrid : Name : The campus grid name must be one of the ones in the /projects/_CAMPUS_GRIDS.yaml file.. Otherwise, the project must be sponsored by a VO: Sponsor : VirtualOrganization : Name : The VO name must be one of the ones in the /virtual-organizations/ dir. 
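For the organization, institution-mapping, and field-of-science checks above, a couple of greps in your topology clone are usually enough (the search strings are just examples): $ grep -i 'UWMadison' mappings/project_institution.yaml $ grep -i 'Astrophysics' mappings/nsfscience.yaml 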
Contacts Data The OSG keeps contact data for administrators and maintainers of OSG resources and VOs for the purpose of distributing security, software, and administrative (e.g., OSG All-Hands dates) announcements. Additionally, OSG contacts have the following abilities: View other contacts' information (via HTML and XML ) with a registered certificate Register resource downtimes for resources for which they are listed as an administrative contact, if they have a registered GitHub ID Contact data is kept as editable YAML in https://bitbucket.org/opensciencegrid/contact/ , in contacts.yaml . The YAML file contains sensitive information and is only visible to people with access to that repo. Getting access to the contact repo The contacts repo is hosted on BitBucket. You will need an Atlassian account for access to BitBucket. The account you use for OSG JIRA should work. Once you have an account, request access from Brian Lin, Mat Selmeci, or Derek Weitzel. You should then be able to go to https://bitbucket.org/opensciencegrid/contact/ . Using the contact repo BitBucket is similar to GitHub except you don't make a fork of the contact repo, you just clone it to your local machine. This means that any pushes go directly to the main repo instead of your own fork. Danger Don't push to master. For any changes, always create your own branch, push your changes to that branch, then make a pull request. Have someone else review and merge your pull request. All contact data is stored in contacts.yaml . The contact info is keyed by a 40-character hexadecimal ID which was generated from their email address when they were first added. An example entry is: 25357f62c7ab2ae11ddda1efd272bb5435dbfacb : # ^ this is their ID FullName : Example A. User Profile : This is an example user. GitHub : ExampleUser # ContactInformation data requires authorization to view ContactInformation : DNs : - ... IM : ... PrimaryEmail : user@example.net PrimaryPhone : ... When making changes to the contact data, first see if a contact is already in the YAML file. Search the YAML file for their name. Be sure to try variations of their name if you don't find them -- someone may be listed as \"Dave\" or \"David\", or have a middle name or middle initial. Follow the instructions below for adding or updating a contact, as appropriate. Adding a new contact Danger Any new contacts need to have their association with the OSG verified by a known contact within the relevant VO, site, or project. When registering a new contact, first obtain the required contact information . After obtaining this information and verifying their association with the OSG, fill out the values in template-contacts.yaml and add it to contacts.yaml . To get the hash used as the ID, run email-hash on their email address. For example: $ cd contact # this is your local clone of the \"contact\" repo $ bin/email-hash user@example.net 25357f62c7ab2ae11ddda1efd272bb5435dbfacb Then your new entry will look like 25357f62c7ab2ae11ddda1efd272bb5435dbfacb : FullName : Example A. User .... The FullName and Profile fields in the main section, and the PrimaryEmail field in the ContactInformation section are required. The PrimaryEmail field in the ContactInformation section should match the hash that you used for the ID. In addition, if they will be making pull requests against the topology repo, e.g. for updating site information, reporting downtime, or updating project or VO information, obtain their GitHub username and put it in the GitHub field. 
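Putting the steps above together, a minimal sketch of adding a contact with the branch-and-pull-request workflow described earlier; the branch name, commit message, and email address are hypothetical placeholders:
$ cd contact                        # your local clone of the contact repo
$ git checkout -b add-example-user  # never push directly to master
$ bin/email-hash user@example.net   # prints the 40-character ID to use as the key in contacts.yaml
$ # copy the fields from template-contacts.yaml into contacts.yaml under that ID and fill them in
$ git add contacts.yaml
$ git commit -m 'Add contact entry for Example A. User'
$ git push origin add-example-user  # then open a pull request and have someone else review and merge it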
Editing a contact Once you have found a contact in the YAML file, edit the attributes by hand. If you want to add information that is not present for that contact, look at template-contacts.yaml to find out what the attributes are called. Note The ID of the contact never changes, even if the user's PrimaryEmail changes. Important If you change the contact's FullName , you must make the same change to every place that the contact is mentioned in the topology repo. Get the contact changes merged in first.","title":"Topology and Contacts Data"},{"location":"services/topology-contacts-data/#topology-and-contacts-data","text":"This is internal documentation intended for OSG Operations staff. It contains information about the data provided by https://topology.opensciencegrid.org . The topology data for the service is in https://github.com/opensciencegrid/topology , in the projects/ , topology/ , and virtual-organizations/ subdirectories. The contacts data is in https://bitbucket.org/opensciencegrid/contact/ , in contacts.yaml .","title":"Topology and Contacts Data"},{"location":"services/topology-contacts-data/#topology-data","text":"Admins may request changes to data in the topology repo via either a GitHub pull request or a Freshdesk ticket. These changes can be to a project, a VO, or a resource. The registration document and topology README document should tell them how to do that. In the case of a GitHub pull request, you will need to provide IDs using the bin/next_ids tool in an up-to-date local clone of Topology and potentially fix-up other data. To assist the user, do one of the following, depending on the severity of the fixes required for the PR: For minor issues, submit a \"Comment\" review using GitHub suggestions and ask the user to incorporate your suggestions . For major issues, create a branch based off of their PR, make changes, and submit your own PR that closes the original user's PR. The CI checks should catch most errors but you should still review the YAML changes. Certain things to check are: Do contact names and IDs match what's in the contacts data? (See below for instructions on how to get that information.) If the person is not in the contacts data, you will need to add them before approving the PR. Is the PR submitter authorized to make changes to that project/VO/resource? Can you match them to a person affiliated with that project/VO/site? (The contacts data now includes the GitHub usernames for some people. See below for instructions on how to get that information.) Is their GitHub ID registered in the contact database and are they associated with the relevant resource, site, facility, or VO?","title":"Topology Data"},{"location":"services/topology-contacts-data/#retiring-resources","text":"A resource can be disabled in its topology yaml file by setting Active: false . However the resource entry should not be immediately deleted from the yaml file. One reason for this is that the WLCG accounting info configured for resources is used to determine which resources to send APEL numbers for. Removing resources prematurely could prevent resummarized GRACC data from getting sent appropriately. Resources that have been inactive for at least two years are eligible to be deleted from the topology database. The GRACC records for this resource can be inspected in Kibana . 
In the search bar, enter ProbeName:*\\:FQDN , where FQDN is the FQDN defined for your resource For example, if your resource FQDN is cmsgrid01.hep.wisc.edu you would enter ProbeName:*\\:cmsgrid01.hep.wisc.edu In the upper-right corner, use the Time Range selection to pick \"Last 2 years\" With these criteria selected, Kibana will show you if it has received any records for this resource in the past two years. If there are no records returned, you may remove the resource from the resource group yaml file in the topology repo. Any downtime entries for this resource in the corresponding downtime yaml file for the resource group must be removed also. If you remove the last resource in the resource group yaml file, you should remove the resource group and corresponding downtime yaml files as well.","title":"Retiring resources"},{"location":"services/topology-contacts-data/#reviewing-project-prs","text":"New projects are typically created by the Research Facilitation team. Here are a few things to check: Did osg-bot warn about a \"New Organization\"? If so, search around in the projects directory and make sure the \"Organization\" in the YAML is not a typo or alternate spelling for an existing organization. grep around in the /projects/ directory for substrings of the organization. For example, if the new org is \"University of Wisconsin Madison\", do: $ grep -i wisconsin projects/*.yaml and you will see that it's supposed to be \"University of Wisconsin-Madison\". If the new organization is not a typo or alternate spelling, dismiss osg-bot's review with the comment \"new org is legit\". Is the project name of the form _ , e.g. UWMadison_Parks ? (This is recommended but not required for new projects.) If so: Is the short name -> organization mapping for the institution in /mappings/project_institution.yaml (e.g. UWMadison: \"University of Wisconsin-Madison\" )? If not, ask the PR author to add it. Does the \"FieldOfScience\" in the YAML match one of the keys in /mappings/nsfscience.yaml ? (The list is also available on the left column of this CSV .) Is the \"Sponsor\" correct? The sponsor depends on where the users will be submitting jobs from: If they primarily submit from some CI Connect interface such as \"OSG Connect\", use: Sponsor : CampusGrid : Name : The campus grid name must be one of the ones in the /projects/_CAMPUS_GRIDS.yaml file. Otherwise, the project must be sponsored by a VO: Sponsor : VirtualOrganization : Name : The VO name must be one of the ones in the /virtual-organizations/ dir.","title":"Reviewing project PRs"},{"location":"services/topology-contacts-data/#contacts-data","text":"The OSG keeps contact data for administrators and maintainers of OSG resources and VOs for the purpose of distributing security, software, and administrative (e.g., OSG All-Hands dates) announcements. Additionally, OSG contacts have the following abilities: View other contacts' information (via HTML and XML ) with a registered certificate Register resource downtimes for resources for which they are listed as an administrative contact, if they have a registered GitHub ID Contact data is kept as editable YAML in https://bitbucket.org/opensciencegrid/contact/ , in contacts.yaml . The YAML file contains sensitive information and is only visible to people with access to that repo.","title":"Contacts Data"},{"location":"services/topology-contacts-data/#getting-access-to-the-contact-repo","text":"The contacts repo is hosted on BitBucket. 
You will need an Atlassian account for access to BitBucket. The account you use for OSG JIRA should work. Once you have an account, request access from Brian Lin, Mat Selmeci, or Derek Weitzel. You should then be able to go to https://bitbucket.org/opensciencegrid/contact/ .","title":"Getting access to the contact repo"},{"location":"services/topology-contacts-data/#using-the-contact-repo","text":"BitBucket is similar to GitHub except you don't make a fork of the contact repo, you just clone it to your local machine. This means that any pushes go directly to the main repo instead of your own fork. Danger Don't push to master. For any changes, always create your own branch, push your changes to that branch, then make a pull request. Have someone else review and merge your pull request. All contact data is stored in contacts.yaml . The contact info is keyed by a 40-character hexadecimal ID which was generated from their email address when they were first added. An example entry is: 25357f62c7ab2ae11ddda1efd272bb5435dbfacb : # ^ this is their ID FullName : Example A. User Profile : This is an example user. GitHub : ExampleUser # ContactInformation data requires authorization to view ContactInformation : DNs : - ... IM : ... PrimaryEmail : user@example.net PrimaryPhone : ... When making changes to the contact data, first see if a contact is already in the YAML file. Search the YAML file for their name. Be sure to try variations of their name if you don't find them -- someone may be listed as \"Dave\" or \"David\", or have a middle name or middle initial. Follow the instructions below for adding or updating a contact, as appropriate.","title":"Using the contact repo"},{"location":"services/topology-contacts-data/#adding-a-new-contact","text":"Danger Any new contacts need to have their association with the OSG verified by a known contact within the relevant VO, site, or project. When registering a new contact, first obtain the required contact information . After obtaining this information and verifying their association with the OSG, fill out the values in template-contacts.yaml and add it to contacts.yaml . To get the hash used as the ID, run email-hash on their email address. For example: $ cd contact # this is your local clone of the \"contact\" repo $ bin/email-hash user@example.net 25357f62c7ab2ae11ddda1efd272bb5435dbfacb Then your new entry will look like 25357f62c7ab2ae11ddda1efd272bb5435dbfacb : FullName : Example A. User .... The FullName and Profile fields in the main section, and the PrimaryEmail field in the ContactInformation section are required. The PrimaryEmail field in the ContactInformation section should match the hash that you used for the ID. In addition, if they will be making pull requests against the topology repo, e.g. for updating site information, reporting downtime, or updating project or VO information, obtain their GitHub username and put it in the GitHub field.","title":"Adding a new contact"},{"location":"services/topology-contacts-data/#editing-a-contact","text":"Once you have found a contact in the YAML file, edit the attributes by hand. If you want to add information that is not present for that contact, look at template-contacts.yaml to find out what the attributes are called. Note The ID of the contact never changes, even if the user's PrimaryEmail changes. Important If you change the contact's FullName , you must make the same change to every place that the contact is mentioned in the topology repo. 
Get the contact changes merged in first.","title":"Editing a contact"},{"location":"services/topology/","text":"Topology Service This document contains information about the service that runs: https://topology.opensciencegrid.org https://topology-itb.opensciencegrid.org https://map.opensciencegrid.org : Generates the topology map used on OSG Display The source code for the service is in https://github.com/opensciencegrid/topology , in the src/ subdirectory. This repository also contains the public part of the data that gets served. Deployment Topology is a webapp run with Apache on the host topology.opensciencegrid.org . The ITB instance runs on the host topology-itb.opensciencegrid.org . The hosts are VMs at Nebraska; for SSH access, contact Derek Weitzel or Brian Bockelman. Installation These instructions assume an EL 7 host with the EPEL repositories available. The software will be installed into /opt/topology . A second instance for the webhook app will be installed into /opt/topology-webhook . (The ITB instance should be installed into /opt/topology-itb and /opt/topology-itb-webhook instead.) The following steps should be done as root. Install prerequisites: # yum install python36 gridsite httpd mod_ssl Clone the repository: For the production topology host: # git clone https://github.com/opensciencegrid/topology /opt/topology # git clone https://github.com/opensciencegrid/topology /opt/topology-webhook For the topology-itb host: # git clone https://github.com/opensciencegrid/topology /opt/topology-itb # git clone https://github.com/opensciencegrid/topology /opt/topology-itb-webhook Set up the virtualenv in the clone -- from /opt/topology or /opt/topology-itb : # python36 -m venv venv # . ./venv/bin/activate # pip install -r requirements-apache.txt Repeat for the webhook instance -- from /opt/topology-webhook or /opt/topology-itb-webhook . 
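As a quick sanity check after the pip install (a sketch, assuming the production path used above; substitute /opt/topology-itb on the ITB host), confirm that the mod_wsgi module that the Apache configuration described below will load was actually built inside the venv:
# ls /opt/topology/venv/lib/python3.6/site-packages/mod_wsgi/server/
The directory should contain a mod_wsgi .so file; its path is referenced again in the Software configuration section.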
File system locations The following files/directories must exist and have the proper permissions: Location Purpose Ownership Mode /opt/topology Production software install root:root 0755 /opt/topology-itb ITB software install root:root 0755 /opt/topology-webhook Production webhook software install root:root 0755 /opt/topology-itb-webhook ITB webhook software install root:root 0755 /etc/opt/topology/config-production.py Production config root:root 0644 /etc/opt/topology/config-itb.py ITB config root:root 0644 /etc/opt/topology/bitbucket Private key for contact info repo apache:root 0600 /etc/opt/topology/bitbucket.pub Public key for contact info repo apache:root 0644 /etc/opt/topology/github Private key for pushing automerge commits topomerge:root 0600 /etc/opt/topology/github.pub Public key for pushing automerge commits topomerge:root 0644 /etc/opt/topology/github_webhook_secret GitHub webhook secret for validating webhooks topomerge:root 0600 ~apache/.ssh SSH dir for Apache apache:root 0700 ~apache/.ssh/known_hosts Known hosts file for Apache apache:root 0644 ~topomerge Home dir for topomerge Apache user topomerge:root 0755 ~topomerge/.ssh SSH dir for topomerge Apache user topomerge:root 0700 ~topomerge/.ssh/known_hosts Known hosts file for topomerge Apache user topomerge:root 0644 /var/cache/topology Checkouts of topology and contacts data for production instance apache:apache 0755 /var/cache/topology-itb Checkouts of topology and contacts data for ITB instance apache:apache 0755 /var/cache/topology-webhook Topology repo and state info for production webhook instance topomerge:topomerge 0755 /var/cache/topology-itb-webhook Topology repo and state info for ITB webhook instance topomerge:topomerge 0755 ~apache/.ssh/known_hosts must contain an entry for bitbucket.org ; use ssh-keyscan bitbucket.org to get the appropriate entry. ~topomerge/.ssh/known_hosts must contain an entry for github.com ; use ssh-keyscan github.com to get the appropriate entry. Software configuration Configuration for the main app is under /etc/opt/topology/ , in config-production.py and config-itb.py . The webhook app configuration is in config-production-webhook.py and config-itb-webhook.py . The files are in Python format and override default settings in src/webapp/default_config.py in the topology repo. HTTPD configuration is in /etc/httpd ; we use the modules mod_ssl , mod_gridsite , and mod_wsgi . The first two are installed via yum; the .so file for mod_wsgi is located in /opt/topology/venv/lib/python3.6/site-packages/mod_wsgi/server/ or /opt/topology-itb/venv/lib/python3.6/site-packages/mod_wsgi/server/ for the ITB instance. Each of the hostnames is a VHost in the apache configuration. Some special notes: https://map.opensciencegrid.org runs in the same wsgi process as the production topology, but the URL is limited to only the map code. Further, it does not use mod_gridsite so that users are not asked to present a client certificate. VHosts are configured: ServerName topology.opensciencegrid.org ServerAlias my.opensciencegrid.org myosg.opensciencegrid.org Data configuration Configuration is in /etc/opt/topology/config-production.py and config-itb.py ; and config-production-webhook.py and config-itb-webhook.py . 
Variable Purpose TOPOLOGY_DATA_DIR The directory containing a clone of the topology repository for data use TOPOLOGY_DATA_REPO The remote tracking repository of TOPOLOGY_DATA_DIR TOPOLOGY_DATA_BRANCH The remote tracking branch of TOPOLOGY_DATA_DIR WEBHOOK_DATA_DIR The directory containing a mirror-clone of the topology repository for webhook use WEBHOOK_DATA_REPO The remote tracking repository of WEBHOOK_DATA_DIR WEBHOOK_DATA_BRANCH The remote tracking branch of WEBHOOK_DATA_DIR WEBHOOK_STATE_DIR Directory containing webhook state information between pull request and status hooks WEBHOOK_SECRET_KEY Secret key configured on GitHub for webhook delivery CONTACT_DATA_DIR The directory containing a clone of the contact repository for data use CONTACT_DATA_REPO The remote tracking repository of CONTACT_DATA_DIR (default: \"git@bitbucket.org:opensciencegrid/contact.git\" ) CONTACT_DATA_BRANCH The remote tracking branch of CONTACT_DATA_DIR (default: \"master\" ) CACHE_LIFETIME Frequency of automatic data updates in seconds (default: 900 ) GIT_SSH_KEY Location of ssh public key file for git access. /etc/opt/topology/bitbucket.pub for the main app, and /etc/opt/topology/github.pub for the webhook app Puppet ensures that the production contact and topology clones are up to date with their configured remote tracking repo and branch. Puppet does not manage the ITB data directories so they need to be updated by hand during testing. GitHub Configuration for Webhook App Go to the https://github.com/opensciencegrid/topology/settings/hooks page on GitHub. There are four webhooks to set up: pull_request and status for both the topology and topology-itb hosts. Payload URL Content type Events to trigger webhook https://topology.opensciencegrid.org/webhook/status application/json Statuses https://topology.opensciencegrid.org/webhook/pull_request application/json Pull requests https://topology-itb.opensciencegrid.org/webhook/status application/json Statuses https://topology-itb.opensciencegrid.org/webhook/pull_request application/json Pull requests For each webhook, \"Secret\" should be a random 40-digit hex string, which should match the contents of the file /etc/opt/topology/github_webhook_secret (the path configured in WEBHOOK_SECRET_KEY ). The OSG's dedicated GitHub user for automating pushes is currently osg-bot . This user needs to have write access to the topology repo on GitHub. The ssh public key in /etc/opt/topology/github.pub should be registered with the osg-bot GitHub user. This can be done by logging into GitHub as osg-bot , and adding the new ssh key under the settings page. Required System Packages Currently the webhook app uses the mailx command to send email. If not already installed, install it with: # yum install mailx Testing changes on the ITB instance All changes should be tested on the ITB instance before deploying to production. If you can, test them on your local machine first. These instructions assume that the code has not been merged to master. Update the ITB software installation at /opt/topology-itb and note the current branch: # cd /opt/topology-itb # git fetch --all # git status Check out the branch you are testing. 
If the target remote is not configured, add it : # git checkout -b / Verify that you are using the intended data associated with the code you are testing: If the data format has changed in an incompatible way, modify /etc/opt/topology/config-itb.py : Backup the ITB configuration file: # cd /etc/opt/topology # cp -p config-itb.py { ,.bak } Change the TOPOLOGY_DATA_DIR and/or CONTACT_DATA_DIR lines to point to new directories so the previous data does not get overwritten with incompatible data. If you need to use a different branch for the data, switch to it: Check the branch of TOPOLOGY_DATA_DIR from /etc/opt/topology/config-itb.py # cd # git fetch --all # git status Note the previous branch, you will need this later If the target remote is not configured, add it Check out the target branch: # git checkout -b / Pull any upstream changes to ensure that your branch is up to date: # git pull For updates to the webhook app, follow the above instructions for the ITB webhook instance under /opt/topology-itb-webhook and its corresponding config file, /etc/opt/topology/config-itb-webhook.py . Restart httpd : # systemctl restart httpd Test the web interface at https://topology-itb.opensciencegrid.org . Errors and output are in /var/log/httpd/error_log . Reverting changes Switch /opt/topology-itb to the previous branch: # cd /opt/topology-itb # git checkout For updates to the webhook app, switch /opt/topology-itb-webhook to the previous master: # cd /opt/topology-itb-webhook # git checkout If you made config changes to /etc/opt/topology/config-itb.py or config-itb-webhook.py , restore the backup. If you checked out a different branch for data, revert it back to the old branch. Restart httpd : # systemctl restart httpd Test the web interface at https://topology-itb.opensciencegrid.org . Updating the production instance Updating the production instance is similar to updating the ITB instance. Update master on the Git clone at /opt/topology : # cd /opt/topology # git pull origin master For updates to the webhook app, update master on the Git clone at /opt/topology-webhook : # cd /opt/topology-webhook # git pull origin master Make config changes to /etc/opt/topology/config-production.py and/or config-production-webhook.py if necessary. Restart httpd : # systemctl restart httpd Test the web interface at https://topology.opensciencegrid.org . Errors and output are in /var/log/httpd/error_log . Reverting changes Switch /opt/topology to the previous master: # cd /opt/topology # ## (use `git reflog` to find the previous commit that was used) # git reset --hard For updates to the webhook app, switch /opt/topology-webhook to the previous master: # cd /opt/topology-webhook ### (use `git reflog` to find the previous commit that was used) # git reset --hard If you made config changes to /etc/opt/topology/config-production.py or config-production-webhook.py , revert them. Restart httpd : # systemctl restart httpd Test the web interface at https://topology.opensciencegrid.org .","title":"Topology Service"},{"location":"services/topology/#topology-service","text":"This document contains information about the service that runs: https://topology.opensciencegrid.org https://topology-itb.opensciencegrid.org https://map.opensciencegrid.org : Generates the topology map used on OSG Display The source code for the service is in https://github.com/opensciencegrid/topology , in the src/ subdirectory. 
This repository also contains the public part of the data that gets served.","title":"Topology Service"},{"location":"services/topology/#deployment","text":"Topology is a webapp run with Apache on the host topology.opensciencegrid.org . The ITB instance runs on the host topology-itb.opensciencegrid.org . The hosts are VMs at Nebraska; for SSH access, contact Derek Weitzel or Brian Bockelman.","title":"Deployment"},{"location":"services/topology/#installation","text":"These instructions assume an EL 7 host with the EPEL repositories available. The software will be installed into /opt/topology . A second instance for the webhook app will be installed into /opt/topology-webhook . (The ITB instance should be installed into /opt/topology-itb and /opt/topology-itb-webhook instead.) The following steps should be done as root. Install prerequisites: # yum install python36 gridsite httpd mod_ssl Clone the repository: For the production topology host: # git clone https://github.com/opensciencegrid/topology /opt/topology # git clone https://github.com/opensciencegrid/topology /opt/topology-webhook For the topology-itb host: # git clone https://github.com/opensciencegrid/topology /opt/topology-itb # git clone https://github.com/opensciencegrid/topology /opt/topology-itb-webhook Set up the virtualenv in the clone -- from /opt/topology or /opt/topology-itb : # python36 -m venv venv # . ./venv/bin/activate # pip install -r requirements-apache.txt Repeat for the webhook instance -- from /opt/topology-webhook or /opt/topology-itb-webhook .","title":"Installation"},{"location":"services/topology/#file-system-locations","text":"The following files/directories must exist and have the proper permissions: Location Purpose Ownership Mode /opt/topology Production software install root:root 0755 /opt/topology-itb ITB software install root:root 0755 /opt/topology-webhook Production webhook software install root:root 0755 /opt/topology-itb-webhook ITB webhook software install root:root 0755 /etc/opt/topology/config-production.py Production config root:root 0644 /etc/opt/topology/config-itb.py ITB config root:root 0644 /etc/opt/topology/bitbucket Private key for contact info repo apache:root 0600 /etc/opt/topology/bitbucket.pub Public key for contact info repo apache:root 0644 /etc/opt/topology/github Private key for pushing automerge commits topomerge:root 0600 /etc/opt/topology/github.pub Public key for pushing automerge commits topomerge:root 0644 /etc/opt/topology/github_webhook_secret GitHub webhook secret for validating webhooks topomerge:root 0600 ~apache/.ssh SSH dir for Apache apache:root 0700 ~apache/.ssh/known_hosts Known hosts file for Apache apache:root 0644 ~topomerge Home dir for topomerge Apache user topomerge:root 0755 ~topomerge/.ssh SSH dir for topomerge Apache user topomerge:root 0700 ~topomerge/.ssh/known_hosts Known hosts file for topomerge Apache user topomerge:root 0644 /var/cache/topology Checkouts of topology and contacts data for production instance apache:apache 0755 /var/cache/topology-itb Checkouts of topology and contacts data for ITB instance apache:apache 0755 /var/cache/topology-webhook Topology repo and state info for production webhook instance topomerge:topomerge 0755 /var/cache/topology-itb-webhook Topology repo and state info for ITB webhook instance topomerge:topomerge 0755 ~apache/.ssh/known_hosts must contain an entry for bitbucket.org ; use ssh-keyscan bitbucket.org to get the appropriate entry. 
~topomerge/.ssh/known_hosts must contain an entry for github.com ; use ssh-keyscan github.com to get the appropriate entry.","title":"File system locations"},{"location":"services/topology/#software-configuration","text":"Configuration for the main app is under /etc/opt/topology/ , in config-production.py and config-itb.py . The webhook app configuration is in config-production-webhook.py and config-itb-webhook.py . The files are in Python format and override default settings in src/webapp/default_config.py in the topology repo. HTTPD configuration is in /etc/httpd ; we use the modules mod_ssl , mod_gridsite , and mod_wsgi . The first two are installed via yum; the .so file for mod_wsgi is located in /opt/topology/venv/lib/python3.6/site-packages/mod_wsgi/server/ or /opt/topology-itb/venv/lib/python3.6/site-packages/mod_wsgi/server/ for the ITB instance. Each of the hostnames is a VHost in the apache configuration. Some special notes: https://map.opensciencegrid.org runs in the same wsgi process as the production topology, but the URL is limited to only the map code. Further, it does not use mod_gridsite so that users are not asked to present a client certificate. VHosts are configured: ServerName topology.opensciencegrid.org ServerAlias my.opensciencegrid.org myosg.opensciencegrid.org","title":"Software configuration"},{"location":"services/topology/#data-configuration","text":"Configuration is in /etc/opt/topology/config-production.py and config-itb.py ; and config-production-webhook.py and config-itb-webhook.py . Variable Purpose TOPOLOGY_DATA_DIR The directory containing a clone of the topology repository for data use TOPOLOGY_DATA_REPO The remote tracking repository of TOPOLOGY_DATA_DIR TOPOLOGY_DATA_BRANCH The remote tracking branch of TOPOLOGY_DATA_DIR WEBHOOK_DATA_DIR The directory containing a mirror-clone of the topology repository for webhook use WEBHOOK_DATA_REPO The remote tracking repository of WEBHOOK_DATA_DIR WEBHOOK_DATA_BRANCH The remote tracking branch of WEBHOOK_DATA_DIR WEBHOOK_STATE_DIR Directory containing webhook state information between pull request and status hooks WEBHOOK_SECRET_KEY Secret key configured on GitHub for webhook delivery CONTACT_DATA_DIR The directory containing a clone of the contact repository for data use CONTACT_DATA_REPO The remote tracking repository of CONTACT_DATA_DIR (default: \"git@bitbucket.org:opensciencegrid/contact.git\" ) CONTACT_DATA_BRANCH The remote tracking branch of CONTACT_DATA_DIR (default: \"master\" ) CACHE_LIFETIME Frequency of automatic data updates in seconds (default: 900 ) GIT_SSH_KEY Location of ssh public key file for git access. /etc/opt/topology/bitbucket.pub for the main app, and /etc/opt/topology/github.pub for the webhook app Puppet ensures that the production contact and topology clones are up to date with their configured remote tracking repo and branch. Puppet does not manage the ITB data directories so they need to be updated by hand during testing.","title":"Data configuration"},{"location":"services/topology/#github-configuration-for-webhook-app","text":"Go to the https://github.com/opensciencegrid/topology/settings/hooks page on GitHub. There are four webhooks to set up: pull_request and status for both the topology and topology-itb hosts. 
Payload URL Content type Events to trigger webhook https://topology.opensciencegrid.org/webhook/status application/json Statuses https://topology.opensciencegrid.org/webhook/pull_request application/json Pull requests https://topology-itb.opensciencegrid.org/webhook/status application/json Statuses https://topology-itb.opensciencegrid.org/webhook/pull_request application/json Pull requests For each webhook, \"Secret\" should be a random 40-digit hex string, which should match the contents of the file /etc/opt/topology/github_webhook_secret (the path configured in WEBHOOK_SECRET_KEY ). The OSG's dedicated GitHub user for automating pushes is currently osg-bot . This user needs to have write access to the topology repo on GitHub. The ssh public key in /etc/opt/topology/github.pub should be registered with the osg-bot GitHub user. This can be done by logging into GitHub as osg-bot , and adding the new ssh key under the settings page.","title":"GitHub Configuration for Webhook App"},{"location":"services/topology/#required-system-packages","text":"Currently the webhook app uses the mailx command to send email. If not already installed, install it with: # yum install mailx","title":"Required System Packages"},{"location":"services/topology/#testing-changes-on-the-itb-instance","text":"All changes should be tested on the ITB instance before deploying to production. If you can, test them on your local machine first. These instructions assume that the code has not been merged to master. Update the ITB software installation at /opt/topology-itb and note the current branch: # cd /opt/topology-itb # git fetch --all # git status Check out the branch you are testing. If the target remote is not configured, add it : # git checkout -b / Verify that you are using the intended data associated with the code you are testing: If the data format has changed in an incompatible way, modify /etc/opt/topology/config-itb.py : Backup the ITB configuration file: # cd /etc/opt/topology # cp -p config-itb.py { ,.bak } Change the TOPOLOGY_DATA_DIR and/or CONTACT_DATA_DIR lines to point to new directories so the previous data does not get overwritten with incompatible data. If you need to use a different branch for the data, switch to it: Check the branch of TOPOLOGY_DATA_DIR from /etc/opt/topology/config-itb.py # cd # git fetch --all # git status Note the previous branch, you will need this later If the target remote is not configured, add it Check out the target branch: # git checkout -b / Pull any upstream changes to ensure that your branch is up to date: # git pull For updates to the webhook app, follow the above instructions for the ITB webhook instance under /opt/topology-itb-webhook and its corresponding config file, /etc/opt/topology/config-itb-webhook.py . Restart httpd : # systemctl restart httpd Test the web interface at https://topology-itb.opensciencegrid.org . Errors and output are in /var/log/httpd/error_log .","title":"Testing changes on the ITB instance"},{"location":"services/topology/#reverting-changes","text":"Switch /opt/topology-itb to the previous branch: # cd /opt/topology-itb # git checkout For updates to the webhook app, switch /opt/topology-itb-webhook to the previous master: # cd /opt/topology-itb-webhook # git checkout If you made config changes to /etc/opt/topology/config-itb.py or config-itb-webhook.py , restore the backup. If you checked out a different branch for data, revert it back to the old branch. 
Restart httpd : # systemctl restart httpd Test the web interface at https://topology-itb.opensciencegrid.org .","title":"Reverting changes"},{"location":"services/topology/#updating-the-production-instance","text":"Updating the production instance is similar to updating the ITB instance. Update master on the Git clone at /opt/topology : # cd /opt/topology # git pull origin master For updates to the webhook app, update master on the Git clone at /opt/topology-webhook : # cd /opt/topology-webhook # git pull origin master Make config changes to /etc/opt/topology/config-production.py and/or config-production-webhook.py if necessary. Restart httpd : # systemctl restart httpd Test the web interface at https://topology.opensciencegrid.org . Errors and output are in /var/log/httpd/error_log .","title":"Updating the production instance"},{"location":"services/topology/#reverting-changes_1","text":"Switch /opt/topology to the previous master: # cd /opt/topology # ## (use `git reflog` to find the previous commit that was used) # git reset --hard For updates to the webhook app, switch /opt/topology-webhook to the previous master: # cd /opt/topology-webhook ### (use `git reflog` to find the previous commit that was used) # git reset --hard If you made config changes to /etc/opt/topology/config-production.py or config-production-webhook.py , revert them. Restart httpd : # systemctl restart httpd Test the web interface at https://topology.opensciencegrid.org .","title":"Reverting changes"},{"location":"troubleshooting/repository-scripts/","text":"Troubleshooting Guide for Yum Repository Scripts The repo.opensciencegrid.org and repo-itb.opensciencegrid.org hosts contain the OSG Yum software repositories plus related services and tools. In particular, the mash software is used to download RPMs from where they are built (at the University of Wisconsin\u2013Madison), and there are some associated scripts to configure and invoke mash periodically. Use this guide to monitor the mash system for problems and to perform basic troubleshooting when such problems arise. Monitoring To monitor the repository hosts for proper mash operation, do the following steps on each host: ssh to repo.opensciencegrid.org and cd into /var/log/repo to view logs from mash updates Examine the \u201cLast modified\u201d timestamp of all of the update_repo.*.log files If the timestamps are all less than 2 hours old, life is good and you can skip the remaining steps below Otherwise, examine the \u201cLast modified\u201d timestamp of the update_all_repos.err file If the update_all_repos.err timestamp is current, there may be a mash process that is hung; see the Troubleshooting steps below If all timestamps are more than 6 hours old, something may be wrong with cron or its mash entries: Verify that cron is running and that the cron entries for mash are still present; if not, try to restore things Otherwise, create a Freshdesk ticket with a subject like \u201cRepo update logs are too old on \u201d and with relevant details in the body Assign the ticket to the \u201cSoftware\u201d group Troubleshooting and Mitigation Identifying and fixing a hung mash process If a mash update process hangs, all future invocations from cron of the mash scripts will exit without taking action because of the hung process. Thus, it is important to identify and remove any hung processes so that future updates can proceed. 
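Before walking through the procedure below, the timestamp checks from the Monitoring section above can be done in one quick pass (a sketch, using the log names described there):
root@host # cd /var/log/repo
root@host # ls -lt update_repo.*.log | head    # newest first; all should be under 2 hours old
root@host # ls -l update_all_repos.err         # a current timestamp here suggests a hung mash process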
Use the procedure below to remove any hung mash processes; doing so is safe in that it will not adversely affect the Yum repositories being served from the host. In the listing of log files (see above), view the file update_all_repos.err In the error log file, look for messages such as: Wed Jan 20 18:10:02 UTC 2016: **Can't acquire lock, is update_all_repos.sh already running?** This message indicates that the most recent update attempt quit early due to the presence of a lock file, most likely from a hung mash process. Look for mash processes: root@host # ps -C mash -o pid,ppid,pgid,start,command PID PPID PGID STARTED COMMAND 24551 24549 23455 Jan 15 /usr/bin/python /usr/bin/mash osg-3.1-el5-release -o 24552 24551 23455 Jan 15 /usr/bin/python /usr/bin/mash osg-3.1-el5-release -o If there are mash processes that started on a previous date or more than 2 hours ago, it is best to remove their corresponding process groups (PGID above): root@host # kill -TERM -23455 Then verify that the old processes are gone using the same ps command as above: root@host # ps -C mash -o pid,ppid,pgid,start,command PID PPID PGID STARTED COMMAND If any part of this process does not look or work as expected: Create a Freshdesk ticket with a subject like \u201cRepo update logs are too old on \u201d and with relevant details in the body Assign the ticket to the \u201cSoftware\u201d group","title":"Troubleshooting Guide for Yum Repository Scripts"},{"location":"troubleshooting/repository-scripts/#troubleshooting-guide-for-yum-repository-scripts","text":"The repo.opensciencegrid.org and repo-itb.opensciencegrid.org hosts contain the OSG Yum software repositories plus related services and tools. In particular, the mash software is used to download RPMs from where they are built (at the University of Wisconsin\u2013Madison), and there are some associated scripts to configure and invoke mash periodically. 
Use this guide to monitor the mash system for problems and to perform basic troubleshooting when such problems arise.","title":"Troubleshooting Guide for Yum Repository Scripts"},{"location":"troubleshooting/repository-scripts/#monitoring","text":"To monitor the repository hosts for proper mash operation, do the following steps on each host: ssh to repo.opensciencegrid.org and cd into /var/log/repo to view logs from mash updates Examine the \u201cLast modified\u201d timestamp of all of the update_repo.*.log files If the timestamps are all less than 2 hours old, life is good and you can skip the remaining steps below Otherwise, examine the \u201cLast modified\u201d timestamp of the update_all_repos.err file If the update_all_repos.err timestamp is current, there may be a mash process that is hung; see the Troubleshooting steps below If all timestamps are more than 6 hours old, something may be wrong with cron or its mash entries: Verify that cron is running and that the cron entries for mash are still present; if not, try to restore things Otherwise, create a Freshdesk ticket with a subject like \u201cRepo update logs are too old on \u201d and with relevant details in the body Assign the ticket to the \u201cSoftware\u201d group","title":"Monitoring"},{"location":"troubleshooting/repository-scripts/#troubleshooting-and-mitigation","text":"","title":"Troubleshooting and Mitigation"},{"location":"troubleshooting/repository-scripts/#identifying-and-fixing-a-hung-mash-process","text":"If a mash update process hangs, all future invocations from cron of the mash scripts will exit without taking action because of the hung process. Thus, it is important to identify and remove any hung processes so that future updates can proceed. Use the procedure below to remove any hung mash processes; doing so is safe in that it will not adversely affect the Yum repositories being served from the host. In the listing of log files (see above), view the file update_all_repos.err In the error log file, look for messages such as: Wed Jan 20 18:10:02 UTC 2016: **Can't acquire lock, is update_all_repos.sh already running?** This message indicates that the most recent update attempt quit early due to the presence of a lock file, most likely from a hung mash process. Look for mash processes: root@host # ps -C mash -o pid,ppid,pgid,start,command PID PPID PGID STARTED COMMAND 24551 24549 23455 Jan 15 /usr/bin/python /usr/bin/mash osg-3.1-el5-release -o 24552 24551 23455 Jan 15 /usr/bin/python /usr/bin/mash osg-3.1-el5-release -o If there are mash processes that started on a previous date or more than 2 hours ago, it is best to remove their corresponding process groups (PGID above): root@host # kill -TERM -23455 Then verify that the old processes are gone using the same ps command as above: root@host # ps -C mash -o pid,ppid,pgid,start,command PID PPID PGID STARTED COMMAND If any part of this process does not look or work as expected: Create a Freshdesk ticket with a subject like \u201cRepo update logs are too old on \u201d and with relevant details in the body Assign the ticket to the \u201cSoftware\u201d group","title":"Identifying and fixing a hung mash process"}]}
\ No newline at end of file
diff --git a/sitemap.xml b/sitemap.xml
index c12524ed..fd639f02 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -1,119 +1,119 @@
https://osg-htc.org/operations/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/services/install-gwms-factory/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/services/topology/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/services/topology-contacts-data/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/services/finalize-cache-registration/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/services/sending-announcements/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/services/gracc-corrections/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/services/hosted-ce-definitions/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/services/ce-monitoring-dashboards/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/troubleshooting/repository-scripts/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/general/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/access-point/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/collector/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/gracc/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/gwms-factory/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/gwms-frontend/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/htcss-central-manager/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/hosted-ce/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/message-broker/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/oasis/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/osdf-core/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/osdf-cache/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/osdf-origin/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/perfsonar/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/software-repo/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/topology/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/web-pages/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/SLA/xdlogin/
- 2024-02-16
+ 2024-02-23
daily
https://osg-htc.org/operations/external-oasis-repos/
- 2024-02-16
+ 2024-02-23
daily
\ No newline at end of file
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index 735da8dc22013902a62acd6bc5b4421aa10b3de0..478c4901681f9991986476b32d5f94cb24f8e06d 100644
GIT binary patch
delta 480
zcmV<60U!SI1Mvd~ABzYG&&$}62OWRL-~g!WD=wV40Y4y1+%$UAxWwMujbD#b#>bTd
zl6$G**wQ0^r+>Q5dH({H5|(rK>TvA#P&|fI;<@|!<)eAqJq^45S~CIfMY+(qo0ymP
z)~@S%pu{k{g$`JwB|pi5>_cdehj-o3Y=_;xeSJnhG-DELq-ZAgn;|Yx9SeT~+1@s#
zU;2)C#xR_Yr#I$!GN;#a{g6FW{Nzf6vBI|@xRmd`{Y$dx^tKWFBL+czNgl|iLhN!j
zah=JG&NJ4#I!ML^Ge*5$#b|9+}|SslyyoaXEbl$dmeWEwPpg|i*liJH!&~o
ztXL3{x%oz288IOgYL*rujxv{uUv;?)9VJyQ9w||bZSMemP;IwV
zM