Releases: fgci-org/fgci-ansible

v1.7.0 Ornamental Orm

03 Jan 08:29

General Updates

This release requires ansible 2.2.
Ansible 2.1 is no longer supported and will cause failures in some tasks and roles.

'slurm' tag added to ansible-role-pam in compute.yml, local.yml and login.yml

New Roles

  • nhc-scripts: added to compute.yml and local.yml 9. This role can be used to add more Node Health Checker (NHC) scripts to compute nodes. RPMs might be a better choice.

Role Updates

  • lldpd: also skip on virtualization_host == "NA" 10
  • nfs: increase nfsd threads to 32 by default 8
  • pxe_bootstrap:
    • More informative error message from the boot.py script 11
    • Make fetching upgrade.img optional 12
  • slurm:
    • updated for ansible 2.2 4
    • topology.conf generation now works again 7
    • improved testing
  • smartd: no longer fails if no disks are found with "smartctl --scan" 1

Ansible 2.2 updates to use check_mode instead of always_run

Two roles were updated, with references to the commits: flowdock: 2 and ip_forwarder: 3
These were also updated similarly: cuda, cvmfs, fail2ban, nfs_mount, nhc, pxe_config, rdma, serial-console, smartd, squid 6
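
As a rough illustration of the always_run to check_mode change: the task below is hypothetical and not copied from any of the roles. A task that must still execute when a playbook is run with --check would be rewritten along these lines:

```yaml
# Before (ansible 2.1 and earlier): run the task even in --check mode.
- name: scan for disks (hypothetical example task)
  command: smartctl --scan
  register: smartctl_scan
  changed_when: False
  always_run: yes

# After (ansible 2.2): same behaviour, expressed with check_mode.
- name: scan for disks (hypothetical example task)
  command: smartctl --scan
  register: smartctl_scan
  changed_when: False
  check_mode: no
```

In ansible 2.2 always_run is deprecated and "always_run: yes" corresponds to "check_mode: no".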

v1.6.8

21 Nov 10:07

General Updates

  • Example File Updates:
    • The daily yum-cron cronjob now also autoupdates everything, not only the hourly one (#161)
    • nfs_exports default updated so that root on compute nodes isn't root on NFS /scratch (it's user nobody). Only have it on grid and install node where it's needed. 6
    • Neither of these changes is mandatory.

New Roles

  • nhc-scripts: a role to copy or get_url in custom nhc checks. Not in use in any playbooks (yet) 1
  • lldpd: a role to configure lldpd on servers, so that their hostnames are visible on the switch ports. Not in use (yet) 8

Role Updates

  • arc-client: new default client.conf points to only Finnish GIIS. ~10x speed up of "arcinfo" 2
  • fgci-install: on host_vars wget mirror in ansible-pull-script.sh we now only accept "yml" files 4
  • flowdock: also tag flowdock notifications with the ansible_version number 7
  • nhc: we now use nvidia-smi -L 5
  • slurm: configure mysql root user. The new variable {{ slurm_manage_mysql_security }} defaults to True. If you are already configuring the mysql root user on the dbd host, you can disable it 3
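
A minimal sketch of opting out of the new mysql root user handling in the slurm role. It assumes the variable is overridden in the group_vars or host_vars that apply to the dbd host; the exact file is site-specific and not given in these notes.

```yaml
# Leave the mysql root user to be managed outside the slurm role.
# The role default is True.
slurm_manage_mysql_security: False
```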

v1.6.7

03 Nov 12:13

General Updates

Our default examples/group_vars/nfs/ variables used to configure the NFS network interfaces with DHCP.
This has been changed to static IP configuration, to reduce issues during whole cluster reboots (#164)

Role Updates

  • arc_client: use ini_file instead of lineinfile 2 and create an arc client.conf with sensible defaults 3
  • lustre_client: fix ansible undefined variable errors 1
  • rdma: call lspci with a full path and make it disabled by default (default for FGCI is to now install it only where infiniband_available is True) 10
  • users: big updates - please remove the "group: csc" from each csc admin user in group_vars/all/root_keys.yml 4
    • support comment and remove argument 5
    • use group and groups arguments properly: 6
    • actually remove users and ssh keys: 7
    • use block: to reduce number of tasks: 8
  • slurm: optionally configure plugstack and x11-spank plugin. See commit fgci-org/ansible-role-slurm@931d926 for the exact changes to the role.
      slurm_plugstack: False
      slurm_x11_spank: False
      slurm_x11_setting: "optional          x11.so"
    • testing/travis with SLURM 16.05. Role default is still 15.08.
  • arc-frontend: Allows the usage of a non default runtime environment location for ARC-CE 9
    • Defaults fit FGCI common settings. Unless you're using a non-default location there's no need to overwrite the variables.
    • If you plan to use a non-default location, please overwrite the default variables in your group_vars/grid/grid.yml. See this role's defaults for more details.
    • Cronjob uses rsync to keep runtime folder up-to-date with runtime folder in CVMFS
  • nfs_mount: select the right variables and set removal of fstab files to true for the test 11
    • labels the old method as deprecated
    • deletes old generated backup files when clean_fstab_backups is set to True

v1.6.6

26 Oct 13:03
Compare
Choose a tag to compare

General Updates

  • The ansible-pull cronjob on compute nodes now runs with /bin/nice to reduce interference 5

Role Updates

  • cuda: add a bash profile script with cuda settings, general role improvements 4
  • rsyslog:
    • increase maxopenfiles - needed for clusters with many clients.
    • allow logging by function, so that logs do not have to end up in %hostname%.log. See 1 for more details.
  • slurm:
    • improve rsyslog configuration. More details: 2
    • add a pause when upgrading slurm versions on the slurm accounting host 3
  • smartd: make it work with ansible 2.2 6

v1.6.5

10 Oct 07:33

General Updates

Please remove the group: "csc" from the users in group_vars/all/root_keys.yml. Why? See #159

  - { name: cscluis, uid: 5001, group: csc, state: "present", groups: "{{admingroup}}", shell: "{{adminshell}}", pubkey: "ssh-rsa  KEY" }

should become:

  - { name: cscluis, uid: 5001, state: "present", groups: "{{admingroup}}", shell: "{{adminshell}}", pubkey: "ssh-rsa  KEY" }

CVMFS had a major rearrangement

On FGCI machines (this release applies the necessary changes for you; however, if you have overwritten the corresponding ansible variable bash_modules_path, please make sure you update it to the new MODULEPATH string):

  • MODULEPATH should point to /cvmfs/fgi.csc.fi/modules/el7/all
  • MODULEPATH should NOT point to /cvmfs/fgi.csc.fi/modules/ or /cvmfs/fgi.csc.fi/modules/all

On FGI machines (A new FGI package will be produced soon with these changes):

  • MODULEPATH should point to /cvmfs/fgi.csc.fi/modules/sl6
  • MODULEPATH should NOT point to /cvmfs/fgi.csc.fi/modules/fgci or /cvmfs/fgi.csc.fi/modules/all

Meanwhile (just for FGI) you can change the following line manually in /etc/profile.d/modules-cvmfs.sh to:

export MODULEPATH=$MODULEPATH:/cvmfs/fgi.csc.fi/modules/sl6/

And in /etc/profile.d/modules-cvmfs.csh to:

setenv MODULEPATH "${MODULEPATH}:/cvmfs/fgi.csc.fi/modules/sl6/"

Some minor fixes may be needed in the near future. Please test your software and report any issue to [email protected]

Role Updates

  • cuda: install gpg keys for the yum repo 1
  • yum: disable yum update of * to latest in yum-role #160
  • fgci-bash: update MODULEPATH 2

v1.6.4

03 Oct 06:58

General Updates

We now use chronyd instead of ntpd on the compute nodes in ansible-pull.

Role Updates

  • cuda: CUDA 8 was released and its RPMs are signed with a new key. 1
  • network: Use a different network scheduler on some nodes #158
  • nhc: make rebooting a node work better on nvidia-smi errors, and also make it reboot gpu nodes on driver/module mismatches 2
  • slurm: disable slurm init script on EL7 and generate a topology.conf 3

v1.6.3

27 Sep 07:26

General Updates

  • In a release in the near future we will stop using ntpd and change to chrony. (Issue #103)
    • Same procedure as usual. Starting with this release we add the new chrony role to master in the requirements.yml and to the ansible push mode playbooks. A week later we add it to the local.yml for ansible-pull
    • There are a few new variables introduced by this role. We have added new defaults to the examples/group_vars/ that you can add to your local group_vars. The defaults (without the changes from our examples/group_vars/) make every chronyd an NTP server and allow connections from 10.0.0.0/8.
  • We now apply the ansible-role-fgci-install to the compute.yml and local.yml playbooks - for the ansible-pull logrotate

Role Updates

  • fgci-install:
    • the ansible-pull-script can now log to a separate file which can also be rotated. How to configure it is in fgci-ansible/examples/group_vars/all/ - by default it only stores the output of the last ansible-pull run. 2
    • the compute.yml task file (for slurm_compute nodes) no longer templates in the ansible-pull-script.sh
  • nfs_mount: the role has for a while supported using the mount module rather than editing /etc/fstab when setting up mounts. The defaults still use the old style. To switch to the new style, remove the nfs_mount variable and add nfs_mounts. Examples are in fgci-ansible/examples/group_vars/all (Issue #152)
  • pxe_config: by default we now use the new method for choosing which kickstart configs to create 1
  • arc-frontend: Sets Slurm fairshare for FGCI (80% Local/ 20% Grid) 3
    • can be disabled by setting arc_frontend_enable_fairshare = False
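
As a sketch, disabling the arc-frontend fairshare setting in YAML variable syntax could look like this. The file location is an assumption, borrowed from the group_vars/grid/grid.yml path these notes mention for other arc-frontend overrides.

```yaml
# e.g. in group_vars/grid/grid.yml (location is an assumption)
# Do not let the arc-frontend role set the 80% Local / 20% Grid fairshare.
arc_frontend_enable_fairshare: False
```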

v1.6.2

19 Sep 06:24

General Updates

  • fgci-ansible playbooks are now under MIT license. #155
  • Added hosts-int to local.yml too #153 - update your ansibles.
  • Added configuring the ansible-pull cronjob to compute.yml #156 - it is now possible to fix a broken /etc/cron.d/ansible-pull cronjob entry! Run this:
    ansible-playbook compute.yml -t pull
    

Role Updates

  • fgci-install: 1 Gathering of $runtime in ansible-pull-script.sh now uses /proc/timer_list instead of date command
    • When we send $runtime to grafana we now do it in a separate metric 2 (so two API/curl POST requests instead of one). Also made grafana_ansible_pull.sh smaller. The idea is to be able to make graphs showing where ansible-pull is slower, to find places where we can make performance improvements.
  • pxe_config: The custom-inventory.py is now much easier to read (from 184 lines to 87). (PR, @jabl, 3)
  • arc-frontend: use the standard GOCDB site name in arc.conf for InfoSys and Vapor compliance 4

v1.6.1

12 Sep 06:38

General Updates

local.yml (for ansible-pull) now has the systemd-journal role added.

If the git-mirror on the install node is not mirroring the systemd-journal role (added on Tuesday 6/9/2016) ansible-pull will stop working until it is. One way of adding it is with a git pull and tools/pullReqs.sh in the fgci-ansible repo.

New Roles

  • hosts-int: populate /etc/hosts with all the hosts from the ansible inventory. Not in ansible-pull yet. #151

Role Updates

  • nfs_mount: optionally use the ansible mount module instead of lineinfile (which remains the default) 1
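
To illustrate the difference between the two approaches, here is a hedged sketch. The tasks, the server name nfs.example.org and the /scratch export are hypothetical and are not taken from the nfs_mount role itself.

```yaml
# Old style: edit /etc/fstab directly with lineinfile.
- name: add scratch to fstab (hypothetical lineinfile task)
  lineinfile:
    dest: /etc/fstab
    line: "nfs.example.org:/scratch /scratch nfs defaults 0 0"

# New style: let the mount module manage the fstab entry and the mount.
- name: mount scratch (hypothetical mount module task)
  mount:
    name: /scratch                 # called "path" in ansible 2.3 and later
    src: "nfs.example.org:/scratch"
    fstype: nfs
    opts: defaults
    state: mounted
```

With state: mounted the mount module both records the entry in /etc/fstab and mounts the filesystem, which lineinfile alone does not do.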

v1.6.0 - Nifty Night Snake

06 Sep 06:58

General

The update in #146 means that "pdsh -a" no longer only talks to the compute nodes but to all the nodes in the ansible inventory. Use "pdsh -g compute" to only talk to the compute nodes.

New Roles

  • systemd-journal: new role which enables persistent journals stored in /var/log/journal. The "systemctl restart systemd-journald" needed to make a server write to the persistent log location sometimes fails, so this change is only applied on the next server reboot or the next systemd-journald service restart. 11

Please run bash tools/pullReqs.sh, or at least update the requirements.yml (and the git mirrors) on the install node. We have not updated local.yml for ansible-pull with the new role systemd-journal, so ansible-pull should keep working. The plan is to make a minor release on Monday the 12th of September to update local.yml.

Role Updates

  • arc-client: exclude ca_* packages from nordugrid repo. This role installs the EGI trustanchor repo.
  • arc-frontend:
    • Address infosys misconfiguration: 8
  • collectd: use sha1 checksums when fetching two external scripts (ipmitool.sh and dcmi.sh) 1
  • cuda:
    • remove cuda_init.sh from rc.local and run it as a systemd.service instead 2
    • skip a wait_for_handler on ansible-pull: 10
  • fgci-install:
    • ansible-pull-script:
      • skip skipped hosts 15
      • create /etc/FGCI to indicate that ansible-pull has at least once succeeded 14
      • send $runtime of the ansible-pull command to grafana 13
    • fgci-install role update: don't rsync temporary vim files 12
  • fgci-login: use variables to define the IP addresses that the default gateway sent to the dhcp clients switches between 9. The gateway changes because the install node is set up before the login node, so in the meantime (when there is no login node) we use the install node as the gateway.
  • flowdock:
    • changed tag to "always" on all playbooks except in compute.yml - meaning the flowdock role will always run (and send FGCI admins a notification that the playbook was run).
    • more tags (git_head, vendor and dist)
  • pdsh: change to use /etc/genders instead of /etc/machines #146.
  • pxe_config:
    • don't run the custom-inventory with sudo 4
    • populate an extra group into the hosts file / internal DNS. Examples in examples/hosts-example 5
    • don't sleep while running ansible-pull-script via rc.local 7
  • slurm:
    • delegate starting munged from service.yml to the slurm accounting host 3.
    • use vars/ directory for more EL6/EL7 differences
    • removed some debug tasks
    • more thorough testing of this role is enabled.
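
The collectd change in this release (sha1 checksums when fetching ipmitool.sh and dcmi.sh) can be pictured with the get_url module's checksum parameter. This is only a sketch: the URL, destination path and digest below are placeholders, and the role's actual task may differ.

```yaml
- name: fetch an external collectd script and verify it (hypothetical task)
  get_url:
    url: "https://example.org/scripts/ipmitool.sh"   # placeholder URL
    dest: /usr/local/bin/ipmitool.sh                  # placeholder path
    mode: "0755"
    checksum: "sha1:<40-character hex digest>"        # placeholder digest
```

If the downloaded file does not match the given digest, get_url fails the task instead of installing the script.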