Releases: fgci-org/fgci-ansible
v1.7.0 Ornamental Orm
General Updates
As of this release, Ansible 2.2 is required.
Ansible 2.1 is no longer supported and will cause failures in some tasks and roles.
The 'slurm' tag has been added to ansible-role-pam in compute.yml, local.yml and login.yml.
New Roles
- nhc-scripts: added to compute.yml and local.yml 9. This role can be used to add more Node Health Check (NHC) scripts to compute nodes, although packaging the checks as RPMs might be a better choice.
Role Updates
- lldpd: also skipped when virtualization_host == "NA" 10
- nfs: increase nfsd threads to 32 by default 8
- pxe_bootstrap:
- slurm:
- smartd: no longer fails if no disks are found with "smartctl --scan" 1
Ansible 2.2 updates: use check_mode instead of always_run
- Two roles, with references to the commits: flowdock 2 and ip_forwarder 3
- These roles were updated in the same way: cuda, cvmfs, fail2ban, nfs_mount, nhc, pxe_config, rdma, serial-console, smartd, squid 6
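In Ansible 2.2, always_run is deprecated in favour of check_mode. The sketch below shows what such an update typically looks like, using a hypothetical read-only task. The task name and the register variable are illustrative and are not taken from any FGCI role.

```yaml
# Hypothetical read-only task that should also run under --check.

# Before (Ansible 2.1 style, deprecated in Ansible 2.2):
- name: list PCI devices (illustrative example)
  command: /usr/sbin/lspci
  register: lspci_output
  changed_when: False
  always_run: yes

# After (Ansible 2.2 style):
- name: list PCI devices (illustrative example)
  command: /usr/sbin/lspci
  register: lspci_output
  changed_when: False
  check_mode: no
```

check_mode: no forces the task to run even when the playbook is invoked with --check, which is the behaviour always_run: yes used to provide.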
v1.6.8
General Updates
- Example File Updates:
New Roles
- nhc-scripts: a role to copy (or fetch with get_url) custom NHC checks. Not used in any playbooks yet. 1
- lldpd: a role to configure lldpd on servers so that their hostnames are visible on the switch ports. Not in use yet. 8
Role Updates
- arc-client: the new default client.conf points only to the Finnish GIIS, giving a ~10x speed-up of "arcinfo" 2
- fgci-install: the wget mirroring of host_vars in ansible-pull-script.sh now only accepts "yml" files 4
- flowdock: also tag flowdock notifications with the ansible_version number 7
- nhc: we now use nvidia-smi -L 5
- slurm: configure the MySQL root user. The new variable {{ slurm_manage_mysql_security }} defaults to True, so if you already configure the MySQL root user on the dbd host by other means, you can disable it. 3
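The following is a minimal group_vars sketch for sites that already manage the MySQL root user on the dbd host. Only the variable name comes from this release. The file in the comment is a hypothetical path, and where the variable belongs depends on your inventory layout.

```yaml
# e.g. group_vars/<your slurm dbd group>/slurm.yml (hypothetical path)
# Tell the slurm role not to configure the MySQL root user itself.
slurm_manage_mysql_security: False
```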
v1.6.7
General Updates
Our default examples/group_vars/nfs/ variables used to configure the NFS network interfaces with DHCP.
They now use static IP configuration instead, to reduce issues during whole-cluster reboots (#164).
Role Updates
- arc_client: use ini_file instead of lineinfile 2 and create an ARC client.conf with sensible defaults 3
- lustre_client: fix ansible undefined variables errors 1
- rdma: call lspci with its full path and disable the role by default (FGCI now installs it only where infiniband_available is True) 10
- users: big updates. Please remove "group: csc" from each CSC admin user in group_vars/all/root_keys.yml 4
- slurm: optionally configure plugstack and x11-spank plugin. See commit fgci-org/ansible-role-slurm@931d926 for the exact changes to the role.
- testing on Travis now uses SLURM 16.05; the role default is still 15.08.
- new variables and their defaults: slurm_plugstack: False, slurm_x11_spank: False, slurm_x11_setting: "optional x11.so"
- arc-frontend: allow a non-default runtime environment location for ARC-CE 9
- The defaults match the common FGCI settings. Unless you use a non-default location, there is no need to override the variables.
- If you plan to use a non-default location, override the default variables in your group_vars/grid/grid.yml. See the role's defaults for details.
- A cronjob uses rsync to keep the runtime folder up to date with the runtime folder in CVMFS.
- nfs_mount: select the right variables, and set removing fstab files to true in the tests 11
- labels the old method as deprecated
- deletes old generated backup files when clean_fstab_backups is set to True
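The new variables in this release can be overridden in group_vars. This sketch enables the optional x11-spank plugin and turns on cleanup of old fstab backups. Only the variable names and the slurm defaults come from these notes. The enabled values and the file placement are assumptions that you should adapt to your site.

```yaml
# Hypothetical group_vars override, e.g. in group_vars/all/ (placement is an assumption)

# slurm: configure plugstack with the x11-spank plugin (both default to False)
slurm_plugstack: True
slurm_x11_spank: True
# default value, repeated here only for completeness
slurm_x11_setting: "optional x11.so"

# nfs_mount: delete old generated fstab backup files
clean_fstab_backups: True
```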
v1.6.6
General Updates
- The ansible-pull cronjob on compute nodes now runs with /bin/nice to reduce interference 5
Role Updates
- cuda: add a bash profile script with cuda settings, general role improvements 4
- rsyslog:
- increase maxopenfiles - needed for clusters with many clients.
- allow logging by function, so logs no longer have to end up in %hostname%.log. See 1 for details.
- slurm:
- smartd: make it work with Ansible 2.2 6
v1.6.5
General Updates
Please remove the group: "csc" from the users in group_vars/all/root_keys.yml. Why? See #159
- { name: cscluis, uid: 5001, group: csc, state: "present", groups: "{{admingroup}}", shell: "{{adminshell}}", pubkey: "ssh-rsa KEY" }
should become:
- { name: cscluis, uid: 5001, state: "present", groups: "{{admingroup}}", shell: "{{adminshell}}", pubkey: "ssh-rsa KEY" }
CVMFS had a major rearrangement
On FGCI machines (this release applies the necessary changes for you; however, if you have overridden the corresponding Ansible variable bash_modules_path, make sure you update it to the new MODULEPATH string):
- MODULEPATH should point to /cvmfs/fgi.csc.fi/modules/el7/all
- MODULEPATH should NOT point to /cvmfs/fgi.csc.fi/modules/ or /cvmfs/fgi.csc.fi/modules/all
On FGI machines (a new FGI package with these changes will be produced soon):
- MODULEPATH should point to /cvmfs/fgi.csc.fi/modules/sl6
- MODULEPATH should NOT point to /cvmfs/fgi.csc.fi/modules/fgci or /cvmfs/fgi.csc.fi/modules/all
Meanwhile (just for FGI) you can change the following line manually in /etc/profile.d/modules-cvmfs.sh to:
export MODULEPATH=$MODULEPATH:/cvmfs/fgi.csc.fi/modules/sl6/
And in /etc/profile.d/modules-cvmfs.csh to:
setenv MODULEPATH "${MODULEPATH}:/cvmfs/fgi.csc.fi/modules/sl6/"
Some minor fixes may be needed in the near future. Please test your software and report any issues to [email protected]
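If you override bash_modules_path on FGCI machines, the following sketch shows the updated value. It assumes that the variable holds the MODULEPATH string directly, as the FGCI note implies. Check the defining role's defaults for the exact format before copying it.

```yaml
# Hypothetical group_vars override for FGCI (EL7) machines.
# Point at the new el7 tree, not at .../modules/ or .../modules/all.
bash_modules_path: "/cvmfs/fgi.csc.fi/modules/el7/all"
```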
Role Updates
v1.6.4
General Updates
We now use chronyd instead of ntpd on the compute nodes in ansible-pull.
Role Updates
- cuda: CUDA 8 was released, and NVIDIA now uses a new key to sign the RPMs. 1
- network: Use a different network scheduler on some nodes #158
- nhc: make rebooting a node work better on nvidia-smi errors, and also make it reboot gpu nodes on driver/module mismatches 2
- slurm: disable slurm init script on EL7 and generate a topology.conf 3
v1.6.3
General Updates
- In an upcoming release we will switch from ntpd to chrony (Issue #103).
- As usual, the change is staged. Starting with this release, the new chrony role is added (tracking master) to requirements.yml and to the ansible push-mode playbooks. A week later it will be added to local.yml for ansible-pull.
- This role introduces a few new variables. We have added new defaults to examples/group_vars/ that you can copy into your local group_vars. Without the changes from our examples/group_vars/, the role defaults make every chronyd an NTP server and allow connections from 10.0.0.0/8.
- We now apply ansible-role-fgci-install in the compute.yml and local.yml playbooks, for the ansible-pull logrotate configuration.
Role Updates
- fgci-install:
- the ansible-pull-script can now log to a separate file, which can also be rotated. Configuration examples are in fgci-ansible/examples/group_vars/all/. By default only the output of the last ansible-pull run is stored. 2
- the compute.yml task file (for slurm_compute nodes) no longer templates in the ansible-pull-script.sh
- nfs_mount: the role has for a while supported using the mount module rather than editing /etc/fstab when setting up mounts, but the defaults still use the old style. To switch to the new style, remove the nfs_mount variable and add nfs_mounts. Examples are in fgci-ansible/examples/group_vars/all (Issue #152)
- pxe_config: by default we now use the new method for choosing which kickstart configs to create 1
- arc-frontend: set the Slurm fairshare for FGCI (80% local / 20% grid) 3
- can be disabled by setting arc_frontend_enable_fairshare to False
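The following minimal sketch disables the fairshare configuration. The variable name comes from this release. The file path is borrowed from the arc-frontend note in v1.6.7 and is only an assumption for this setting.

```yaml
# e.g. group_vars/grid/grid.yml (placement is an assumption)
# Do not configure the FGCI 80% local / 20% grid Slurm fairshare.
arc_frontend_enable_fairshare: False
```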
v1.6.2
General Updates
- fgci-ansible playbooks are now under MIT license. #155
- Added hosts-int to local.yml as well (#153). Please update your fgci-ansible checkouts.
- Added configuring the ansible-pull cronjob to compute.yml #156 - it is now possible to fix a broken /etc/cron.d/ansible-pull cronjob entry! Run this:
ansible-playbook compute.yml -t pull
Role Updates
- fgci-install: gathering of $runtime in ansible-pull-script.sh now uses /proc/timer_list instead of the date command 1
- When we send $runtime to Grafana, we now send it as a separate metric 2 (so two API/curl POST requests instead of one). grafana_ansible_pull.sh was also made smaller. The goal is to graph where ansible-pull is slow, to find places for performance improvements.
- pxe_config: custom-inventory.py is now much easier to read (down from 184 lines to 87). (PR, @jabl, 3)
- arc-frontend: use the standard GOCDB site name in arc.conf for InfoSys and Vapor compliance 4
v1.6.1
General Updates
local.yml (for ansible-pull) now includes the systemd-journal role.
If the git mirror on the install node is not mirroring the systemd-journal role (added on Tuesday 6/9/2016), ansible-pull will stop working until it is. One way to add it is to run git pull and then tools/pullReqs.sh in the fgci-ansible repo.
New Roles
- hosts-int: populate /etc/hosts with all the hosts from the ansible inventory. Not in ansible-pull yet. #151
Role Updates
- nfs_mount: optionally use the ansible mount module instead of lineinfile (which remains the default) 1
v1.6.0 - Nifty Night Snake
General
The update in #146 means that "pdsh -a" no longer only talks to the compute nodes but to all the nodes in the ansible inventory. Use "pdsh -g compute" to only talk to the compute nodes.
New Roles
- systemd-journal: new role that enables persistent journals stored in /var/log/journal. The "systemctl restart systemd-journald" needed to make a server write to the persistent log location sometimes fails, so the role does not restart the service. The change therefore only takes effect at the next server reboot or the next systemd-journald service restart. 11
Please run bash tools/pullReqs.sh, or at least update requirements.yml (and the git mirrors) on the install node. We have not added the new systemd-journal role to local.yml for ansible-pull yet, so ansible-pull should keep working. The plan is to make a minor release on Monday the 12th of September that updates local.yml.
Role Updates
- arc-client: exclude ca_* packages from nordugrid repo. This role installs the EGI trustanchor repo.
- arc-frontend:
- Address infosys misconfiguration: 8
- collectd: use sha1 checksums when fetching two external scripts (ipmitool.sh and dcmi.sh) 1
- cuda:
- fgci-install:
- fgci-login: use variables to define which IP addresses the default gateway sent to DHCP clients switches between 9. The gateway changes because the install node is set up before the login node, so in the meantime (while there is no login node) the install node is used as the gateway.
- flowdock:
- changed tag to "always" on all playbooks except in compute.yml - meaning the flowdock role will always run (and send FGCI admins a notification that the playbook was run).
- more tags (git_head, vendor and dist)
- pdsh: change to use /etc/genders instead of /etc/machines #146.
- pxe_config:
- slurm:
- delegate starting munged from service.yml to the slurm accounting host 3.
- use vars/ directory for more EL6/EL7 differences
- removed some debug tasks
- more thorough testing of this role is enabled.
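Because the flowdock role is tagged "always", it runs even when a playbook is limited with -t/--tags. The sketch below shows how such a role entry can look in a playbook. The host group and role name are illustrative and are not copied from the fgci-ansible playbooks.

```yaml
# Illustrative playbook excerpt (host group and role name are hypothetical).
- hosts: install
  roles:
    # Tagged "always": runs on every invocation, even with e.g. "-t pull",
    # unless explicitly skipped with "--skip-tags always".
    - { role: ansible-role-flowdock, tags: ['always'] }
```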