Initial Release - copy from internal repo.
martbhell committed Sep 21, 2015
1 parent f9865e3 commit e43a595
Showing 21 changed files with 863 additions and 0 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -0,0 +1,6 @@
# ignore the file that should contain the mysql_slurm_password
group_vars/*/mysql

# Backup files
*~
*.swp
93 changes: 93 additions & 0 deletions README.md
@@ -0,0 +1,93 @@
# Creates a slurm cluster in pouta

Tested with slurm versions:
- 14.11.0
- 14.11.3
- 15.08.0

## How-To

### Launch the OpenStack instances:

- ansible-playbook launcopenstackinstance.yml # launches the VMs. Source the openstack-rc script before running this playbook, and set the names of your key and tenant (key_name and tenant_name in group_vars/all/all).
- See the group_vars/all/all file for the default variables used when launching an OpenStack instance.
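
The launch playbook reads OS_USERNAME and OS_PASSWORD from the environment, so it is worth confirming that the openstack-rc script has been sourced before running it. A minimal sketch of such a check follows; check_openstack_env is a hypothetical helper name and is not part of this repository.

```shell
# check_openstack_env: hypothetical helper, not part of this repository.
# Returns 0 when the variables read by launcopenstackinstance.yml are set,
# 1 otherwise.
check_openstack_env() {
    missing=0
    for var in OS_USERNAME OS_PASSWORD; do
        # Indirect lookup of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing $var: source your openstack-rc script first" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage (left commented out so the sketch has no side effects):
# check_openstack_env && ansible-playbook launcopenstackinstance.yml
```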

### Initial configuration of the instances and your workstation:

- First, install nc on the bastion host: yum -y install nc
- Then set up your SSH config so that you don't need a public IP on each instance. Change the Hostname in "Host bastion" to the address of the service node.

Put this in ~/.ssh/config:

<pre>
# http://edgeofsanity.net/article/2012/10/15/ssh-leap-frog.html
# This applies to all hosts in your ssh config.
ControlMaster auto
ControlPath ~/.ssh/ssh_control_%h_%p_%r

# Always ssh as cloud-user and don't save hostkeys
Host bastion
User cloud-user
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
Hostname 86.50.168.39

Host slurm*
User cloud-user
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
ForwardAgent yes
ProxyCommand ssh bastion nc %h %p
</pre>
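
The Hostname under "Host bastion" has to be changed whenever the service node gets a new address. A minimal sketch of doing that from a script follows; set_bastion_ip is a hypothetical helper and is not part of this repository, and it assumes the config uses the exact keywords "Host" and "Hostname" shown above.

```shell
# set_bastion_ip CONFIG IP: hypothetical helper, not part of this repository.
# Rewrites the Hostname line inside the "Host bastion" block only and
# leaves every other Host block untouched.
set_bastion_ip() {
    config=$1
    ip=$2
    awk -v ip="$ip" '
        $1 == "Host" { in_bastion = ($2 == "bastion") }
        in_bastion && $1 == "Hostname" { print "Hostname " ip; next }
        { print }
    ' "$config" > "$config.tmp" && mv "$config.tmp" "$config"
}

# Possible usage (192.0.2.10 is a documentation address, not a real host):
# set_bastion_ip ~/.ssh/config 192.0.2.10
# ssh slurm-login    # now goes through the bastion via ProxyCommand
```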

- Second, update /etc/hosts on the service node (the playbook should update the others):

<pre>
# For second cluster
192.168.36.126 slurm2-compute1
192.168.36.125 slurm2-service
192.168.36.124 slurm2-login

# For first cluster
192.168.36.129 slurm-compute1
192.168.36.128 slurm-service
192.168.36.127 slurm-login
</pre>
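
Adding these entries by hand is easy to repeat by accident. A minimal sketch of an idempotent append follows, similar in spirit to the lineinfile task in roles/common; add_host_entry is a hypothetical helper and is not part of this repository.

```shell
# add_host_entry FILE IP NAME: hypothetical helper, not part of this repository.
# Appends "IP NAME" to FILE unless NAME already appears as a host name
# on a non-comment line.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    if ! awk -v n="$name" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts_file"; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Possible usage on the service node (needs root to write /etc/hosts):
# add_host_entry /etc/hosts 192.168.36.128 slurm-service
```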

### Then we can finally run the slurm configuration playbooks:

- Update the files in group_vars/ to match your settings.

You also need to define mysql_slurm_password: "PASSWORD" in a variables file, for example group_vars/all/mysql (the .gitignore excludes group_vars/*/mysql). This will be used to set a password for the slurm MySQL user.
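
One way to create that variables file is sketched below. write_mysql_vars is a hypothetical helper and is not part of this repository; it assumes the password contains no double quotes, since the value is written as a double-quoted YAML string.

```shell
# write_mysql_vars DIR PASSWORD: hypothetical helper, not part of this repository.
# Writes DIR/mysql containing the mysql_slurm_password variable.
write_mysql_vars() {
    vars_dir=$1
    password=$2
    mkdir -p "$vars_dir"
    # The subshell keeps the restrictive umask from leaking into the caller;
    # a newly created file is readable by its owner only.
    (
        umask 077
        printf 'mysql_slurm_password: "%s"\n' "$password" > "$vars_dir/mysql"
    )
}

# Possible usage from the repository root:
# write_mysql_vars group_vars/all 'PASSWORD'
```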


#### Description of the playbooks:

- site.yml - calls the slurm_*.yml playbooks
- slurm_*.yml - the playbooks that configure the servers
- slurm_version - set this variable; it determines which version to download from schedmd.com

#### Run them in this order:

- Update stage to have the right IP addresses and hostnames.
- Configure the first Slurm cluster: ansible-playbook site.yml

- Update stage to have the right IP addresses and hostnames.
- Configure the second Slurm cluster: ansible-playbook site.yml

#### Add cloud-user to slurm

<pre>
sacctmgr create account name=csc
sacctmgr create user name=cloud-user account=csc
</pre>

### Make changes to slurm.conf, distribute it to the nodes and restart/reconfigure:

- It would be nice to have a role or tag so that one could just run ansible-playbook site.yml --tags new-slurm-config to push the new config and restart or reconfigure as necessary.
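
Until such a tag exists, something close can be approximated with the existing slurm tag, which the slurm.conf template task in roles/slurm_common already carries. The sketch below is only an illustration: push_slurm_config and DRY_RUN are hypothetical names, and it assumes that site.yml applies that role, that slurm-service runs the controller, and that cloud-user may use sudo there.

```shell
# push_slurm_config: hypothetical helper, not part of this repository.
# Re-runs the tasks tagged "slurm" (which include the slurm.conf template)
# and then asks the running daemons to re-read their configuration.
# With DRY_RUN set to a non-empty value the commands are printed, not run.
push_slurm_config() {
    run=${DRY_RUN:+echo}
    $run ansible-playbook site.yml --tags slurm || return 1
    $run ssh slurm-service sudo scontrol reconfigure
}

# Possible usage:
# DRY_RUN=1 push_slurm_config   # show what would be executed
# push_slurm_config             # actually push and reconfigure
```

Some slurm.conf changes need a daemon restart rather than a reconfigure, so this covers only part of the wish above.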

# Authors:

- Marco Passerini (original author)
- Johan Guldmyr (updates done as part of FGCI work)

9 changes: 9 additions & 0 deletions group_vars/all/all
@@ -0,0 +1,9 @@
---

tenant_name: "tenant_name"
key_name: "key_name"
image: "CentOS-7.0"
flavor: "small"
security_groups: "default,slurm"
network_name: "default"
auth_url: "https://pouta.csc.fi:5000/v2.0/"
27 changes: 27 additions & 0 deletions launcopenstackinstance.yml
@@ -0,0 +1,27 @@
---
- name: launch an instance
  hosts: localhost
  gather_facts: False

  tasks:
    - name: launch the Slurm servers
      nova_compute:
        state: present
        login_username: "{{ lookup('env','OS_USERNAME') }}"
        login_password: "{{ lookup('env','OS_PASSWORD') }}"
        login_tenant_name: "{{ tenant_name }}"
        auth_url: "{{ auth_url }}"
        name: "{{ item }}"
        image: "{{ image }}"
        key_name: "{{ key_name }}"
        wait_for: 200
        flavor: "{{ flavor }}"
        security_groups: "{{ security_groups }}"
        nics:
          - net-id: "{{ network_name }}"
        meta:
          hostname: slurms
      with_items:
        - slurm2-login
        - slurm2-service
        - slurm2-compute1
15 changes: 15 additions & 0 deletions roles/common/files/iptables
@@ -0,0 +1,15 @@
# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -i eth0 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT

65 changes: 65 additions & 0 deletions roles/common/tasks/main.yml
@@ -0,0 +1,65 @@
---
- name: update all packages first
  yum: pkg=* state=latest

- name: Remove duplicate CentOS repos
  file: name=/etc/yum.repos.d/centos6-latest.repo state=absent

- name: install EPEL6
  yum: name='http://www.nic.funet.fi/pub/mirrors/fedora.redhat.com/pub/epel/6/i386/epel-release-6-8.noarch.rpm' state=present
  # compare with ==; "is" expects the name of a Jinja2 test
  when: major_relase == "6"

- name: install software
  yum: name="{{item}}" state=present
  with_items:
    - "xterm"
    - "sssd"
    - "gcc"
    - "make"
    - "gcc-c++"
    - "wget"
    - "vim"
    - "man"
    - "rpm-build"
    - "pam"
    - "pam-devel"
    - "hwloc"
    - "munge"
    - "munge-devel"
    - "munge-libs"
    - "readline-devel"
    - "openssl-devel"
    - "perl-ExtUtils-MakeMaker"
    - "lua"
    - "lua-devel"
    - "lua-posix"
    - "lua-filesystem"

- debug: msg="{{groups['all']}}"
  tags: debug



# - name: create /etc/hosts
# lineinfile: dest=/etc/hosts regexp='.*{{ item }}$' line="{{ hostvars[item]['ansible_default_ipv4']['address'] }} {{item}}" state=present
# when: hostvars[item]['ansible_default_ipv4']['address'] is defined
# with_items: groups['all']
# tags: debug

- name: Add cluster hosts to local /etc/hosts
  sudo: yes
  action: lineinfile
          state=present
          dest=/etc/hosts
          line="{{ hostvars[item]['ssh_host'] }} {{ item }}"
  when: hostvars[item]['ssh_host'] is defined
  with_items: groups.all
  tags: debug

- name: copy iptables settings
  copy: src=iptables
        dest=/etc/sysconfig/iptables owner=root mode=600


- name: restart iptables
  service: name=iptables state=restarted
Empty file added roles/common/vars/main.yml
Empty file.
3 changes: 3 additions & 0 deletions roles/slurm_common/files/slurm
@@ -0,0 +1,3 @@
auth required pam_localuser.so
account required pam_unix.so
session required pam_limits.so
91 changes: 91 additions & 0 deletions roles/slurm_common/tasks/main.yml
@@ -0,0 +1,91 @@
---

# - name: Copy slurm files
# synchronize: mode=pull src={{ item }} dest=/root/rpmbuild/RPMS/x86_64/
# delegate_to: slurm-service
# with_items:
# - "/root/rpmbuild/RPMS/x86_64/slurm-plugins-{{ slurm_version }}-1.el6.x86_64.rpm"
# - "/root/rpmbuild/RPMS/x86_64/slurm-{{ slurm_version }}-1.el6.x86_64.rpm"
# - "/root/rpmbuild/RPMS/x86_64/slurm-perlapi-{{ slurm_version }}-1.el6.x86_64.rpm"
# - "/root/rpmbuild/RPMS/x86_64/slurm-slurmdb-direct-{{ slurm_version }}-1.el6.x86_64.rpm"
# - "/root/rpmbuild/RPMS/x86_64/slurm-sql-{{ slurm_version }}-1.el6.x86_64.rpm"
#- "roles/slurm_common/files/slurm-lua-{{ slurm_version }}-1.el6.x86_64.rpm"
# - "/root/rpmbuild/RPMS/x86_64/slurm-pam_slurm-{{ slurm_version }}-1.el6.x86_64.rpm"
# - "/root/rpmbuild/RPMS/x86_64/slurm-sjstat-{{ slurm_version }}-1.el6.x86_64.rpm"
# - "/root/rpmbuild/RPMS/x86_64/slurm-slurmdbd-{{ slurm_version }}-1.el6.x86_64.rpm"
#- "roles/slurm_common/files/slurm-spank-x11-{{ slurm_version }}-1.el6.x86_64.rpm"
# - "/root/rpmbuild/RPMS/x86_64/slurm-torque-{{ slurm_version }}-1.el6.x86_64.rpm"
# - "/root/rpmbuild/RPMS/x86_64/slurm-devel-{{ slurm_version }}-1.el6.x86_64.rpm"
# - "/root/rpmbuild/RPMS/x86_64/slurm-munge-{{ slurm_version }}-1.el6.x86_64.rpm"
# - "/root/rpmbuild/RPMS/x86_64/slurm-sjobexit-{{ slurm_version }}-1.el6.x86_64.rpm"



- name: distribute the slurm RPMs to the nodes
  copy: src={{ item }}
        dest=/root/rpmbuild/RPMS/x86_64/
  with_items:
    - "roles/slurm_common/files/slurm-plugins-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "roles/slurm_common/files/slurm-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "roles/slurm_common/files/slurm-perlapi-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "roles/slurm_common/files/slurm-slurmdb-direct-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "roles/slurm_common/files/slurm-sql-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "roles/slurm_common/files/slurm-lua-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "roles/slurm_common/files/slurm-devel-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "roles/slurm_common/files/slurm-pam_slurm-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "roles/slurm_common/files/slurm-sjstat-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "roles/slurm_common/files/slurm-slurmdbd-{{ slurm_version }}-1.el6.x86_64.rpm"
    #- "roles/slurm_common/files/slurm-spank-x11-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "roles/slurm_common/files/slurm-torque-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "roles/slurm_common/files/slurm-munge-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "roles/slurm_common/files/slurm-sjobexit-{{ slurm_version }}-1.el6.x86_64.rpm"


- name: install Slurm
  #yum: name="/root/rpmbuild/RPMS/x86_64/{{ item }}-{{ slurm_version }}-1.el6.x86_64.rpm" state=present
  yum: name={{ item }} state=present
  tags: slurm
  with_items:
    - "/root/rpmbuild/RPMS/x86_64/slurm-plugins-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "/root/rpmbuild/RPMS/x86_64/slurm-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "/root/rpmbuild/RPMS/x86_64/slurm-perlapi-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "/root/rpmbuild/RPMS/x86_64/slurm-slurmdb-direct-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "/root/rpmbuild/RPMS/x86_64/slurm-sql-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "/root/rpmbuild/RPMS/x86_64/slurm-lua-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "/root/rpmbuild/RPMS/x86_64/slurm-devel-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "/root/rpmbuild/RPMS/x86_64/slurm-pam_slurm-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "/root/rpmbuild/RPMS/x86_64/slurm-sjstat-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "/root/rpmbuild/RPMS/x86_64/slurm-slurmdbd-{{ slurm_version }}-1.el6.x86_64.rpm"
    #- "/root/rpmbuild/RPMS/x86_64/slurm-spank-x11-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "/root/rpmbuild/RPMS/x86_64/slurm-torque-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "/root/rpmbuild/RPMS/x86_64/slurm-munge-{{ slurm_version }}-1.el6.x86_64.rpm"
    - "/root/rpmbuild/RPMS/x86_64/slurm-sjobexit-{{ slurm_version }}-1.el6.x86_64.rpm"


- name: pam.d/slurm
  copy: src=slurm dest=/etc/pam.d/slurm owner=root mode="644"
  tags: slurm


- name: slurm.conf
  template: src=slurm.conf.j2 dest=/etc/slurm/slurm.conf owner=root mode="644"
  tags: slurm

- name: add slurm user
  user: name=slurm shell=/sbin/nologin createhome=no home=/nonexistent system=yes append=yes
  tags: slurm

- name: add slurm log dir
  file: path="/var/log/slurm" state=directory owner=slurm group=slurm mode=750
  tags: slurm

- name: add slurm tmp dir
  file: path="/tmp/slurmd" state=directory owner=slurm group=slurm mode=750
  tags: slurm

- name: add slurm state dir
  file: path="/tmp/slurmstate" state=directory owner=slurm group=slurm mode=750
  tags: slurm

