Tiny self-hosted cluster for HPC Carpentry workshop? #442

Open
wirawan0 opened this issue Sep 25, 2023 · 2 comments

Comments


wirawan0 commented Sep 25, 2023

Hi folks,

Disclaimer first: I am just tossing this idea out for now; I will have to do the testing later. Background: I was tasked with testing out "ColdFront" as part of my primary duties, and its tutorial comes with a set of Docker containers that together create a workable HPC environment (complete with Open OnDemand, XDMoD, and ColdFront services!). You can see the environment here: https://github.com/ubccr/hpc-toolset-tutorial/blob/master/docs/getting_started.md . This environment comes with a login node, two compute nodes, and the ancillary containers mentioned above, plus LDAP and database server(s).
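
For reference, a minimal sketch of bringing that tutorial environment up. The repository URL is taken from the link above; the helper script name (hpcts) and its subcommands are from my recollection of the tutorial docs, so confirm against getting_started.md before relying on them:

# Clone the HPC Toolset Tutorial and start the containerized cluster.
# (Script name from memory of the tutorial docs; verify in getting_started.md.)
git clone https://github.com/ubccr/hpc-toolset-tutorial.git
cd hpc-toolset-tutorial
./hpcts start        # pulls/builds the images and brings up all services
./hpcts stop         # shuts everything down when finished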

It just came to my mind that we could repurpose this "hpc toolset" container set to provide a standalone "HPC-like" environment on whatever multicore laptop / desktop / workstation / server a user has, if they don't have access to an alternative HPC environment. (Sidebar: in my latest "intro to HPC" teaching I found out that some learners have 6-core and 8-core laptops, which was unheard of before. One other participant had an M2 MacBook Air with 4 efficiency cores and 4 performance cores.) The increased number of cores, memory, and storage on modern laptops may make it viable to run a tiny HPC environment on a learner's own hardware, with a number of caveats mentioned below (and more that I haven't thought of yet).

This is rather easy to set up (for a fairly capable Linux user like me) but probably trickier to get right since it runs in containers: the compute-node containers may be difficult to pin to a specific set of physical cores (see the sketch below; don't take me too seriously, as I am no Docker expert). In that sense it is difficult to get the real "HPC" experience where compute nodes have dedicated cores to process work, which will lead to skew in the timing of computations (think: the parallel Pi calculation). On the other hand, this containerized environment is quite easy to set up on Linux, Mac, or even Windows (nowadays with WSL), so individual users can actually have it running on their own (capable) laptops.
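
As a concrete illustration of the core-pinning concern: Docker itself can restrict a container to specific physical cores via --cpuset-cpus (and the equivalent cpuset key in a Compose file). Whether the hpc-toolset-tutorial's compose setup exposes this is an assumption on my part; the flags below are standard Docker options, but the container name is hypothetical:

# Pin a hypothetical compute-node container to cores 0-1 and cap it at 2 CPUs' worth of time.
docker run --name cpn01 --cpuset-cpus="0,1" --cpus=2 -d rockylinux:8 sleep infinity

# Rough equivalent for a service in a docker-compose.yml:
#   services:
#     cpn01:
#       image: rockylinux:8
#       cpuset: "0,1"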

Clearly there is some ironing out to do to make this "HPC in a container" work in a turn-key manner, but it looks promising to me. As I continue working on my original project I may be able to tell whether this is indeed a useful ad-hoc solution.

Wirawan

@mikerenfro (Contributor) commented:

For a similar goal but a different implementation, the following Vagrantfile lets an instructor build an OpenHPC management (SMS) node and a compute node in VirtualBox with Vagrant. It needs 2 GB of RAM for the management node plus 4 GB per compute node, and it lets learners ssh into the management VM via port 2222 on the instructor's system.

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure("2") do |config|

  # SMS server
  config.vm.define "sms", primary: true do |sms|
    #sms.vm.box = "generic/rocky8"
    sms.vm.box = "bento/rockylinux-8"
    sms.vm.hostname = "sms"
    # sms.vm.synced_folder ".", "/vagrant", disabled: true
    sms.vm.network "private_network", ip: "172.16.0.1", netmask: "255.255.0.0", virtualbox__intnet: "XCBC"
    sms.vm.network "forwarded_port", guest: 22, host: 2222
    sms.vm.provision "shell", inline: <<-SHELL
      YUM="yum -q -y"
      sms_ip="$(nmcli device show eth1 | grep IP4.ADDRESS | awk '{print $NF}' | cut -d/ -f1)"
      sed -ie "\\$s/127.0.1.1/${sms_ip}/" /etc/hosts
      echo "Yum updates"
      ${YUM} update
      echo "OHPC repo"
      ${YUM} install http://repos.openhpc.community/OpenHPC/2/EL_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm
      ${YUM} install dnf-plugins-core
      ${YUM} config-manager --set-enabled powertools
      echo "OHPC docs install"
      ${YUM} install docs-ohpc perl
      echo "Fix recipe settings"
      perl -pi.bak -e \
        's/c_mac\\[0\\]=00:1a:2b:3c:4f:56/c_mac\\[0\\]=08:00:27:00:00:01/;s/c_mac\\[1\\]=00:1a:2b:3c:4f:56/c_mac\\[1\\]=08:00:27:00:00:02/;s/c_mac\\[2\\]=00:1a:2b:3c:4f:56/c_mac\\[2\\]=08:00:27:00:00:03/;s/c_mac\\[3\\]=00:1a:2b:3c:4f:56/c_mac\\[3\\]=08:00:27:00:00:04/;s/eth_provision:-eth0/eth_provision:-eth1/' \
        /opt/ohpc/pub/doc/recipes/rocky8/input.local
      echo "OHPC recipe.sh"
      /opt/ohpc/pub/doc/recipes/rocky8/x86_64/warewulf/slurm/recipe.sh
      perl -pi.bak -e 's/Sockets=2 CoresPerSocket=8 ThreadsPerCore=2/Sockets=1 CoresPerSocket=1 ThreadsPerCore=1/' /etc/slurm/slurm.conf
      systemctl restart slurmctld
    SHELL
  end

  # Compute servers
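  # The compute node uses a bare PXE-boot box: it has no OS image of its own and
  # is provisioned over the private network by Warewulf running on the SMS node.
  # The short boot_timeout and the NIC boot-priority tweak below reflect that
  # Vagrant cannot ssh into this VM; it only needs to power it on and let it PXE-boot.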
  (1..1).each do |compute_idx|
    config.vm.define "c#{compute_idx}", autostart: false do |compute|
      compute.vm.box = "clink15/pxe"
      # compute.vm.hostname = "c#{compute_idx}"
      compute.vm.network "private_network", virtualbox__intnet: "XCBC", mac: "08002700000#{compute_idx}", auto_config: false
      compute.ssh.insert_key = false
      compute.vm.allow_fstab_modification = false
      compute.vm.allow_hosts_modification = false
      compute.vm.boot_timeout = 1
      compute.vm.provider "virtualbox" do |vb|
        vb.customize ["modifyvm", :id, "--nicbootprio2", "1"]
        vb.memory = "4096"
      end
      compute.vm.synced_folder ".", "/vagrant", disabled: true
    end
  end

end
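
A rough usage sketch, assuming the Vagrantfile above is saved in the current directory; the boot-timeout error on the compute node is expected, since it PXE-boots from the SMS rather than booting a Vagrant-managed OS, and the learner-facing host name is whatever the instructor's machine uses:

vagrant up sms              # build and provision the OpenHPC management node
vagrant up c1               # start the compute node; Vagrant will report a boot
                            # timeout because the VM PXE-boots from the SMS
# Learners then log in through the forwarded port on the instructor's machine:
ssh -p 2222 vagrant@<instructor-host>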

ocaisa (Contributor) commented May 17, 2024

One could also just look at what SchedMD do for their Slurm training: https://gitlab.com/SchedMD/training/docker-scale-out/

That should give you a working Slurm cluster, and with a little bit of effort you could get EESSI on there, which would give you a software stack.
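
A minimal sketch of trying that route. The clone URL is from the link above; the build/run commands are an assumption based on it being a standard Docker Compose setup, so the actual entry point may be a Makefile or helper script described in the repository's README:

git clone https://gitlab.com/SchedMD/training/docker-scale-out.git
cd docker-scale-out
# Assumed standard Compose workflow; check the repository README for the
# supported way to build and launch the training cluster.
docker compose build
docker compose up -d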
