
mOS for HPC v0.7 Readme


What is mOS for HPC

mOS for HPC is an operating systems research project at Intel, targeting extreme-scale HPC systems that deploy converged workflows for modeling/simulation, data analytics, and AI. It aims to deliver a high performance computing environment with the scalability, low noise, and repeatability expected of lightweight kernels (LWKs), while maintaining the overall Linux compatibility that HPC and AI/ML applications need.

mOS for HPC remains under development at this time. These materials are being made available to interested parties to explore, test drive, and provide feedback through the mailing list. Consider the quality level to be pre-alpha and as-is. It is not intended for production or business critical use. Users are expected to have expert knowledge of Linux internals and operating system principles. Support is limited by the development team's ability to respond through the mailing list.

What's new for v0.7?

  • Partition Creation: The lwkctl command, which controls the dynamic creation and configuration of the lightweight kernel without a system reboot, now has an auto option. The auto option generates a configuration optimized for use in a typical HPC application environment, taking into account the current node topology and available memory. (Learn more: mOS for HPC v0.7 Administrators Guide)

  • Scheduler: A new experimental yod option, idle-control, has been added. This option selects the mechanism used by the lightweight kernel when a CPU enters an idle condition, and also the conditions under which a CPU will be brought into a lower power C-state. It can be useful to control the balance between power consumption and performance. (Learn more: mOS for HPC v0.7 Users Guide)

  • CPU Configuration: Additional flexibility has been added to the allowed lightweight kernel CPU configurations. In past versions, some CPUs were designated to handle system calls and to host utility threads on behalf of lightweight kernel CPUs. These system call target CPUs are no longer required: if none are configured for a grouping of lightweight kernel CPUs, system calls are executed locally on the lightweight kernel CPUs themselves. The requirement to have at least one utility CPU configured has also been removed. (Learn more: man page for lwkctl)

  • RAS: A flexible and pluggable RAS subsystem has been introduced. This subsystem formalizes various mOS error and warning messages, converting them from raw console messages to events that can be structured and formatted in a manner consistent with an HPC control system. Formatting is pluggable, with the default being human readable console messages. An implementation compatible with Intel's Unified Control System (UCS) is also provided.

  • XpMem: Support for the XpMem inter-process memory sharing interface has been added. Both lightweight kernel and regular Linux processes support XpMem. The mOS for HPC implementation is enhanced over the standard mechanism to leverage the large page support that is inherently part of the mOS memory subsystem. (A usage sketch follows this list.)

  • Linux 4.14.110: mOS for HPC v0.7 is based on the long term support kernel Linux 4.14.110 from kernel.org. From a compatibility perspective, this version has been integrated and tested on a system based on SLES 12 SP3 with OpenHPC.
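
For illustration, here is a minimal sketch of how two cooperating processes might share a buffer through the XpMem interface, using the standard XPMEM user-space calls (xpmem_make, xpmem_get, xpmem_attach). The header and library names, the fork-based process pair, the 2 MiB buffer size, and the permission mode are assumptions made for this example; they are not taken from the mOS for HPC sources or guides, which should be consulted for the supported usage.

    /*
     * Minimal XPMEM sharing sketch (hypothetical example; not from the mOS tree).
     * Build assumption: the xpmem user library and header are installed, e.g.
     *     cc xpmem_demo.c -o xpmem_demo -lxpmem
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <xpmem.h>

    #define SHARE_SIZE (2UL * 1024 * 1024)   /* one 2 MiB large page (assumption) */

    int main(void)
    {
        /* Owner: allocate an aligned buffer and expose it to other processes. */
        void *buf = NULL;
        if (posix_memalign(&buf, SHARE_SIZE, SHARE_SIZE))
            return 1;
        memset(buf, 0, SHARE_SIZE);

        xpmem_segid_t segid = xpmem_make(buf, SHARE_SIZE,
                                         XPMEM_PERMIT_MODE, (void *)0600);
        if (segid == -1) {
            perror("xpmem_make");
            return 1;
        }

        if (fork() == 0) {
            /* Peer process: map the owner's segment into this address space. */
            xpmem_apid_t apid = xpmem_get(segid, XPMEM_RDWR,
                                          XPMEM_PERMIT_MODE, NULL);
            if (apid == -1) {
                perror("xpmem_get");
                _exit(1);
            }
            struct xpmem_addr addr = { .apid = apid, .offset = 0 };
            char *view = xpmem_attach(addr, SHARE_SIZE, NULL);
            if (view == (void *)-1) {
                perror("xpmem_attach");
                _exit(1);
            }
            view[0] = 'x';                   /* store lands in the owner's buffer */
            xpmem_detach(view);
            xpmem_release(apid);
            _exit(0);
        }

        wait(NULL);
        printf("owner sees: %c\n", ((char *)buf)[0]);   /* 'x' on success */

        xpmem_remove(segid);
        free(buf);
        return 0;
    }

In a real HPC job the segment id would normally be exchanged between separately launched ranks (for example through the MPI runtime) rather than through fork; the fork here only keeps the sketch self-contained.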

Platform requirements

The development and testing for mOS for HPC v0.7 have been performed on systems with the Intel(R) Xeon(R) Scalable processor family and on systems with the Intel(R) Xeon Phi(TM) product family. As a result, mOS for HPC includes optimizations for technologies such as multi-socket CPUs, high core counts, Intel(R) Hyper-Threading Technology, and complex memory configurations with up to 8 NUMA domains (DDR plus high bandwidth memory). Specific configurations include:

  • Intel(R) Xeon(R) Gold 6140 processors with 128GiB of DDR4, Intel(R) HT Technology on, and booted without sub-NUMA clustering (SNC)
  • Intel(R) Xeon Phi(TM) processors 7250 with 96GiB of DRAM and 16GiB of MCDRAM, Intel(R) HT Technology on, and booted in sub-NUMA clustering 4 (SNC-4) mode and flat memory mode

Your mileage may vary on other platforms and configurations in terms of functionality and performance.

Additional remarks:

  • If you use the Intel(R) Xeon Phi(TM) processor 7230, then Quadrant cluster mode with Flat memory mode is recommended.
  • If you want to make all of MCDRAM available to applications on Intel(R) Xeon Phi(TM) processors, you must verify that MCDRAM is hot-pluggable in the BIOS settings.  Please see the Administrator's Guide.
  • The development team has observed lower performance of mOS for HPC when running in cache memory mode on Intel(R) Xeon Phi(TM) processors, which is not necessarily attributable to the hardware.
  • Processors outside of the x86_64 architecture designation in Linux are unsupported – the kernel code will not configure and build.

The Linux distribution used by the development team for building, installing, and testing mOS for HPC has been SLES 12 SP3 with OpenHPC. There has also been limited testing with CentOS 7. Other distributions have had almost no testing and may require adapting the build and install instructions to your environment.

mOS for HPC development plans to track Intel(R) Parallel Studio XE Cluster Edition 2019 for Linux* and MPICH/MPICH4 updates as they become available. Almost no testing has been done using other compilers (e.g., gcc) or MPI runtimes (e.g., MVAPICH or OpenMPI).

Where to get code

The mOS for HPC source can be checked out from GitHub at https://github.com/intel/mOS.   Please see the Administrator's Guide for further instructions.

Where to report issues or ask questions

Register for the mOS for HPC mailing list at https://groups.google.com/g/mos-devel/. Please submit feedback and follow discussions through this list.


*Other names and brands may be claimed as the property of others.