merge from devel
andre-merzky committed Jan 23, 2024
2 parents a9015ec + d7be4d9 commit 02c7043
Showing 10 changed files with 843 additions and 25 deletions.
68 changes: 68 additions & 0 deletions CITATION.cff
@@ -0,0 +1,68 @@
cff-version: 1.2.0
title: RADICAL-Pilot
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Andre
    family-names: Merzky
  - given-names: Matteo
    family-names: Turilli
  - given-names: Mikhail
    family-names: Titov
  - given-names: Aymen
    family-names: Al-Saadi
  - given-names: Shantenu
    family-names: Jha
identifiers:
  - type: url
    value: 'https://github.com/radical-cybertools/radical.pilot'
    description: GitHub repository
  - type: doi
    value: 10.1109/TPDS.2021.3105994
repository-code: 'https://github.com/radical-cybertools/radical.pilot'
url: 'https://radicalpilot.readthedocs.io/'
abstract: >-
  RADICAL-Pilot (RP) is a Pilot system written in Python and
  specialized in executing applications composed of many
  computational tasks on high performance computing (HPC)
  platforms. As a Pilot system, RP separates resource
  acquisition from using those resources to execute
  application tasks. Resources are acquired by submitting a
  job to the batch system of an HPC machine. Once the job is
  scheduled on the requested resources, RP can directly
  schedule and launch application tasks on those resources.
  Thus, tasks are not scheduled via the batch system of the
  HPC platform, but directly on the acquired resources.
keywords:
- High Performance Computing (HPC)
- Pilot Job
- Scientific Computing
license: MIT
references:
  - type: article
    scope: Cite this paper if you want to reference the general concepts of the software.
    authors:
      - family-names: Merzky
        given-names: Andre
        orcid: 'https://orcid.org/0000-0002-7228-4327'
      - family-names: Turilli
        given-names: Matteo
        orcid: 'https://orcid.org/0000-0003-0527-1435'
      - family-names: Titov
        given-names: Mikhail
        orcid: 'https://orcid.org/0000-0003-2357-7382'
      - family-names: Al-Saadi
        given-names: Aymen
        orcid: 'https://orcid.org/0000-0001-7491-4946'
      - family-names: Jha
        given-names: Shantenu
        orcid: 'https://orcid.org/0000-0002-5040-026X'
    title: "Design and Performance Characterization of RADICAL-Pilot on Leadership-Class Platforms"
    year: 2022
    journal: IEEE Transactions on Parallel and Distributed Systems
    volume: 33
    issue: 4
    pages: 818-829
    doi: 10.1109/TPDS.2021.3105994
140 changes: 115 additions & 25 deletions README.md
@@ -1,39 +1,129 @@
# RADICAL-Pilot (RP)

[![Build Status](https://github.com/radical-cybertools/radical.pilot/actions/workflows/ci.yml/badge.svg)](https://github.com/radical-cybertools/radical.pilot/actions/workflows/ci.yml)
[![Documentation Status](https://readthedocs.org/projects/radicalpilot/badge/?version=stable)](http://radicalpilot.readthedocs.io/en/stable/?badge=stable)
[![codecov](https://codecov.io/gh/radical-cybertools/radical.pilot/branch/devel/graph/badge.svg)](https://codecov.io/gh/radical-cybertools/radical.pilot)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/8224/badge)](https://www.bestpractices.dev/projects/8224)

RADICAL-Pilot (RP) is a Pilot system written in Python and specialized
in executing applications composed of many computational tasks on high
performance computing (HPC) platforms. As a Pilot system, RP separates resource
acquisition from using those resources to execute application tasks. Resources
are acquired by submitting a job to the batch system of an HPC machine. Once
the job is scheduled on the requested resources, RP can directly schedule and
launch application tasks on those resources. Thus, tasks are not scheduled via
the batch system of the HPC platform, but directly on the acquired resources.
RADICAL-Pilot (RP) executes heterogeneous tasks with maximum concurrency and at
scale. RP can concurrently execute up to $10^5$ heterogeneous tasks, including
single/multi core/GPU and MPI/OpenMP. Tasks can be stand-alone executables or
Python functions and both types of task can be concurrently executed.

RP is a [Pilot system](https://doi.org/10.1145/3177851), i.e., it separates
resource acquisition from using those resources to execute application tasks. RP
acquires resources by submitting a job to an HPC platform, and it can directly
schedule and launch computational tasks on those resources. Thus, tasks are
directly scheduled on the acquired resources, not via the batch system of the
HPC platform. RP supports concurrently using single/multiple pilots on
single/multiple
[high performance computing (HPC) platforms](https://radicalpilot.readthedocs.io/en/stable/supported.html).

RP is written in Python and exposes a simple yet powerful
[API](https://radicalpilot.readthedocs.io/en/stable/apidoc.html). In 15 lines of
code, you can execute an arbitrary number of executables with maximum
concurrency on a
[Linux container](https://hub.docker.com/u/radicalcybertools)
or, by changing `resource`, on one of the
[supported HPC platforms](https://radicalpilot.readthedocs.io/en/stable/supported.html).

```python
import radical.pilot as rp

# Create a session
session = rp.Session()

# Create a pilot manager and a pilot
pmgr = rp.PilotManager(session=session)
pd_init = {'resource': 'local.localhost',
           'runtime' : 30,
           'cores'   : 4}
pdesc = rp.PilotDescription(pd_init)
pilot = pmgr.submit_pilots(pdesc)

# Create a task manager and describe your tasks
tmgr = rp.TaskManager(session=session)
tmgr.add_pilots(pilot)
tds = list()
for i in range(8):
    td = rp.TaskDescription()
    td.executable = 'sleep'
    td.arguments = ['10']
    tds.append(td)

# Submit your tasks for execution
tmgr.submit_tasks(tds)
tmgr.wait_tasks()

# Close your session
session.close(cleanup=True)
```

## Quick Start

Run RP's [quick start tutorial](https://mybinder.org/v2/gh/radical-cybertools/radical.pilot/HEAD?labpath=docs%2Fsource%2Fgetting_started.ipynb) directly on Binder. No installation needed.

After going through the tutorial, install RP and start to code your application:

```shell
python -m venv ~/.ve/radical-pilot
. ~/.ve/radical-pilot/bin/activate
pip install radical.pilot
```

Note that other than `venv`, you can also use
[`virtualenv`](https://radicalpilot.readthedocs.io/en/stable/getting_started.html#Virtualenv),
[`conda`](https://radicalpilot.readthedocs.io/en/stable/getting_started.html#Conda)
or
[`spack`](https://radicalpilot.readthedocs.io/en/stable/getting_started.html#Spack).

For some inspiration, see our RP application
[examples](https://github.com/radical-cybertools/radical.pilot/tree/devel/examples),
starting from
[00_getting_started.py](https://github.com/radical-cybertools/radical.pilot/blob/devel/examples/00_getting_started.py)
.

## Documentation

Full system description and usage examples are available at:
https://radicalpilot.readthedocs.io/en/stable/
[RP user documentation](https://radicalpilot.readthedocs.io/en/stable/) uses Sphinx, and it is published on Read the Docs.

[RP tutorials](https://mybinder.org/v2/gh/radical-cybertools/radical.pilot/HEAD) can be run via Binder.

## Developers

RP development uses Git and
[GitHub](https://github.com/radical-cybertools/radical.pilot). RP **requires**
Python3, a virtual environment and a GNU/Linux OS. Clone, install and
test RP:

Additional information is provided in the
[wiki](https://github.com/radical-cybertools/radical.pilot/wiki) section of RP
GitHub repository.
```shell
python -m venv ~/.ve/rp-docs
. ~/.ve/rp-docs/bin/activate
git clone git@github.com:radical-cybertools/radical.pilot.git
cd radical.pilot
pip install -r requirements-docs.txt
sphinx-build -M html docs/source/ docs/build/
```

## Code
RP documentation uses tutorials coded as Jupyter notebooks. `Sphinx` and
`nbsphinx` run RP locally to execute those tutorials. Successful compilation of
the documentation also serves as a validation of your local development
environment.

Generally, the `master` branch reflects the RP release published on
[PyPI](https://pypi.org/project/radical.pilot/), and is considered stable:
it should work 'out of the box' for the supported backends. For a list of
supported backends, please refer to the documentation.
## Provide Feedback

The `devel` branch (and any branch other than master) may not correspond to the
published documentation and, specifically, may have dependencies which need to
be resolved manually.
Have a question or a feature request, or found a bug? Feel free to open a
[support ticket](https://github.com/radical-cybertools/radical.pilot/issues).
For vulnerabilities, please draft a private
[security advisory](https://github.com/radical-cybertools/radical.pilot/security/advisories).

## Integration Tests status
These badges show the state of the current integration tests on different HPCs RADICAL Pilot supports
## Contributing

[![ORNL Summit Integration Tests](https://github.com/radical-cybertools/radical.pilot/actions/workflows/summit.yml/badge.svg)](https://github.com/radical-cybertools/radical.pilot/actions/workflows/summit.yml)
[![PSC Bridges2 Integration Tests](https://github.com/radical-cybertools/radical.pilot/actions/workflows/bridges.yml/badge.svg)](https://github.com/radical-cybertools/radical.pilot/actions/workflows/bridges.yml)
We welcome everyone who wants to contribute to RP development. We are an open
and welcoming community, committed to making participation a harassment-free
experience for everyone. See our
[Code of Conduct](https://radicalpilot.readthedocs.io/en/stable/process/code_of_conduct.html),
relevant
[technical documentation](https://radicalpilot.readthedocs.io/en/stable/process/contributing.html)
and feel free to
[get in touch](https://github.com/radical-cybertools/radical.pilot/issues).
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -47,6 +47,7 @@ distributed architecture.
tutorials.rst
supported.rst
envs.rst
process/contributing.rst
glossary.rst
internals.rst
apidoc.rst
93 changes: 93 additions & 0 deletions docs/source/process/branching_model.rst
@@ -0,0 +1,93 @@
.. _branching_model:

Branching Model
===============

RADICAL-Pilot (RP) uses `git-flow
<http://nvie.com/posts/a-successful-git-branching-model/>`__ as its branching
model, with some simplifications. Version numbers follow `semantic versioning
<http://semver.org/>`__.

- Release candidates and releases are tagged in the ``master`` branch (we do
  not use dedicated release branches at this point).

- A release is prepared by:

  - Tagging a release candidate on ``devel`` (e.g. ``v1.23RC4``);
  - testing that RC;
  - problems are fixed in ``devel``, toward a new release candidate;
  - once the RC is found stable, ``devel`` is merged to master, the release is
    tagged on master (e.g. ``v1.23``) and shipped to PyPI.

- Urgent hotfix releases:

  - Branch from master to ``hotfix/problem_name``;
  - fix the problem in that branch;
  - test that branch;
  - merge back to master and prepare release candidate for hotfix release.

- Normal bug fixes:

  - Branch off ``devel``, naming convention: ``fix/issue_1234`` (reference
    GitHub issue);
  - fix in that branch, and test;
  - create pull request toward ``devel``;
  - code review, then merge.

- Major development activities go into feature branches:

  - Branch ``devel`` into ``feature/feature_name``;
  - work on that feature branch;
  - on completion, merge ``devel`` into the feature branch (that should be
    done frequently if possible, to avoid large deviation (== pain) of the
    branches);
  - test the feature branch;
  - create a pull request for merging the feature branch into ``devel`` (that
    should be a fast-forward now);
  - merging of feature branches into ``devel`` should be discussed with the
    group *before* they are performed, and only after code review.

- Documentation changes are handled like fix or feature branches, depending on
  size and impact, similar to code changes.
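
The tag names used in the release steps above (``v1.23RC4`` for a candidate,
``v1.23`` for the release) imply that every release candidate precedes its
release. The following is a minimal Python sketch of that ordering. The tag
pattern is an assumption inferred from those two examples only (the optional
patch number is likewise assumed), and ``parse_tag`` and ``sort_tags`` are
hypothetical helpers, not part of RP or its release tooling.

```python
import re

# Assumed tag pattern, inferred from the examples v1.23RC4 and v1.23:
# "v<major>.<minor>", an optional ".<patch>", and an optional "RC<n>" suffix.
TAG_RE = re.compile(r'^v(\d+)\.(\d+)(?:\.(\d+))?(?:RC(\d+))?$')


def parse_tag(tag):
    """Return (major, minor, patch, is_release, rc) for a tag, or raise."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError('unrecognized tag: %r' % tag)
    major, minor, patch, rc = match.groups()
    # is_release is 0 for candidates and 1 for releases, so that all
    # candidates of a version sort before the release itself.
    return (int(major), int(minor), int(patch or 0),
            0 if rc else 1, int(rc or 0))


def sort_tags(tags):
    """Sort tags so that candidates come before the release they lead to."""
    return sorted(tags, key=parse_tag)


if __name__ == '__main__':
    print(sort_tags(['v1.23', 'v1.24RC1', 'v1.23RC4', 'v1.23RC1']))
```

Encoding "is a release" as its own element of the sort key keeps the
comparison a plain tuple comparison and avoids special-casing the missing
``RC`` suffix.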

Branch Naming
-------------

- ``devel``, ``master``: *never* commit directly to those;
- ``feature/abc``: development of new features;
- ``fix/abc_123``: referring to ticket 123;
- ``hotfix/abc_123``: referring to ticket 123, to be released right after merge
  to master;
- ``experiment/sc16``: experiments toward a specific publication etc. Cannot be
  merged, they will be converted to tags after experiments conclude;
- ``project/xyz``: branch for a dedicated group of people, usually contains
  unreleased features/fixes, and is not expected to be merged back;
- ``tmp/abc``: temporary branch, will be deleted soon;
- ``test/abc``: test some integration, like a merge of two feature branches.

For the latter: assume you want to test how ``feature/a`` works in combination
with ``feature/b``, then:

- ``git checkout feature/a``;
- ``git checkout -b test/a_b``;
- ``git merge feature/b``;
- do tests.
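
The naming convention above can be checked mechanically. The following is a
minimal Python sketch, assuming the prefixes listed in this section are the
complete set; ``classify_branch`` is a hypothetical helper, not an existing RP
tool, and the purpose strings are paraphrased from the list.

```python
# Prefixes and their purposes, paraphrased from the "Branch Naming" list.
BRANCH_PREFIXES = {
    'feature'   : 'development of new features',
    'fix'       : 'bug fix, refers to a ticket',
    'hotfix'    : 'urgent fix, released right after merge to master',
    'experiment': 'experiments toward a publication, never merged',
    'project'   : 'dedicated group branch, not expected to be merged back',
    'tmp'       : 'temporary branch',
    'test'      : 'integration test of other branches',
}

# Long-lived branches which must never receive direct commits.
PROTECTED = ('devel', 'master')


def classify_branch(name):
    """Return the purpose of a branch, or None if the name breaks the convention."""
    if name in PROTECTED:
        return 'protected, never commit directly'
    prefix, sep, rest = name.partition('/')
    # A valid name needs both a known prefix and a non-empty suffix.
    if not sep or not rest:
        return None
    return BRANCH_PREFIXES.get(prefix)


if __name__ == '__main__':
    for branch in ('devel', 'fix/issue_1234', 'test/a_b', 'my_branch'):
        print(branch, '->', classify_branch(branch))
```

A check like this could run in a local hook or in CI to flag branches that do
not follow the convention; where to run it is left open here.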

Branching Policies
------------------

All branches should ideally be short-lived. To support this, only a limited
number of branches should be open at any point in time. For example, each
developer should have at most ``N`` branches open for fixes and ``M << N``
branches open for features - other features / issues have to wait.

Some additional rules
---------------------

- Commits, in particular for bug fixes, should be self-contained to make it
  easy to use ``git cherry-pick``, so that bug fixes can quickly be transferred
  to other branches (such as hotfixes).
- Do not use ``git rebase``, unless you *really* know what you are doing.
- You may not want to use the tools available for ``git-flow`` -- those have
  given us inconsistent results in the past, partially because they used
  rebase.