Commit
Deployed b00de20 with MkDocs version: 1.5.3
Unknown committed Apr 9, 2024
1 parent e231235 commit d6468cd
Showing 4 changed files with 100 additions and 22 deletions.
2 changes: 1 addition & 1 deletion search/search_index.json

Large diffs are not rendered by default.

@@ -1183,6 +1183,24 @@
</label>
<ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>

<li class="md-nav__item">
<a href="#requirements" class="md-nav__link">
<span class="md-ellipsis">
Requirements
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#overview" class="md-nav__link">
<span class="md-ellipsis">
Overview
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#submitting-a-persistent-volume-claim" class="md-nav__link">
<span class="md-ellipsis">
@@ -2171,6 +2189,24 @@
</label>
<ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>

<li class="md-nav__item">
<a href="#requirements" class="md-nav__link">
<span class="md-ellipsis">
Requirements
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#overview" class="md-nav__link">
<span class="md-ellipsis">
Overview
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#submitting-a-persistent-volume-claim" class="md-nav__link">
<span class="md-ellipsis">
@@ -2255,7 +2291,10 @@



<h1 id="requesting-persistent-volumes-with-kubernetes">Requesting Persistent Volumes With Kubernetes</h1>
<h1 id="requesting-persistent-volumes-with-kubernetes">Requesting persistent volumes with Kubernetes</h1>
<h2 id="requirements">Requirements</h2>
<p>It is recommended that users complete <a href="../L1_getting_started/#requirements">Getting started with Kubernetes</a> before proceeding with this tutorial.</p>
<h2 id="overview">Overview</h2>
<p>Pods in the K8s EIDF GPU Service are intentionally ephemeral.</p>
<p>They only last as long as required to complete the task that they were created for.</p>
<p>Keeping pods ephemeral ensures the cluster resources are released for other users to request.</p>
@@ -2282,12 +2321,12 @@ <h3 id="example-persistentvolumeclaim">Example PersistentVolumeClaim</h3>
<span class="w"> </span><span class="nt">storage</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">2Gi</span>
<span class="w"> </span><span class="nt">storageClassName</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">csi-rbd-sc</span>
</code></pre></div>
<p>You create a persistent volume by passing the yaml file to kubectl like a pod specification yaml <code>kubectl create &lt;PVC specification yaml&gt;</code>
<p>You create a persistent volume by passing the yaml file to kubectl, as you would a pod specification yaml: <code>kubectl -n &lt;project-namespace&gt; create -f &lt;PVC specification yaml&gt;</code>.
Once you have successfully created a persistent volume you can interact with it using the standard kubectl commands:</p>
<ul>
<li><code>kubectl delete pvc &lt;PVC name&gt;</code></li>
<li><code>kubectl get pvc &lt;PVC name&gt;</code></li>
<li><code>kubectl apply -f &lt;PVC specification yaml&gt;</code></li>
<li><code>kubectl -n &lt;project-namespace&gt; delete pvc &lt;PVC name&gt;</code></li>
<li><code>kubectl -n &lt;project-namespace&gt; get pvc &lt;PVC name&gt;</code></li>
<li><code>kubectl -n &lt;project-namespace&gt; apply -f &lt;PVC specification yaml&gt;</code></li>
</ul>
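<p>The commands above can be combined into a small script. The following is a minimal sketch, not part of the original tutorial: the name, size and storage class follow the tutorial's example, while the <code>NAMESPACE</code> default, the <code>ReadWriteOnce</code> access mode and the helper name <code>create_pvc</code> are assumptions made for illustration.</p>

```shell
#!/usr/bin/env bash
# Sketch only. NAMESPACE and the ReadWriteOnce access mode are assumptions;
# the name, size and storage class follow the tutorial's example.
NAMESPACE="${NAMESPACE:-my-project-namespace}"

# Write the PVC specification to a file in the current directory.
cat > test-ceph-pvc.yaml <<'EOF'
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-ceph-pvc
spec:
  accessModes:
    - ReadWriteOnce   # assumption, adjust to your needs
  resources:
    requests:
      storage: 2Gi
  storageClassName: csi-rbd-sc
EOF

# Submit the claim, then show its status (hypothetical helper name).
create_pvc() {
  kubectl -n "$NAMESPACE" create -f test-ceph-pvc.yaml &&
    kubectl -n "$NAMESPACE" get pvc test-ceph-pvc
}
```

<p>Running the script only writes the specification file; call <code>create_pvc</code> afterwards on a machine with access to the cluster to submit it.</p>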
<h2 id="mounting-a-persistent-volume-to-a-pod">Mounting a persistent volume to a pod</h2>
<p>Introducing a persistent volume to a pod requires the addition of a volumeMount option to the container and a volume option linking to the PVC in the pod specification yaml.</p>
@@ -2327,14 +2366,14 @@ <h3 id="example-pod-specification-yaml-with-mounted-persistent-volume">Example p
<h2 id="accessing-the-persistent-volume-outside-a-pod">Accessing the persistent volume outside a pod</h2>
<p>To move files in/out of the persistent volume from outside a pod you can use the kubectl cp command.</p>
<div class="highlight"><pre><span></span><code>***<span class="w"> </span>On<span class="w"> </span>Login<span class="w"> </span>Node<span class="w"> </span>-<span class="w"> </span>replacing<span class="w"> </span>pod<span class="w"> </span>name<span class="w"> </span>with<span class="w"> </span>your<span class="w"> </span>pod<span class="w"> </span>name<span class="w"> </span>***
kubectl<span class="w"> </span>cp<span class="w"> </span>/home/data/test_data.csv<span class="w"> </span>test-ceph-pvc-job-8c9cc:/mnt/ceph_rbd
kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>cp<span class="w"> </span>/home/data/test_data.csv<span class="w"> </span>test-ceph-pvc-job-8c9cc:/mnt/ceph_rbd
</code></pre></div>
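<p>Since <code>kubectl cp</code> works in both directions, the two transfers can be wrapped in small helpers. This is a sketch under stated assumptions: the function names <code>pv_upload</code> and <code>pv_download</code> and the default values of <code>NAMESPACE</code> and <code>POD</code> are hypothetical placeholders, and the pod must have the persistent volume mounted at <code>/mnt/ceph_rbd</code> as in the example above.</p>

```shell
#!/usr/bin/env bash
# Assumption: NAMESPACE and POD are placeholders for your own namespace and pod name.
NAMESPACE="${NAMESPACE:-my-project-namespace}"
POD="${POD:-test-ceph-pvc-job-8c9cc}"

# Copy a local file into the mounted persistent volume.
pv_upload() {
  kubectl -n "$NAMESPACE" cp "$1" "$POD:/mnt/ceph_rbd/"
}

# Copy a file from the mounted persistent volume into the current directory.
pv_download() {
  kubectl -n "$NAMESPACE" cp "$POD:/mnt/ceph_rbd/$1" "./$1"
}
```

<p>For example, <code>pv_upload /home/data/test_data.csv</code> followed later by <code>pv_download test_data.csv</code>.</p>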
<p>For more complex file transfers and synchronisation, create a low resource pod with the persistent volume mounted.</p>
<p>The rsync utility can be adapted to manage file transfers into the mounted PV, following <a href="https://github.com/toelke/docker-rsync/#in-kubernetes-cronjob">this GitHub repo</a>.</p>
<h2 id="clean-up">Clean up</h2>
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>delete<span class="w"> </span>job<span class="w"> </span>test-ceph-pvc-job
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>delete<span class="w"> </span>job<span class="w"> </span>test-ceph-pvc-job

kubectl<span class="w"> </span>delete<span class="w"> </span>pvc<span class="w"> </span>test-ceph-pvc
kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>delete<span class="w"> </span>pvc<span class="w"> </span>test-ceph-pvc
</code></pre></div>


65 changes: 52 additions & 13 deletions services/gpuservice/training/L3_running_a_pytorch_task/index.html
@@ -1204,6 +1204,24 @@
</label>
<ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>

<li class="md-nav__item">
<a href="#requirements" class="md-nav__link">
<span class="md-ellipsis">
Requirements
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#overview" class="md-nav__link">
<span class="md-ellipsis">
Overview
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#load-training-data-and-ml-code-into-a-persistent-volume" class="md-nav__link">
<span class="md-ellipsis">
@@ -2207,6 +2225,24 @@
</label>
<ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>

<li class="md-nav__item">
<a href="#requirements" class="md-nav__link">
<span class="md-ellipsis">
Requirements
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#overview" class="md-nav__link">
<span class="md-ellipsis">
Overview
</span>
</a>

</li>

<li class="md-nav__item">
<a href="#load-training-data-and-ml-code-into-a-persistent-volume" class="md-nav__link">
<span class="md-ellipsis">
@@ -2328,7 +2364,10 @@


<h1 id="running-a-pytorch-task">Running a PyTorch task</h1>
<p>In the following lesson, we'll build a NLP neural network and train it using the EIDF GPU Service.</p>
<h2 id="requirements">Requirements</h2>
<p>It is recommended that users complete <a href="../L1_getting_started/#requirements">Getting started with Kubernetes</a> and <a href="../L2_requesting_persistent_volumes/#requirements">Requesting persistent volumes with Kubernetes</a> before proceeding with this tutorial.</p>
<h2 id="overview">Overview</h2>
<p>In the following lesson, we'll build a convolutional neural network (CNN) and train it using the EIDF GPU Service.</p>
<p>The model was taken from the <a href="https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html">PyTorch Tutorials</a>.</p>
<p>The lesson will be split into three parts:</p>
<ul>
@@ -2339,7 +2378,7 @@ <h1 id="running-a-pytorch-task">Running a PyTorch task</h1>
<h2 id="load-training-data-and-ml-code-into-a-persistent-volume">Load training data and ML code into a persistent volume</h2>
<h3 id="create-a-persistent-volume">Create a persistent volume</h3>
<p>Request storage from the Ceph server by submitting a PVC to K8s (example PVC spec yaml below).</p>
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>create<span class="w"> </span>-f<span class="w"> </span>&lt;pvc-spec-yaml&gt;
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>create<span class="w"> </span>-f<span class="w"> </span>&lt;pvc-spec-yaml&gt;
</code></pre></div>
<h3 id="example-pytorch-persistentvolumeclaim">Example PyTorch PersistentVolumeClaim</h3>
<div class="highlight"><pre><span></span><code><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">PersistentVolumeClaim</span>
@@ -2358,12 +2397,12 @@ <h3 id="transfer-codedata-to-persistent-volume">Transfer code/data to persistent
<ol>
<li>
<p>Check PVC has been created</p>
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>get<span class="w"> </span>pvc<span class="w"> </span>&lt;pv-name&gt;
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>get<span class="w"> </span>pvc<span class="w"> </span>&lt;pv-name&gt;
</code></pre></div>
</li>
<li>
<p>Create a lightweight job whose pod has the PV mounted (example job below)</p>
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>create<span class="w"> </span>-f<span class="w"> </span>lightweight-pod-job.yaml
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>create<span class="w"> </span>-f<span class="w"> </span>lightweight-pod-job.yaml
</code></pre></div>
</li>
<li>
@@ -2373,17 +2412,17 @@ <h3 id="transfer-codedata-to-persistent-volume">Transfer code/data to persistent
</li>
<li>
<p>Copy the Python script into the PV</p>
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>cp<span class="w"> </span>example_pytorch_code.py<span class="w"> </span>lightweight-job-&lt;identifier&gt;:/mnt/ceph_rbd/
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>cp<span class="w"> </span>example_pytorch_code.py<span class="w"> </span>lightweight-job-&lt;identifier&gt;:/mnt/ceph_rbd/
</code></pre></div>
</li>
<li>
<p>Check whether the files were transferred successfully</p>
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>lightweight-job-&lt;identifier&gt;<span class="w"> </span>--<span class="w"> </span>ls<span class="w"> </span>/mnt/ceph_rbd
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span><span class="nb">exec</span><span class="w"> </span>lightweight-job-&lt;identifier&gt;<span class="w"> </span>--<span class="w"> </span>ls<span class="w"> </span>/mnt/ceph_rbd
</code></pre></div>
</li>
<li>
<p>Delete the lightweight job</p>
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>delete<span class="w"> </span>job<span class="w"> </span>lightweight-job-&lt;identifier&gt;
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>delete<span class="w"> </span>job<span class="w"> </span>lightweight-job-&lt;identifier&gt;
</code></pre></div>
</li>
</ol>
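<p>The copy and check steps above can be sketched as a single helper that first waits for the lightweight pod to become ready. This is an illustrative sketch only: the function name <code>load_code</code>, the <code>NAMESPACE</code> default and the 120 second timeout are assumptions, and the pod name must be supplied by the caller.</p>

```shell
#!/usr/bin/env bash
# Assumption: NAMESPACE is a placeholder for your own project namespace.
NAMESPACE="${NAMESPACE:-my-project-namespace}"

# Usage: load_code <pod-name>
# Waits for the pod to be Ready, copies the script into the PV, then lists the PV contents.
load_code() {
  local pod="$1"
  kubectl -n "$NAMESPACE" wait --for=condition=Ready "pod/$pod" --timeout=120s &&
    kubectl -n "$NAMESPACE" cp example_pytorch_code.py "$pod:/mnt/ceph_rbd/" &&
    kubectl -n "$NAMESPACE" exec "$pod" -- ls /mnt/ceph_rbd
}
```

<p>Waiting on the <code>Ready</code> condition avoids a failed copy when the pod is still being scheduled.</p>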
@@ -2424,7 +2463,7 @@ <h2 id="creating-a-job-with-a-pytorch-container">Creating a Job with a PyTorch c
<p>We will use the pre-made PyTorch Docker image available on Docker Hub to run the PyTorch ML model.</p>
<p>The PyTorch container will be held within a pod that has the persistent volume mounted and has access to a MIG GPU.</p>
<p>Submit the specification file below to K8s to create the job, replacing the queue name with your project namespace queue name.</p>
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>create<span class="w"> </span>-f<span class="w"> </span>&lt;pytorch-job-yaml&gt;
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>create<span class="w"> </span>-f<span class="w"> </span>&lt;pytorch-job-yaml&gt;
</code></pre></div>
<h3 id="example-pytorch-job-specification-file">Example PyTorch Job Specification File</h3>
<div class="highlight"><pre><span></span><code><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">batch/v1</span>
@@ -2468,17 +2507,17 @@ <h2 id="reviewing-the-results-of-the-pytorch-model">Reviewing the results of the
<ol>
<li>
<p>Check that the model ran to completion</p>
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>logs<span class="w"> </span>&lt;pytorch-pod-name&gt;
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>logs<span class="w"> </span>&lt;pytorch-pod-name&gt;
</code></pre></div>
</li>
<li>
<p>Spin up a lightweight pod to retrieve results</p>
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>create<span class="w"> </span>-f<span class="w"> </span>lightweight-pod-job.yaml
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>create<span class="w"> </span>-f<span class="w"> </span>lightweight-pod-job.yaml
</code></pre></div>
</li>
<li>
<p>Copy the trained model back to your access VM</p>
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>cp<span class="w"> </span>lightweight-job-&lt;identifier&gt;:mnt/ceph_rbd/model.pth<span class="w"> </span>model.pth
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>cp<span class="w"> </span>lightweight-job-&lt;identifier&gt;:/mnt/ceph_rbd/model.pth<span class="w"> </span>model.pth
</code></pre></div>
</li>
</ol>
@@ -2524,9 +2563,9 @@ <h2 id="using-a-kubernetes-job-to-train-the-pytorch-model-multiple-times">Using
<span class="w"> </span><span class="nt">claimName</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">pytorch-pvc</span>
</code></pre></div>
<h2 id="clean-up">Clean up</h2>
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>delete<span class="w"> </span>pod<span class="w"> </span>pytorch-job
<div class="highlight"><pre><span></span><code>kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>delete<span class="w"> </span>pod<span class="w"> </span>pytorch-job

kubectl<span class="w"> </span>delete<span class="w"> </span>pvc<span class="w"> </span>pytorch-pvc
kubectl<span class="w"> </span>-n<span class="w"> </span>&lt;project-namespace&gt;<span class="w"> </span>delete<span class="w"> </span>pvc<span class="w"> </span>pytorch-pvc
</code></pre></div>


Binary file modified sitemap.xml.gz
Binary file not shown.