This repository contains the Singularity definition files and supporting scripts to build a NeRF rendering pipeline on Compute Canada, leveraging Singularity containers.
The goal is to provide an isolated environment where Nerfstudio can be used to render NeRFs from images. Nerfstudio also lets you render point clouds and videos from the trained NeRFs.
You can find logs from the COLMAP processing and NeRF model training we conducted on Compute Canada here. Some screenshots of the generated point clouds can be found here.
There's an already-built Singularity container stored on the Cedar and Graham clusters on Compute Canada. The container can also be found on the Northeastern University Discovery cluster. This way you can skip building the image yourself. However, if you want to customize the image, check the Building the image section.
The containers can be found at:
- Compute Canada:
/scratch/wribas/nerfstudio/nerfstudio-cuda-11-3.sif
- Northeastern Discovery:
/scratch/ribas.w/nerfstudio/nerfstudio-cuda-11-3.sif
Those directories have permissions set so that you can read the container file and possibly run it. However, if you want to write data files or outputs for NeRF rendering, you should copy the container file to your own folder.
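For example, on Compute Canada something like the following should work (a sketch; adjust the destination to your own scratch space):

mkdir -p /scratch/$USER/nerfstudio
cp /scratch/wribas/nerfstudio/nerfstudio-cuda-11-3.sif /scratch/$USER/nerfstudio/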
- When trying to run nerfstudio, you get stuck downloading a PyTorch model, like this:

Downloading https://download.pytorch.org/models/alexnet-owt-7be5be79.pth to /home/wribas/.cache/torch/hub/checkpoints/alexnet-owt-7be5be79.pth

This might happen because the running container is failing to perform the TLS handshake with the download URL, or it's failing to write the file to disk. To work around this, you can manually download alexnet-owt-7be5be79.pth and feed the cache yourself. Just run this on your local machine (you may need to create the .cache folder manually too):

wget https://download.pytorch.org/models/alexnet-owt-7be5be79.pth
scp alexnet-owt-7be5be79.pth [email protected]:/home/wribas/.cache/torch/hub/checkpoints/alexnet-owt-7be5be79.pth
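If the scp fails because the destination directory doesn't exist yet, you can create the cache path on the cluster first, using the same host and path as above:

ssh [email protected] 'mkdir -p /home/wribas/.cache/torch/hub/checkpoints'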
- GPU version: if you get an error like the one below, it can mean one of two things: either you're running nerfstudio on a GPU that's not compatible with the tiny-cuda-nn version we use, or the available CUDA drivers are too old (< 11.3) or their version doesn't match what the GPU needs. To solve this, try to use an Nvidia T4 GPU or newer. We tested many GPUs (e.g. p100, p100l, v100) and only got the Nvidia T4 to work on Compute Canada.

OSError: Could not find compatible tinycudann extension for compute capability 35.
/opt/nerfstudio-nu-papers/venv/lib/python3.8/site-packages/torch/cuda/__init__.py:123: UserWarning:
    Found GPU0 Tesla K40m which is of cuda capability 3.5.
    PyTorch no longer supports this GPU because it is too old. The minimum cuda capability supported by this library is 3.7.
  warnings.warn(old_gpu_warn % (d, name, major, minor, min_arch // 10, min_arch % 10))
Traceback (most recent call last):
  File "/opt/nerfstudio-nu-papers/venv/bin/ns-train", line 5, in <module>
    from scripts.train import entrypoint
  File "/opt/nerfstudio-nu-papers/venv/lib/python3.8/site-packages/scripts/train.py", line 49, in <module>
    from nerfstudio.configs.method_configs import AnnotatedBaseConfigUnion
  File "/opt/nerfstudio-nu-papers/venv/lib/python3.8/site-packages/nerfstudio/configs/method_configs.py", line 47, in <module>
    from nerfstudio.field_components.temporal_distortions import TemporalDistortionKind
  File "/opt/nerfstudio-nu-papers/venv/lib/python3.8/site-packages/nerfstudio/field_components/__init__.py", line 17, in <module>
    from .encodings import Encoding, ScalingAndOffset
  File "/opt/nerfstudio-nu-papers/venv/lib/python3.8/site-packages/nerfstudio/field_components/encodings.py", line 34, in <module>
    import tinycudann as tcnn
  File "/opt/nerfstudio-nu-papers/venv/lib/python3.8/site-packages/tinycudann/__init__.py", line 9, in <module>
    from tinycudann.modules import free_temporary_memory, NetworkWithInputEncoding, Network, Encoding
  File "/opt/nerfstudio-nu-papers/venv/lib/python3.8/site-packages/tinycudann/modules.py", line 58, in <module>
    raise EnvironmentError(f"Could not find compatible tinycudann extension for compute capability {system_compute_capability}.")
OSError: Could not find compatible tinycudann extension for compute capability 35.
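Before committing to a GPU type, you can sanity-check its compute capability from inside the container. A quick sketch (requires an allocated GPU and the --nv flag; a T4 reports (7, 5), while the failing K40m above is (3, 5)):

# Prints the compute capability of GPU 0 as a (major, minor) tuple
python -c "import torch; print(torch.cuda.get_device_capability())"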
SSH into the Graham cluster and start an interactive job. An example salloc command is given below. You may tweak the job's resources depending on the size of the project you'll be rendering.
salloc --time=1:0:0 --cpus-per-task=4 --mem=12G --gpus-per-node=t4:1 --account=<replace_me>
As mentioned in the gotchas, the type of NVIDIA GPU you use is very important. Make sure you're using the most recent GPU possible, as it will provide the right compute capability for tiny-cuda-nn. For Compute Canada, we've tested nerfstudio with T4 GPUs.
Once you get the job allocation granted, navigate to the directory where the nerfstudio container is located. Then run:
# The data directory is where you should put the images/videos/artifacts you want nerfstudio to use
mkdir data
# The outputs directory is where nerfstudio will write the result of the renderings
mkdir outputs
# Notice that we're binding `data` and `outputs` to the container
singularity run --nv --bind data/:/opt/nerfstudio-nu-papers/data --bind outputs/:/opt/nerfstudio-nu-papers/outputs nerfstudio-cuda-11-3.sif
You should now be presented with a bash terminal where you can run the nerfstudio CLI commands. More about nerfstudio here.
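As a hedged example, a typical first session inside the container might look like the commands below; the paths are hypothetical, and the exact flags can vary between nerfstudio versions:

# Convert raw images into a COLMAP-structured nerfstudio dataset
ns-process-data images --data data/my-images --output-dir data/my-scene
# Train a nerfacto model on the processed dataset
ns-train nerfacto --data data/my-scene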
SSH into the Discovery cluster and start an interactive job. An example srun command is given below. You may tweak the job's resources depending on the size of the project you'll be rendering.
srun --partition=gpu --nodes=1 --pty --gres=gpu:t4:1 --ntasks=2 --mem=4GB --time=01:00:00 /bin/bash
As mentioned in the gotchas, the type of NVIDIA GPU you use is very important. Make sure you're using the most recent GPU possible, as it will provide the right compute capability for tiny-cuda-nn. For Discovery, we've tested nerfstudio with T4 GPUs.
Once you get the job allocation granted, navigate to the directory where the nerfstudio container is located. Then run:
# The data directory is where you should put the images/videos/artifacts you want nerfstudio to use
mkdir data
# The outputs directory is where nerfstudio will write the result of the renderings
mkdir outputs
# Notice that we're binding `data` and `outputs` to the container
singularity run --nv --bind data/:/opt/nerfstudio-nu-papers/data --bind outputs/:/opt/nerfstudio-nu-papers/outputs nerfstudio-cuda-11-3.sif
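If you'd rather run a single command than open an interactive shell, singularity exec should also work. A sketch with the same binds (the ns-train arguments are illustrative):

singularity exec --nv --bind data/:/opt/nerfstudio-nu-papers/data --bind outputs/:/opt/nerfstudio-nu-papers/outputs nerfstudio-cuda-11-3.sif ns-train nerfacto --data data/my-scene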
In order to build nerfstudio in a Singularity container you'll need:
- CUDA and Nvidia drivers installed on the host machine. As a consequence, you also need an Nvidia GPU on the host machine; otherwise, you won't be able to install the drivers.
- A fairly recent Nvidia GPU that supports at least the Turing architecture. For context, we've successfully built the container using an RTX 2060.
- Singularity installed on the host machine. Instructions here. We've tested and validated the definition file with Singularity version 3.8.4.
If you need to build the nerfstudio image from scratch, you can use the command below:
sudo singularity build --nv nerfstudio-cuda-11-3.sif nerfstudio.def
It's mandatory that you run this command with sudo, on a computer where you have privileged access. This matters especially because you won't have sudo access on Compute Canada, where the container will eventually run. If you don't use sudo during the build, your container won't work properly, or may not even build.
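For reference, a definition file for this kind of image generally has the shape sketched below. This is not the actual nerfstudio.def from this repository; the base image, package list, and install steps are assumptions for illustration:

Bootstrap: docker
From: nvidia/cuda:11.3.1-devel-ubuntu20.04

%post
    # Assumed steps: system packages, a Python 3.8 venv, then PyTorch,
    # tiny-cuda-nn, and nerfstudio built against CUDA 11.3
    apt-get update && apt-get install -y git build-essential python3.8 python3.8-venv
    python3.8 -m venv /opt/nerfstudio-nu-papers/venv
    . /opt/nerfstudio-nu-papers/venv/bin/activate
    pip install --upgrade pip
    pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
    pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
    pip install nerfstudio

%runscript
    exec /bin/bash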
The resulting image will be several gigabytes in size. You can transfer it to Compute Canada or Discovery using the scp command. Be aware the transfer may take a while depending on your internet connection.
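For example (replace the username and destination path with your own):

scp nerfstudio-cuda-11-3.sif [email protected]:/scratch/<your_user>/nerfstudio/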
We've provided two different shell pipeline scripts to submit batch jobs:
- nerfstudio-colmap-process.sh: processes raw images into a COLMAP structure using the nerfstudio processing CLI;
- nerfstudio-train.sh: trains a NeRF model from COLMAP-structured images.
sbatch ./nerfstudio-colmap-process.sh --images /path/to/read/raw/images --output /path/to/save/colmap
sbatch ./nerfstudio-train.sh --data /path/to/colmap/images/dir --output /path/to/save/nerf/output
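If you need to adapt these scripts or write your own, a Slurm batch script for this pipeline generally has the shape below. This is only a sketch: the resource lines mirror the salloc example above, and the ns-train invocation is illustrative rather than the exact contents of the provided scripts:

#!/bin/bash
#SBATCH --time=3:0:0
#SBATCH --cpus-per-task=4
#SBATCH --mem=12G
#SBATCH --gpus-per-node=t4:1
#SBATCH --account=<replace_me>

# Run training non-interactively inside the container
singularity exec --nv \
    --bind data/:/opt/nerfstudio-nu-papers/data \
    --bind outputs/:/opt/nerfstudio-nu-papers/outputs \
    nerfstudio-cuda-11-3.sif \
    ns-train nerfacto --data data/my-scene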
Once you have a trained model, you can export the results to images or videos. More information on how to do it here.
Below you'll find an example of a 10-second video render from a trained model.
Directories you'll need to create and bind:
- renders: the directory to output the rendered video
- outputs: the directory where the trained model is located
- data: the directory where the COLMAP data is located (especially the transforms.json file)
singularity run --nv --bind renders/:/opt/nerfstudio-nu-papers/renders --bind outputs/:/opt/nerfstudio-nu-papers/outputs --bind data/nerfstudio/poster/:/opt/nerfstudio-nu-papers/data nerfstudio-cuda-11-3.sif
From within the singularity container, run:
ns-render --load-config outputs/data/nerfacto/2023-02-10_232507/config.yml --output-path renders/nerfstudio-poster-sample-camera-path.mp4 --seconds 10 --output-format video --camera-path-filename data/base_cam.json
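If you want a point cloud instead of a video, the same trained model can be exported with ns-export. A sketch (exact flags depend on your nerfstudio version):

ns-export pointcloud --load-config outputs/data/nerfacto/2023-02-10_232507/config.yml --output-dir renders/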
Be aware that the above commands are just examples and the file/directory paths may not (and probably won't) match what you need.
Authored by Weder Ribas [email protected], advised by Dr. Derek Jacoby and Dr. Yvonne Coady.
This project was developed during the Spring semester of CS7675 (Master's Research) at Northeastern University.