Toy path tracer for my own learning purposes, using various approaches/techs. Somewhat based on Peter Shirley's Ray Tracing in One Weekend minibook (highly recommended!), and on Kevin Beason's smallpt.
I decided to write blog posts about things I discover as I do this, currently:
- Part 0: Intro
- Part 1: Initial C++ and walkthrough
- Part 2: Fix stupid performance issue
- Part 3: C#, Unity and Burst
- Part 4: Correctness fixes and Mitsuba
- Part 5: simple GPU version via Metal
- Part 6: simple GPU version via D3D11
- Part 7: initial C++ SIMD & SoA
- Part 8: SSE SIMD for HitSpheres
- Part 9: ryg optimizes my code
- Part 10: Update all implementations to match
- Part 11: Buffer-oriented approach on CPU
- Part 12: Buffer-oriented approach on GPU D3D11
- Part 13: GPU thread group data optimization
- Part 14: Make it run on iOS
- Part 15: A bunch of path tracing links
- Part 16: Unity C# Burst optimization
- Part 17: WebAssembly
Right now: can only do spheres, no bounding volume hierarchy of any sort, a lot of stuff hardcoded.
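To make "spheres only, no BVH" concrete: without an acceleration structure, every ray gets tested against every sphere and the closest hit wins. Here's a minimal sketch of that idea. It is not the code from this repo; `Vec3`, `Ray`, `Sphere` and `HitClosestSphere` are names made up for illustration, and the ray direction is assumed to be normalized.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal types, for illustration only.
struct Vec3 { float x, y, z; };
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 origin, dir; };            // dir is assumed to be unit length
struct Sphere { Vec3 center; float radius; };

// Brute-force search: tests the ray against all spheres (no BVH).
// Returns the index of the closest sphere hit with distance in (tMin, tMax),
// or -1 on a miss. On a hit, outT receives the hit distance along the ray.
int HitClosestSphere(const Ray& r, const std::vector<Sphere>& spheres,
                     float tMin, float tMax, float& outT)
{
    assert(tMin < tMax);
    int hitIndex = -1;
    float closest = tMax;
    for (std::size_t i = 0; i < spheres.size(); ++i)
    {
        // Solve |origin + t*dir - center|^2 = radius^2 for t (with |dir| = 1):
        // t^2 + 2*b*t + c = 0, so t = -b +/- sqrt(b*b - c).
        Vec3 oc = r.origin - spheres[i].center;
        float b = Dot(oc, r.dir);
        float c = Dot(oc, oc) - spheres[i].radius * spheres[i].radius;
        float disc = b * b - c;
        if (disc <= 0.0f)
            continue; // ray misses this sphere
        float root = std::sqrt(disc);
        float t = -b - root;          // nearer intersection first
        if (t <= tMin || t >= closest)
            t = -b + root;            // fall back to the farther one
        if (t > tMin && t < closest)
        {
            closest = t;
            hitIndex = static_cast<int>(i);
        }
    }
    if (hitIndex >= 0)
        outT = closest;
    return hitIndex;
}
```

Cost is linear in the sphere count per ray, which is tolerable for a scene of a few dozen spheres but is exactly what a BVH would avoid on larger scenes.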
Implementations I'm playing with (again, everything is in toy/learning/WIP state; likely suboptimal) are below. These are all on a scene with ~50 spheres and two light sources, measured in Mray/s.
- CPU. Testing on "PC" AMD ThreadRipper 1950X 3.4GHz (SMT disabled, 16c/16t) and "Mac" mid-2018 MacBookPro i9 2.9GHz (6c/12t):
  - C++ w/ some SSE SIMD: PC 187, Mac 74, iPhone X (A11) 12.9, iPhone SE (A9) 8.5
  - C++: PC 100, Mac 35.7
  - C# (Unity with Burst compiler w/ some 4-wide SIMD): PC 133, Mac 60. Note that this is an early version of Burst.
  - C# (Unity with Burst compiler): PC 82, Mac 36. Note that this is an early version of Burst.
  - C# (.NET Core): PC 53, Mac 23.6
  - C# (Mono with optimized settings): Mac 22.0
  - C# (Mono defaults): Mac 6.1
  - WebAssembly (single threaded, no SIMD): 4.5-5.5 Mray/s on PCs, 2.0-4.0 Mray/s on mobiles.
- GPU. Simplistic ports to compute shader:
  - PC D3D11. GeForce GTX 1080 Ti: 1854
  - Mac Metal. AMD Radeon Pro 560X: 246
  - iOS Metal. A11 GPU (iPhone X): 46.6, A9 GPU (iPhone SE): 19.8
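
For reference, Mray/s is just millions of rays traced per second of wall-clock time. A minimal sketch of that arithmetic follows; the function name `MRaysPerSecond` is made up here, and exactly which rays get counted (camera rays, bounces, shadow rays) is up to the implementation and assumed to be tallied elsewhere.

```cpp
#include <cassert>
#include <cstdint>

// Converts a ray count and an elapsed time into Mray/s.
// rayCount: total rays traced during the measured interval (assumed to be
//           counted by the renderer; what counts as a "ray" is its choice).
// seconds:  wall-clock duration of that interval, must be positive.
double MRaysPerSecond(std::uint64_t rayCount, double seconds)
{
    assert(seconds > 0.0);
    return static_cast<double>(rayCount) / (seconds * 1.0e6);
}
```

For example, 5 million rays traced in half a second works out to 10 Mray/s.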
A lot of stuff in the implementation is totally suboptimal or using the tech in a "wrong" way. I know it's just a simple toy, ok :)
- C++ projects:
  - Windows (Visual Studio 2017) in `Cpp/Windows/TestCpu.sln`. DX11 Win32 app that displays result as a fullscreen CPU-updated or GPU-rendered texture.
  - Mac/iOS (Xcode 9) in `Cpp/Apple/Test.xcodeproj`. Metal app that displays result as a fullscreen CPU-updated or GPU-rendered texture. Should work on both Mac (`Test Mac` target) and iOS (`Test iOS` target).
  - WebAssembly in `Cpp/Emscripten/build.sh`. CPU, single threaded, no SIMD.
- C# project in `Cs/TestCs.sln`. A command line app that renders some frames and dumps out final TGA screenshot at the end.
- Unity project in `Unity`. I used Unity 2018.2.13.