Replies: 38 comments
-
This is in the same vein as something Steve and I were discussing. In the book there is the simplest BVH build. But I have done a SAH-based build and it produces very good speedups without that much pain. But inserting it would start to bloat the book. It's not clear what the project's philosophy should be on these. An appendix chapter perhaps, as suggested above? Multithreading in particular is an extreme case because of the bang for the buck, but it also brings in libraries, which is painful. All in all this seems like a good concrete example to use to expose a more explicit philosophy. Opinions?
-
I would suggest that with the org, projects such as these get their own GitHub repos (books + source). If someone has already done, say, Ray Tracing the Next Week, they shouldn't have to go back to some state to tackle a new series of chapters in the middle. Instead, add another volume to the library, working to maintain the same underlying philosophy of the series: featherweight publishing model, strongly encouraging readers to write their own code. I agree that pulling in external libraries is painful, particularly for cross-platform development. That in turn brings in build systems and all the fun with that. Still, I'd say “let a thousand flowers bloom”. One suggestion would be to provide two sets of source code: one the starting body of code (which might just be a reference to the final code for one of the books, but you'll have to watch for changes), and two, the final completed code. Like the other three books, the new book should walk the reader through modification of the existing codebase to the new multi-threaded result.
-
+1 to adding another volume to the library. Would each technique get its own 5-10 page volume, or do we collate another half-dozen? However, the OP is explicitly stating the value in placing multithreading among the techniques in The Next Week.
-
Note that neither OpenMP nor std::thread requires additional libraries. For the former, the compiler needs to support it (many do: https://www.openmp.org/resources/openmp-compilers-tools/ including the Visual Studio C++ compiler, although it's not listed there); for the latter you need C++11.

The example at the end of volume 1 (random scene with close to 500 objects; nx=1200, ny=800, ns=10; parameters taken from the src code in this repository) takes ~180s on my machine to render. I'm only seeing a 5x speedup for this particular instance using the change I described above, but this is still quite significant when you need to or want to experiment. (Depending on scene, platform, etc., I've seen anywhere from a 5x up to a 14x speedup, though I'm beginning to doubt the 14x number.)

If I had had the idea to use OpenMP earlier, I would have added it as soon as there's the first loop over the rows of the image, which is chapter 3 in the first book. The advantage you get in code iteration speed is just too big to pass up. It's also worth noting that the changes are very local: write into a buffer instead of the std::cout line; a new method to write the buffer to a file; a single OpenMP line above the for loop over the rows of the output image.
-
That makes this much more interesting. As this is an introductory text, I believe that understanding how a single-threaded approach would work is a prerequisite to parallelizing the code. However, your point about iteration is very well taken. Can we do both approaches? Given your suggestions above, it seems like we could introduce a … If this is accurate, could you cobble up an example of how you think this would look? I've sent an invite so you can just work in a feature branch without the overhead of creating your own fork. Just code for now — updating the book content can come later.
-
There are now two examples of how to add multithreading in this feature branch: https://github.com/RayTracing/InOneWeekend/tree/ronaldfw/multithreading

OpenMP: This is the elegant solution, as it requires only one line to add the multithreading. Unfortunately, setup of OpenMP on Mac OSX seems to be more involved and I didn't even get it to run. I was only able to test this on Windows (which requires more changes, as drand48() is not available there). Not being able to use this on Mac easily rules this option out, I think.

std::thread: Requires compiling with "-std=c++11". This works well, but the code changes are a little more involved than just adding a single line. On my iMac, I get a speedup of 3.33x (4 physical cores).
-
I'd prefer to stick with the standard. C++11 seems quite reasonable.
-
My current thinking on this is that we should add an appendix to the book that adds multithreading (if at all). I agree with Steve's point that this is an introductory text, and multithreading, while quite simple to implement here, complicates things in a way that makes it more difficult to focus on the core of what folks are aiming to learn here (ray tracing). If we have this in an appendix, folks will still have the option to go there.

We could have a paragraph in chapter 3 along the lines of: "If you are interested in adding multithreading to your code, you can now take a detour through the appendix "Multithreaded Rendering". In general we recommend doing so, as it helps a lot with iteration speed when experimenting with code. But since this book is about learning ray tracing and not programming, we don't want to include this in the main part of the book. [...]"

The appendix should then discuss how to render into a buffer and how to add multithreading. For the latter, I'd recommend we stick to C++11's std::thread. While OpenMP is a lot more elegant, getting it to work on macOS is a pain. I think the more involved code changes using std::thread are okay if this is done in the appendix, where we can have a more elaborate discussion. I've started drafting the book parts for the appendix, but it's not yet ready to be shared.
-
I'm creating a new branch on the new repository to keep working on this and bring over all the changes I had made earlier on the InOneWeekend repository. Also, as has been commented elsewhere, this requires a thread-safe approach to random number generation. My first version will not get this right in all aspects but will use thread_local on the random number generators. The remaining issue is that they will all be seeded with the same state.
-
I'm pulling this out of the v3 milestone, as it's not essential to that release. It would be great to have it, but it shouldn't hold things up, and you should feel free to get this done in your own time.
-
I've written a chapter to walk the user through this. It needs more edits, but the bones are there.
-
@trevordblack Might I suggest the use of …
-
Hmmm. I've a chapter written that's about 1/2 the length of book 2 or book 3 (it's a chonker). The assumption with the chapter is that everyone is starting from zero, and that many people would be writing their code in a different language.

At present I explain what a race condition is, where it comes from, and some simple means of dealing with them. All of these solutions boil down to making sure all threads read from constant data and write to independent memory locations. I don't think I'll have fulfilled the spec if I don't keep all that in (explaining multithreading w/o explaining race conditions would be a mistake). Spending an extra 500 (?) words to explain semaphores/futures/locks wasn't a thing I wanted to do. But I'll loop back around and consider it.

Using a feature married pretty heavily to the … Not that it's a bad idea. I just wonder if it's outside scope...
-
I have been doing live coding with the content here and this came up quickly from watchers; that is, some of my followers expressed their desire to implement the content using different languages, most notably Rust. They asked me if the code depends on any external libraries, etc. I would say staying with your initial idea seems better suited and better aligned with the general philosophy of the work, and I would not be surprised if you consider the following out of scope. Maybe after the 'final' solution, or near the final, an alternative listing can be provided with async, saying it's just a high-level wrapper around threads/futures etc. The only positive I can think of is the ability to return a value from them, which I find slightly easier than ensuring different memory write locations explicitly.

struct ImageTile; // pixel data for one tile

std::vector<std::future<ImageTile>> fs(THREAD_NB);
for (int i = 0; i < THREAD_NB; i++) {
    ImageTile imt = makeTile(...);
    fs[i] = std::async(std::launch::async, traceTile, imt); // the inner loop with pixel samples
}
std::vector<std::vector<color>> img(imwidth, std::vector<color>(imheight, color(0,0,0)));
for (auto &f : fs) {
    ImageTile ret = f.get();
    joinTileToImv(ret, img);
}
writeImgToPpm(img);

But again, it's true that it would be out of scope, even slightly harmful, for introducing someone starting from zero to multithreaded rendering.
-
Another quick point came up while I was debugging the multithreaded version of the final scene of The Rest. I was debugging with gdb, and the multithreaded version proved to be more difficult to debug. By changing

std::vector<std::future<ImageTile>> fs(THREAD_NB);
fs[i] = std::async(std::launch::async, traceTile, imt); // the inner loop with pixel samples
ImageTile ret = f.get();
joinTileToImv(ret, img);

to

std::vector<ImageTile> fs(THREAD_NB);
fs[i] = traceTile(imt);
joinTileToImv(fs[i], img);

I was able to drop back to single-threaded execution. Maybe it is not directly relevant to the main point, i.e. increasing performance, but I thought it might be a point to consider.
-
I was trying to make it easy for people to use OMP, without getting into pseudo RNG related issues when multiple threads are involved.
-
This is basically thread safe for our purposes: …

You're right that we could call in all manner of really cool and functional libraries, even truly random ones! I just think it's removing a learning opportunity.
-
The idea is to prevent different threads from repeating the same sequence; in the posted code there isn't any thread-specific logic, nor any protection (such as a mutex) for the internal state. Any two threads can execute the same sequence.
-
Wait, back up.
-
It looks like it's been a while since the discussion went cold here, but I'd like to chime in and also to know what the current consensus is. The final scene of Book 2 was impossibly slow for me, so I decided to give parallelism a shot. With OpenMP, you can achieve this with two lines of OpenMP annotations and not a single change to the code itself. But @ronaldfw has indicated that setting up OpenMP on Mac is troublesome, so I decided to try it with only standard library features (C++20), and (mostly) managed to do it without major changes to the code, but there is a catch.

First, the changes. I parallelized the sample accumulation loop, since it was the easiest target and made for the fewest changes. The changes I made also don't require you to think about indexes much, so I find them easier to understand. We are gonna need a vector to store the samples before accumulating them, and another for listing the indexes from [0, samplesPerPixel): … These can be created only once before the main loop. Also note that the second vector could possibly be replaced with …

As for the sample accumulation loop, it becomes: … Here we are using … Finally, writing the color to the output becomes: …

I don't know whether the random number generator is thread-safe, but the output seems correct to me, at least when rendering at 400x400 and 1000 samples per pixel (for speed).

Now comes the catch: at least on GCC 11.1.0 under Linux, actually getting the loop to use more than one thread requires linking against Intel's Thread Building Blocks, using …

In conclusion, as it stands right now, I find OpenMP to be the superior alternative (except for the Mac ecosystem), since AFAIK it ships by default with GCC (and maybe MSVC? I'm not sure).
-
If we ever do multithreading, it'll be with C++11 std::thread, not OpenMP.
-
@trevordblack The solution I posted is a good alternative if depending on OpenMP is problematic, but it does use C++20. However, the TBB dependency is probably just a quirk of GCC; I don't know how clang and MSVC implement this. Anyway, if sticking to C++11 is a requirement, here's a solution that builds on ronaldfw's branch, but makes smaller changes and has less indexing to deal with: …

It still requires a vector of samples, but we can drop the vector of indexes from my last comment. Also, we need to create a vector of threads: …

Under GCC and Linux I had to use …
-
A couple of changes would be needed: … Because, while the syntax above is clean, it's not very intuitive. So I would motivate the distinct function and create it.

There's no real good way to remove the vector of samples. Once you have multiple threads writing to data, you need a way to guarantee there isn't aliasing or race conditions. You can do this by allotting blocks of memory that are unique per thread (the vector of samples), or by allotting a file that is unique per thread.

The bottleneck for the change here isn't that we can't do it, or don't know how. It just hasn't been a priority, and figuring out a good place to put it within the context of the three books has been annoying. On the whole, though, I'm glad that you got it working and are pushing for this again.
-
About your points:

1. Yes, I noticed that. It's either the vector or synchronization, which kinda defeats the purpose.
2. I also noticed that. I chimed in to try and give an option with the minimum possible changes, so that they are easier to explain to the reader.

I was thinking the implementation that I proposed could be a small (optional) subsection before the final scene of Book 2, because that's where you really feel the need for parallelism. Anyway, I think this is now a problem of deciding how to write the book, so we should just let the contributors that had already been involved in this issue weigh in. But feel free to tag me when you need an extra pair of hands on this :)
-
"Anyway, I think this is now a problem of deciding how to write the book, so we should just let the contributors that had already been involved in this issue weigh in." That would be me |
Beta Was this translation helpful? Give feedback.
-
Just eager to know whether this is still alive, since the final scene of Book 2 is way too slow.
-
If this happens, it will be its own book or other work. We will not be doing this in the regular series.
-
@trevordblack — do you want to take what you already have so far and at least plop it down in a new repo/book?
-
So I have been trying this, actually, but I can't get any significant speedup. I made a copy of the latest version of this repo (v4.0.1) and worked on the "In One Weekend" section. I added the annotation … I timed only how long it took to complete …

I reverted to commit 2e5cc2e (for no other reason than the fact that it was released on Dec 9, 2020, the same day kyamant's post was made). I modified the code to write to the 2D vector …

This gave me a speedup of only 2.34x, but again, I'm using 8 cores and would expect something higher. From this thread it seems like users are able to achieve greater speedups using similar OpenMP directives. I would greatly appreciate it if anyone could point me to how that's being done. Thanks to everyone in this thread for engaging with this book, and to the authors for a great resource.
-
Given that this issue was created five years ago now (before we had GitHub discussions), and the fact that this will always be an open area to play with and not really an issue to be "solved", I'm converting this issue to a discussion.
-
Rendering time in a single-threaded C++ implementation becomes an issue when getting to the larger examples and when trying to render with many samples per pixel, in particular, before doing the BVH implementation in book 2.
There is a fairly simple change to the code that adds multithreading over the rows of the output image that speeds up rendering times a lot. This allows for faster experimentation with different scenes and different code.
The change consists of two parts: (1) Render into a buffer instead of printing pixel values to cout. (2) Parallelize over y of the output image (obviously this is only possible after making the change in (1)). Using OpenMP, (2) can be done in a single line.
On my 6-core/12-hyperthread CPU, the total speedup of both changes is somewhere around 14x. Ignoring file output, I get about 5.6x speedup over the single-threaded version.
On the flip side, the multithreaded version takes away some of the purity of the original code: writing into a buffer is more complicated than just printing every pixel value to cout, and OpenMP support needs to be turned on, though I believe most C++ compilers support this (tested only g++ and Visual Studio). An alternative could be to use std::thread.

If you consider this a good addition, I'd be happy to: …